Last modified: 2014-05-08 22:05:14 UTC
As per [[m:IRC office hours/Office hours 2014-01-23]], replication from OSM to a database server accessible from Labs should be set up a) so that tools dependent on this data can move from the Toolserver to Wikimedia Labs/Tools and b) as a prerequisite for getting performance data, intra-WMF replication, a production/experimental tile server, etc. The ETA, "with a definite danger of being wrong", is "at least 3-4 weeks". Nosy offered to provide any information on the current Toolserver setup that is needed. NB: The final accessibility likely depends on bug #48896, as one authentication mechanism will probably be used for both the general-use and the OSM parts of the database server. The core of this bug, however, is to set up the replication so that when access is provided, the data will already be there and kept replicated.
Maybe I'm misunderstanding the "Importance" ratings, but this bug looks to me more like a "blocker" than a mere "enhancement". In particular it _blocks_ the migration of a whole bunch of tools from the toolserver to labs. And by now, as the deadline for the toolserver deactivation has crept nearer and nearer, I would say that the priority is beyond "normal". Almost a year has been wasted during which work for the migration of these tools could not be performed due to the lack of this basic component. But, hey, probably everyone thinks "their bug" is the most important...
Could we get some short hardware specs of the database server?
* How large is the SSD?
* How much RAM?
* Processor?
It's a fairly hefty box; it has 32 GB of RAM and 8 cores. No SSD for that one; it uses an array of spinning rust (about 9 TB worth, once in RAID). Keep in mind though that the PostgreSQL master is going to share resources with the MySQL (tools-db replacement) slave, and vice versa. (So there are in fact two servers, not just one.)
I'd suggest running the tile server on an extra box then, because I am not sure the I/O will suffice otherwise.
@nosy: That is the plan anyway; this ticket is just about the OSM database. @Kolossos: I see you are asking about SSDs. Why is that? Does the database server on the Toolserver have an SSD? I see in munin that ptolemy has somewhat less than 30 MB/s read I/O on average, with spikes around 90 MB/s. I'd say that the current boxes will survive that load, although I am curious what causes those spikes. Any idea? Also, something important that I am not clear on: is that server the one hosting the Postgres DB as well, or is that a different machine?
It's not I/O bandwidth that OSM depends on; latency is what matters, because rendering a tile involves a lot of DB requests.
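To put rough numbers on that, here is a back-of-the-envelope sketch in Python; the query count per tile and the per-request access times are illustrative assumptions, not measurements from ptolemy:

# Back-of-the-envelope illustration: with many DB round trips per tile,
# per-request access latency dominates long before raw bandwidth does.
QUERIES_PER_TILE = 200  # assumed number of DB requests to render one (meta)tile

for label, access_ms in (("HDD array, ~8 ms per random access", 8.0),
                         ("SSD, ~0.1 ms per random access", 0.1)):
    stall_s = QUERIES_PER_TILE * access_ms / 1000.0
    print("{}: ~{:.2f} s of pure access latency per tile".format(label, stall_s))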
OpenStreetMap has switched to SSDs (http://wiki.openstreetmap.org/wiki/Servers) and it has been greatly beneficial, both for importing the planet and for updating the DB (see http://wiki.openstreetmap.org/wiki/Osm2pgsql/benchmarks). For any new hardware purchases for OSM stuff, SSDs would be well worth it imho. :) Non-SSD will work, but not as well.
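For context, the import/update path those benchmarks exercise is an osm2pgsql load of the planet file into PostgreSQL/PostGIS. A minimal sketch of such an import follows; the database name, cache size and file path are placeholders, not the actual Labs configuration:

import subprocess

# Placeholder values; the real Labs database name, cache size and planet path may differ.
PLANET_FILE = "planet-latest.osm.pbf"
DATABASE = "gis"
CACHE_MB = "16000"  # osm2pgsql node cache in MB, sized to the available RAM

# --slim keeps node/way data in the database so diffs can later be applied with
# --append; --hstore stores all tags in an hstore column alongside the usual ones.
subprocess.check_call([
    "osm2pgsql",
    "--create",
    "--slim",
    "--hstore",
    "-C", CACHE_MB,
    "-d", DATABASE,
    PLANET_FILE,
])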
Ptolemy hosts both the DB and does the rendering, which is a fairly common setup, as the resource requirements of rendering and of the DB are nicely complementary. The Toolserver (ptolemy) however has always had a massive performance problem, and we never managed to figure out quite why. Many of the database operations are something like a factor of 10 or more slower than on the tileserver of OSM or other tileservers. My guess (and I never managed to get real proof for it) is that it is a combination of factors that cause the problems.

1) Part of it is likely the lack of SSDs. Experience with the OSMF tileserver and with many other installations has shown that moving from HDD arrays to SSDs is a huge boost for the type of workload done by the Postgres DB. Granted, most people have moved over from perhaps 4-8 disk HDD arrays to an SSD, so an HDD array of e.g. 16 disks or more might be on par with a single SSD. However, given that the OSM Postgres DB is currently only about 400 GB, a single consumer 512 GB SSD is much cheaper than a 16-disk array, so value for money is simply better with SSDs for the OSM workload. Using consumer SSDs appears to be perfectly fine as well, despite a reasonably heavy constant write load. E.g. the OSM tileserver has a single 600 GB Intel 320 SSD; I can't remember exactly when it was installed, but I think it might be coming up to 3 years now, and the media wearout indicator shows that only 4% of its lifetime has been used. At the current rate, that consumer SSD would support 60 years of writes.

2) Part of it, however, I believe is also software driven, and my guess would be that the Solaris kernel or filesystem has something to do with it (this is more or less pure speculation, as I have no way of validating the hypothesis). If you look at average read throughput in the munin graphs, the values for ptolemy are actually roughly on par with those of OSMF's tileservers. Sure, latency has a lot to do with things, but I would expect that to still manifest itself in average read rates if the processes were constantly waiting for disk access. So my guess, given that the disk read rates are roughly comparable but the resulting database performance is something like an order of magnitude worse, is that either the disk cache in RAM is extremely inefficient on ptolemy, or that read-ahead is going on and a lot of unnecessary data is read from disk.

3) The Toolserver runs a number of stylesheets, some of which I believe are much less efficient than the main OSM style, adding extra load on the DB. However, as the main style is also much slower, and single SQL queries run directly in Postgres show the same huge slowdown, the performance problems don't just stem from overload of the system.

Overall, despite the huge performance problems and the extra load due to more styles, ptolemy can just about manage to do what is needed. Probably because, despite the maps being included in a number of desktop Wikipedias (e.g. de and ru via the OSM gadget and en via the WMA), the load they put on the servers is surprisingly tiny. Reducing the update frequency, plus a huge problem with sockets in Solaris and Perl which rejects something like 80% of rendering requests due to failed socket initialisation, have also helped keep the workload just about manageable on ptolemy. But it would be nice to do better.

Finally, regarding the 90 MB/s spikes: I can't say for sure what the cause is, but rendering low-zoom tiles (i.e. huge geographical areas) is much more demanding in terms of pulling in data than rendering high-zoom tiles, where the spatial indexes make sure one only needs to read a tiny fraction of the database. So Z0-Z7 likely do seqscans instead of index scans and need to read tens of GB of data per tile. Because of that, and because at world scale the view of the map seldom changes, they are rendered infrequently and are probably responsible for those spikes.
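To make the low-zoom point concrete, here is a small sketch of the tile arithmetic (plain slippy-map/Web Mercator geometry, not code from ptolemy): the area a single tile covers, and hence the bounding box the renderer has to query, shrinks by a factor of four per zoom level, so at Z0 the query spans the whole projected world and the spatial index stops being selective.

# Web Mercator (EPSG:3857) half-extent of the projected world, in metres.
HALF_EXTENT = 20037508.342789244

def tile_bbox(z, x, y):
    """Bounding box (xmin, ymin, xmax, ymax) of slippy-map tile z/x/y in EPSG:3857."""
    size = 2 * HALF_EXTENT / 2 ** z  # side length of one tile in metres
    xmin = -HALF_EXTENT + x * size
    ymax = HALF_EXTENT - y * size
    return (xmin, ymax - size, xmin + size, ymax)

def tile_area_km2(z):
    """Area covered by a single tile at zoom z, in square kilometres."""
    side_km = 2 * HALF_EXTENT / 2 ** z / 1000.0
    return side_km ** 2

if __name__ == "__main__":
    print("Z0 tile bbox:", tile_bbox(0, 0, 0))
    for z in (0, 7, 15):
        print("Z{:<2d}  ~{:,.0f} km^2 per tile".format(z, tile_area_km2(z)))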
Hello. A Postgres/PostGIS module is ready at https://gerrit.wikimedia.org/r/#/c/112469/ and is going to be used for the management of the OSM database servers. I am moving on to another one, to be used for setting up and maintaining the syncing from planet.osm.
Hi! A "perspective" sort of question: If this setup turns out not to perform as expected, is there a chance to get SSDs? According to the OSM folks they have a high impact.
There is no technical obstacle that I know of to using SSDs; the question would be just one of budget I think. I think it's premature to consider hardware improvements until we have some metrics, though.
Checking in to see how things are going, Alex.
The DB is set up and we maps developers have access. I just requested that the OSM coastline data be imported into the DB as well, as I need it for my application and it was provided on the Toolserver. Once that is done, there should be feature parity with the TS OSM DB (we'll have to see about performance).
Daniel was faster :-). As I was saying yesterday at the IRC office hours meeting, the DB is set up in a master/slave configuration via PostgreSQL asynchronous streaming replication, so that we effectively have an active/passive high-availability setup (manual switchover). The OSM DB has been created and the planet.osm dump has been imported, hstore has been created and populated during the import, a user has been created, and informal access has been granted to a few users to find out and fix problems before going public (like the one Daniel mentioned). This week we are going to enable minutely sync with planet.osm, close this ticket and announce it.
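As a small illustration of what the streaming-replication setup means in practice (host, database and user names below are placeholders, not the real Labs credentials), one can ask the slave whether it is in recovery and roughly how far its replay lags behind the master:

import psycopg2

# Placeholder connection details; the actual Labs host/db/user names may differ.
conn = psycopg2.connect(host="osmdb-slave.example.org", dbname="gis", user="osm")
cur = conn.cursor()

# pg_is_in_recovery() is true on a streaming-replication standby; the difference
# between now() and the last replayed transaction timestamp is a rough lag measure.
cur.execute("""
    SELECT pg_is_in_recovery(),
           now() - pg_last_xact_replay_timestamp()
""")
in_recovery, replay_lag = cur.fetchone()
print("standby: {}, approximate replay lag: {}".format(in_recovery, replay_lag))

cur.close()
conn.close()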
Thanks, Alex. We look so close to being done with this.
Can this be considered done?
Yes it can. Closing.