Last modified: 2014-09-23 19:31:22 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T35980, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 33980 - (tileserver-prod) Wikimedia-hosted OpenStreetMap (OSM) / mapnik tileservers wanted for mobile usage
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: High enhancement with 5 votes
Target Milestone: ---
Assigned To: Brandon Black
Depends on: osm-labs 60831
Blocks: 25139 62257 33856
 
Reported: 2012-01-27 01:11 UTC by Brion Vibber
Modified: 2014-09-23 19:31 UTC (History)




Description Brion Vibber 2012-01-27 01:11:45 UTC
Blocking some releases... we don't want to put too much pressure on the main OSM tile servers.
Comment 1 Brion Vibber 2012-01-27 01:16:41 UTC
If getting a full tile server going is too slow, we may want to check whether a caching proxy that fetches from OSM's main servers would be acceptable, as that might be easier to furnish.
Comment 2 Phil Chang 2012-01-27 23:03:59 UTC
moving to Android 1.1 features, and iOS PhoneGap
Comment 3 Yuvi Panda 2012-02-07 09:48:33 UTC
The betas are getting close, do we have an update on this?
Comment 4 Aude 2012-02-07 12:04:45 UTC
In the interim, you can use MapQuest Open map tiles for this. Do we have servers available for this?  Perhaps I can help with this.
Comment 5 Phil Chang 2012-02-08 05:49:40 UTC
Do you mean hardware? Or space on existing servers? If I need to bug someone in Ops, please let me know. I imagine a similar issue exists for the caching proxy Brion suggested.
Comment 6 Aude 2012-02-08 06:02:18 UTC
I don't think space on existing servers would work well.  For the toolserver, we already have database replication and tile generation / serving on one server and it's strained somewhat.  It's best they be two separate and not share.

In the short term, we can try setting up configurations and testing stuff on wikimedia labs perhaps.
Comment 7 Brion Vibber 2012-02-08 23:49:49 UTC
Per Ryan Lane we're waiting on gluster setup for additional storage before we can set this up on Labs.
Comment 8 Yuvi Panda 2012-02-15 18:16:44 UTC
Bumping this off 1.1 since we're using MapQuest for now.
Comment 9 Sumana Harihareswara 2013-07-29 21:19:51 UTC
Brandon Black is currently working on this; https://www.mediawiki.org/wiki/Mobile_web/Team/Etherpad/WMF_OSM_Hack_Session_2013 has a bunch of OLD thoughts on this from March 2013, and Brandon will be creating a central wiki page to track the requirements and progress.  Thanks, Brandon.
Comment 10 Brandon Black 2013-08-05 16:25:37 UTC
I've uploaded my notes to a wikitech page to cover the ops project for this: https://wikitech.wikimedia.org/wiki/OSM_Tileserver

Probably the most relevant summary bit from that is a tentative initial date around mid-August to get a test machine going with a workable single-machine software stack as a starting point.  Let's call it Monday Aug 19th just to be more specific.  Once that's running and functional, we should have a much clearer idea of the real challenges and be able to make a better timeline for production deployment.
Comment 11 Sumana Harihareswara 2013-08-22 20:13:51 UTC
As Brandon investigated how to build this, he ran into some questions about scope and use cases.  He got some answers this week and detailed them on https://wikitech.wikimedia.org/wiki/OSM_Tileserver and now will be able to make better progress.  We don't have a date yet, though. Sorry for the delay.
Comment 12 Firefishy 2013-08-23 22:37:53 UTC
OSM sysadmin team has the tile configuration in Chef:
http://git.openstreetmap.org/chef.git/tree/HEAD:/cookbooks/tile
and
http://git.openstreetmap.org/chef.git/blob/HEAD:/roles/tile.rb

Tilecache config too (very basic):
http://git.openstreetmap.org/chef.git/tree/HEAD:/cookbooks/tilecache
and
http://git.openstreetmap.org/chef.git/blob/HEAD:/roles/tilecache.rb

Disclosure: I am part of the tiny OpenStreetMap sysadmin team. Firefishy in #mediawiki on Freenode or #osm-dev on OFTC.net
Comment 13 Nicolas Raoul 2013-09-11 06:08:26 UTC
Not limited to mobile usage:

The Wikivoyages are now using Mapquest/Cloudmade/OSM/etc for dynamic maps.
They would benefit greatly from a Wikimedia-hosted HTTPS tileserver.

Context: https://en.wikivoyage.org/wiki/Wikivoyage:Dynamic_maps_Expedition
Comment 14 Ken Snider 2013-09-18 03:15:29 UTC
Brandon, can you provide a status update? I think you're close to an initial deploy, correct?

Thanks.
Comment 15 Brandon Black 2013-10-07 17:51:34 UTC
Status update:
  Hardware: boots, using it to stage other work below, will wipe again once we have a final config
  Packaging: we can use upstream Ubuntu packages for many parts (e.g. basic mapnik packages) it seems.  Using a local fork of Kai's packages for renderd/mod_tile + "stylesheet-data" (to get them into our local repo instead of over ppa, and modify defaults/deps as long as we're there to not pull in pgsql or download coastline stuff automatically).
  Render machines: Currently looking at how we'll manage puppetize/deploy of coastline data (~700MB of binary files)...
Comment 16 Matthew Flaschen 2013-10-07 23:41:39 UTC
What is the coastline data generated from?  Can we build it ourselves from source?
Comment 17 kolossos 2013-10-08 17:08:21 UTC
For coastlines you should check this:
http://openstreetmapdata.com/data/coastlines
It seems easier to me to use already-generated coastlines.
Coastlines are really complex:
http://blog.jochentopf.com/2013-03-11-state-of-the-osm-coastline.html 

Other question: Will we have an hstore in the database? Otherwise we are not flexible enough to render the most important styles, like hikebike-style.
Comment 18 Brandon Black 2013-10-08 17:45:54 UTC
Generating coastline data "from source" is also a bit too complex for what we want to do here.  As far as we're concerned, "source" is the upstream pre-generated stuff.  Kai's openstreetmap-mapnik-stylesheet-data package (here: https://launchpad.net/~kakrueger/+archive/openstreetmap/+packages ) encapsulates (aside from basic style/symbol stuff) the coastline data conceptually, but obviously he doesn't package 688MB of binary data in the .deb.  Instead, it includes a script which (by default) is run at package postinst time to download and unpack these (which is 5 compressed files from two different sites - tile.openstreetmap.org and www.naturalearthdata.com).

When I last talked this over with Faidon, we both agreed we didn't like the idea of downloading huge data from the public internet as part of an automated, puppet-driven package install to set up a render node in the general case.  Ideally we'd have some other fancy local solution to store this data (and update it once in a blue moon as necessary), and we'd rsync it (or whatever) within our infrastructure.  Kai's package allows for this already (via a debconf option to skip the download).

I'm still pondering this bit.  Part of me really just wants to say, "Look, any other solution is even more of a pain in the ass, let's just set up an outbound HTTP proxy on these machines at install time [by default, they don't have access to the outside world, but we have proxies avail...] and let them pull it from the primary upstream sources via the package default."  Then we can move on with other challenges, and if it really bothers some enterprising individual down the road when we have more renderer machines to care about, they're free to remedy the situation at that time.

hstore: I haven't reached a point where I'm fully aware of the tradeoffs on hstore.  If it's relatively cheap to do hstore during osm2pgsql (and ongoing updates), we may as well put it in from the get-go.  On the other hand, this first deployment isn't intended to support things like hikebike.  We're just trying to get basic world map tiles out the door, and then based on production experience with that we can decide what else to support and what that will require in terms of hardware and software resources.
Comment 19 kakrueger 2013-10-08 21:32:34 UTC
The coastline data is derived from two sources. 1) Natural Earth and 2) post-processed OSM data

The Natural Earth data is a static data set that is probably only updated every couple of years. In the OSM stylesheets, it is only used for very low zoom renderings of the entire earth, where precision doesn't matter. I think using Natural Earth still stems from a time when OSM didn't quite trust its own source of coastlines and didn't want tiny mistakes in the coastline propagating into large issues on a global scale.

For the majority of the zoom levels the style sheets use OSM coastline data that is post-processed into shapefiles for better performance during rendering. The post-processing generates closed (and partly simplified) polygons out of the individual coastline segments. I think it takes a couple of hours to do the post-processing, as e.g. recreating a closed polygon out of the full resolution European/Asian coastline is a somewhat intense task. The tools to create these coastline shapefiles are all open, so it would be possible to build them locally. However, as far as I am aware no one has ever complained that the load from downloading the coastline files has been excessive. So I wouldn't worry for now if you pull those files from the upstream locations, particularly not if you run it through a caching proxy (that caches such large files).


That said, the current main OSM style sheet toolchain has changed somewhat in the past month ( https://github.com/gravitystorm/openstreetmap-carto ). Instead of writing the unwieldy mapnik style sheet XML by hand, they have moved over to a CSS-based map style language (carto) that is then compiled (by a node.js-based compiler) into the actual mapnik style sheet XML. In this process the style sheet has also moved over to a slightly different set of coastline files, which uses a more efficient toolchain for creating those shapefiles. Those files now live on http://data.openstreetmapdata.com/ and the script https://github.com/gravitystorm/openstreetmap-carto/blob/master/get-shapefiles.sh downloads all the relevant files.

I haven't yet updated my PPAs to use the new style-sheet, but I am hoping to do that soon. I'll try and do it sometime this week. But as I intend to put the pre-compiled mapnik xml style sheet into the packages rather than the carto source (to avoid the node.js dependencies) nothing much should change from a tile server admin's point of view.
Comment 20 kakrueger 2013-10-08 21:44:56 UTC
Regarding the PPAs, the source debian scripts live in https://github.com/apmon/OSM-rendering-stack-deplou Those also include some debian packaging scripts I specifically created for wikipedia back in March, that take out some of the "fancy" post-install scripts that try and set things up. With puppet those install scripts wouldn't be necessary, and aren't directly "debian standards compliant". However, obviously feel free to modify them in any way you need. Or if there are issues let me know and I can hopefully try and fix anything as necessary.

You probably don't really want to use my PPAs directly anyway, as I have taken a policy to freeze the packages together with the respective ubuntu distributions. So the packages for 12.04LTS are more or less still from 1 1/2 years ago and quite a bit has changed in the software since then.

For osm2pgsql a more or less up-to-date version is in Debian unstable, but hasn't migrated to Ubuntu yet. With mod_tile / renderd I have always intended to get packages for them into the official Debian / Ubuntu repositories, but haven't so far. Perhaps with Faidon's (or someone else's) help with cleaning them up, we could have another attempt at getting them in.
Comment 21 kakrueger 2013-10-08 21:56:50 UTC
Regarding hstore: I don't have any hard numbers of overhead at the moment, but I don't think it is a significant or relevant overhead.

Configuration wise, activating hstore is done with a single command line switch to osm2pgsql. I believe my PPA packages already install the hstore extension into postgresql automatically.
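
As a rough illustration of the single switch mentioned above, here is a minimal Python sketch that assembles an osm2pgsql command line with hstore enabled. The database name, cache size and planet file name are placeholders chosen for the example, not values from any Wikimedia setup; the only point is where `--hstore` fits in.

```python
import shlex


def build_osm2pgsql_command(planet_file, database="gis", use_hstore=True,
                            cache_mb=2048):
    """Assemble an osm2pgsql import command as an argument list.

    All defaults are illustrative placeholders, not tuned values.
    """
    cmd = [
        "osm2pgsql",
        "--create",               # fresh import rather than applying a diff
        "--slim",                 # keep intermediate tables for later updates
        "--database", database,
        "--cache", str(cache_mb),
    ]
    if use_hstore:
        # The single switch: tags without a dedicated column are kept
        # in an additional hstore column.
        cmd.append("--hstore")
    cmd.append(planet_file)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; a real import takes hours.
    print(shlex.join(build_osm2pgsql_command("planet-latest.osm.pbf")))
```

Building the command as a list keeps it safe to pass to `subprocess.run` later without shell quoting problems.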

Performance wise during import / diff processing the overhead should be minimal as that is not really where the bottleneck is.

Size-wise, I think the overhead is somewhere on the order of 100GB if I am not mistaken. However, given that the non-hstore db is just over 256GB and the hstore one is below 512GB, chances are that the overhead is not really much of a concern.

Furthermore, there is some talk of moving the main OSM style sheet to (partly) use hstore as well. It would probably keep in postgres columns all the data for which there are where clauses in the rendering SQL selection filters, and move the rest into hstore. So far this hasn't happened, as the performance impact during rendering isn't well known yet. But with pressure to include data in the map rendering for which the current schema doesn't have columns, I would think it is likely to happen at some point in the not too distant future.

So I would recommend activating it just to be flexible and prepared for future changes without having to re-import everything again. From what I have seen, there are few downsides and several benefits. But as a fresh import wouldn't take all that long, it probably isn't directly critical either.
Comment 22 Erik Moeller 2013-10-18 02:33:11 UTC
Brandon, Ken et al. - Is it possible to state a target date for getting a first version into production?
Comment 23 Brandon Black 2013-10-25 14:56:25 UTC
I think a reasonably solid date for a prod-accessible installation would be Nov 4th.  From there we'll have to take our time adding various prod traffic sources and seeing how things scale.
Comment 24 kolossos 2013-11-06 19:32:17 UTC
4th November, which year? 
Sorry, it is frustrating to hear from WMF one false prediction after another. 
In March in Copenhagen WMF engineers told me that something should be ready 2 later, the same answer in May in Amsterdam, then in Hong Kong... now we have November. I feel that WMF is wasting my time and it's hard for me to see a future for maps in Wikipedia. 
Are professional project management and realistic communication really so difficult?
Comment 25 Brandon Black 2013-11-06 20:15:08 UTC
The date was obviously not solid, and work is still in progress.  I apologize for your wasted time and effort.
Comment 26 kakrueger 2013-11-06 22:07:31 UTC
Can you describe a bit more where the issues lie? Many people have set up tileservers and might be able to help with finding solutions or best practices advice.
Comment 27 Erik Moeller 2013-11-07 00:32:52 UTC
From an email by Brandon just now:

"Things are always a little more complicated than I expect them to be.  It's one thing to build a one-off tileserver and make it work, it's another to structure it to be automated, deployable, and manageable on an infrastructure, and I've been distracted by other shorter-term tasks frequently.  I still think I'm very close to being done with this.  We have hardware up, it's booted/installed, we have packages that will work for our deployment style (with some compromise made).  I'm restarting the initial database import (osm2pgsql of the full planet data) later today, and that tends to take quite a while to run (on the order of ~24h? I've made some changes since the last attempt that may affect it)."

Since it's evidently hard to provide estimates when it'll be done, regular updates are much appreciated.
Comment 28 kakrueger 2013-11-07 17:34:32 UTC
Osm2pgsql does generally take a fairly long time to import and it is very hardware dependent. On powerful hardware (database on SSDs, sufficient ram and a reasonable multi-core CPU), 24 hours does sound a little on the long side though, and indicates that there may still be some room for optimising the setup. The fastest imports I have seen are on the order of 8 - 10 hours.

Are the parameters for postgresql and the command-line options of osm2pgsql available in one of the public puppet repositories somewhere?
Comment 29 Silke Meyer (WMDE) 2013-11-11 08:49:19 UTC
Hi Brandon,

Why don't you involve Kai and Tim? They are very experienced and are offering their help. To me, it sounds like a very good plan to bring them in. What would have to be done to do so?
Comment 30 Andre Klapper 2013-11-22 15:38:49 UTC
Brandon: Any chance to answer comment 29?
Comment 31 Ken Snider 2013-11-29 02:36:43 UTC
All,

Brandon has an update here: https://wikitech.wikimedia.org/wiki/OSM_Tileserver#Extended_State_of_Things_-_2013-11-27

At this point, we're going to re-evaluate our options, and see if we can get some more eyeballs on the project overall. The scope is much larger than originally anticipated.

We're going to have a chat immediately after the holidays to chart a new way forward on the project internally, with a goal of getting a bare-bones setup that meets the existing and proposed requirements, and work up from there. This will probably involve some more scope conversations, as well as some help from Kai, Tim (and indeed anyone else!) who might be able to assist us in getting this project to 1.0.

Thanks!

--Ken.
Comment 32 Tim Landscheidt 2013-11-29 05:06:26 UTC
(In reply to comment #31)
> [...]
> We're going to have a chat immediately after the holidays to chart a new way
> forward on the project internally, with a goal of getting a bare-bones setup
> that meets the existing and proposed requirements, and work up from there.
> This will probably involve some more scope conversations, as well as some
> help from Kai, Tim (and indeed anyone else!) who might be able to assist us
> in getting this project to 1.0.

(I assume "holidays" here means US thanksgiving and today "weekend".)  As some of the delays seem to stem from "internal project", how about hiring/funding someone knowledgeable about the current Toolserver (or another tileserver) setup and assisting *them* in setting it up at WMF instead of the other way around which appears to be *far* more laborious?

Quotes like (emphasis added): "The *best* way to scale would be to render the whole database to vector tiles.  The software setup for this is an *unknown* to me (and I'm guessing most who aren't directly involved), [...]" make me cringe.  Let's leave optimizing the code to people who actually know it.  A *lot* of hardware can be bought before we need to throw away running code.
Comment 33 kakrueger 2013-12-01 02:56:25 UTC
What is the best way to comment on the individual issues raised by Brandon? Inline comments in the wikipage? Discussing them here on the ticket, or as comments at the bottom of the wikipage?

The two main issues I'd like to address though are:

1) Is there a realistic estimate of the load this system needs to actually support? On the wiki page, the proposed load numbers range from 200 tiles per second to 30000 tiles per second. It is hard to plan a system if the estimates of what needs to be supported vary over two orders of magnitude. The 200 tiles per second estimate comes from the empirical numbers we are seeing on the toolserver, which is running the "in production" system for the OSM gadget that is embedded in at least the German, Italian, Spanish and a bunch of other Wikipedias, and from the WMA, which is embedded in most of the "Desktop" Wikipedias. I have no idea where the estimate of 30000 tiles/s comes from. As a comparison, all of OSMF's tile serving infrastructure is on the order of 3000 - 4000 tiles/s, including everywhere the OSMF tiles are embedded in other people's sites.

So it seems like we really need a better and more justified estimate of the expected load, i.e. looking through the logs of the existing systems, both for mobile (MapQuest Open) and desktop (toolserver), and figuring out the access patterns of the pages in which these tiles would be used.

My impression is that we are trying to build a system that is potentially orders of magnitude more potent than what is actually needed in the beginning.

One has to, however, also remember, that there are two types of scaling in a typical tile server that are semi-independent and scale in different ways and have rather different hardware demands: Scaling of serving of tiles and scaling of rendering. Although rendering does to some degree scale with serving load, to a good degree, it also scales with update frequency and editing frequency in OSM, both of which are independent of the size of the serving site.

2) "direct PostgreSQL queries for rendering aren't going to scale well". Although it depends on what exactly you mean, the OSM rendering stack has proved in countless installations that it does scale pretty well. And really, this approach is the only one possible if you want an up-to-date map at high zoom levels, unless you throw a gigantic amount of hardware at it. And even then, you won't be able to achieve the minutely updated map up to Z20 that you can with the current OSM software stack without too many issues. It also follows the concept that is used in pretty much any other scalable website (including MediaWiki), i.e. that you have a bunch of application servers in the background that render pages from a database on request, and then have multiple levels of caching in front of that to achieve the necessary performance. That is exactly what mod_tile / renderd do as well, and only a tiny fraction of requests actually need to be rendered on the fly.

E.g. looking at the setup on osm.org: out of the 3000 tiles/s served, typically only somewhere between 5 - 10 tiles/s are rendered on the fly. As most of those are updates, only about 1 tile/s is not actually available in the master cache (of about 1.5TB) and needs to be rendered on the fly. As nearly all of those are high zoom tiles, they typically can be rendered in a few 100 milliseconds and so a single multi-core server is usually quite capable of rendering 5 - 10 tiles/s. The size of the master cache can also vary smoothly between 0 and the full set of all tiles, depending on where you want to put the trade-off between disk space and rendering capacity. Empirically, 1 - 2 TB has generally proven to be a fairly good trade-off.
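
The figures quoted above can be turned into a quick back-of-the-envelope check. The sketch below uses the numbers from this comment (3000 tiles/s served, 5 to 10 tiles/s rendered on the fly); the per-tile render time of 0.3 s and the core count of 8 are assumptions picked only to stand in for "a few 100 milliseconds" and "a single multi-core server".

```python
def render_fraction(rendered_per_s, served_per_s):
    """Share of served tiles that had to be rendered on the fly."""
    return rendered_per_s / served_per_s


def render_capacity(cores, seconds_per_tile):
    """Tiles per second one server can render if each core renders one
    tile at a time (ignores database and I/O contention)."""
    return cores / seconds_per_tile


if __name__ == "__main__":
    served = 3000  # tiles/s, from the comment
    for rendered in (5, 10):
        share = render_fraction(rendered, served)
        print(f"{rendered} of {served} tiles/s rendered on the fly: {share:.2%}")
    # Assumed hardware figures, not measurements.
    print(f"estimated capacity: {render_capacity(8, 0.3):.1f} tiles/s")
```

Under these assumptions well under 1% of requests reach the renderer, and the estimated capacity sits above the 5 to 10 tiles/s quoted, which is consistent with the claim that one server can keep up.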

Furthermore, mod_tile / renderd are designed to very gracefully degrade once they can't keep up with the rendering load anymore and simply some of the areas won't be quite fully up-to-date for the duration of the overload. But pretty much no one will notice if it shows "yesterday's" map instead of "today's" map.
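
The behaviour described in this comment (serve from cache, render on demand, fall back to an out-of-date tile when the renderer is overloaded) can be sketched roughly as follows. This is a simplified toy model and not how mod_tile / renderd are actually implemented; the `TileStore` class, the `render` callable and the `overloaded` flag are hypothetical names introduced for illustration.

```python
class TileStore:
    """Toy model of a tile cache with graceful degradation.

    Not the real mod_tile / renderd logic; a sketch of the idea only.
    """

    def __init__(self, render):
        self.render = render  # callable: tile key -> tile data
        self.cache = {}       # tile key -> (tile data, is_stale)
        self.dirty = set()    # stale tiles served as-is, to refresh later

    def expire(self, key):
        """Mark a cached tile as out of date, e.g. after an OSM edit."""
        if key in self.cache:
            tile, _ = self.cache[key]
            self.cache[key] = (tile, True)

    def get(self, key, overloaded=False):
        """Return a tile, rendering only when needed and possible."""
        entry = self.cache.get(key)
        if entry is not None:
            tile, stale = entry
            if not stale:
                return tile  # fresh cache hit, no rendering
            if overloaded:
                # Renderer cannot keep up: serve the old tile and
                # remember to refresh it once capacity is available.
                self.dirty.add(key)
                return tile
        # Cache miss, or stale tile while capacity is available.
        tile = self.render(key)
        self.cache[key] = (tile, False)
        self.dirty.discard(key)
        return tile
```

A caller would pass `overloaded=True` whenever the render queue is full; in that case a visitor sees the previous version of the tile rather than an error or a long wait.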

Once you move over to trying to display many different map styles, things start to look different, as in the current setups you need to more or less "duplicate" things per style. At that point you probably do want to move over to "vector tiles", which get styled and rendered either on the client side, or potentially on the fly on the server side for weak, dumb clients. The software stack for that is indeed not yet well developed and tested. But for a single style, I would consider the current software stack fairly mature, robust and scalable.
Comment 34 Andre Klapper 2014-01-02 11:10:48 UTC
bblack: Are there any new updates on the status? 
Could you reply to comment 33?
Comment 35 Ken Snider 2014-01-02 13:55:50 UTC
Andre,

Brandon may be able to respond to the individual items mentioned, however, in terms of expected load, I believe Tomasz is looking to get some focus on this specific question (as well as scope of requested features) from a team within engineering, which will help tremendously with several of these questions. 

On the Operations side, Alex has agreed to have an initial look at the infrastructure with an eye on how to move forward. At minimum, we need a base tileserver, and to replicate the currently utilized components of the toolserver install. He'll continue to look into this in-between other projects while we wait for clearer direction on overall scope and scale. 

Thanks.
Comment 36 Tim Landscheidt 2014-01-27 03:21:28 UTC
The result of the office hour on the Toolserver migration (cf. [[m:IRC office hours/Office hours 2014-01-23]]) was that in an effort to not have grander plans block the Toolserver migration, Alexandros will set up replication from OSM to the (currently unused) Labs PostgreSQL server in Ashburn.  I have created bug #60461 to track this.

The idea is that, after this, the existing Toolserver OSM tools have access to the database and can be migrated, tile server(s) in similar scope to the existing Toolserver instance(s) can be set up and tested in Labs, and performance data/puppetization/etc. can be gathered to inform decisions on production tile servers that do not have deadlines to meet.

Coren estimated that the move of Labs to Ashburn that is a (soft) requirement for accessing the PostgreSQL server will be completed by mid-March.  This will leave volunteers with about three months to migrate the tools in their spare time before Toolserver will be decommissioned in July.
Comment 37 kakrueger 2014-05-11 17:33:36 UTC
Just a brief update on the situation of the tileserver on the labs infrastructure.

With the openstreetmap replicated database accessible from labs up and running, we have begun setting up and testing the tile rendering infrastructure in labs.

There is a demonstration map accessible at http://a.tiles.wmflabs.org/osm/slippymap.html

and the tiles can be accessed (for the default osm style) under the URL http://a.tiles.wmflabs.org/osm/0/0/0.png.
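
For readers unfamiliar with this URL layout, the three path components are the zoom level followed by the x and y tile indices. The sketch below computes them from a longitude and latitude using the standard slippy-map (Web Mercator) tile formula; the base URL is taken from this comment, while the function names are only illustrative.

```python
import math

BASE_URL = "http://a.tiles.wmflabs.org/osm"  # from the comment above


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) tile indices containing a point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    lat_rad = math.radians(lat_deg)
    x = math.floor((lon_deg + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or latitudes near the poles stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(lon_deg, lat_deg, zoom):
    """Build the tile URL in the zoom/x/y.png layout."""
    x, y = lonlat_to_tile(lon_deg, lat_deg, zoom)
    return f"{BASE_URL}/{zoom}/{x}/{y}.png"


if __name__ == "__main__":
    # At zoom 0 the whole world is a single tile.
    print(tile_url(0.0, 0.0, 0))
```

At zoom 0 every coordinate maps to the single tile 0/0/0, which is why the example URL in this comment shows the whole world.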

At the moment only the default openstreetmap style (and a demonstration multi-lingual style) is up and running on labs. The various different styles that were on the toolserver have not yet been migrated. But the technical infrastructure should hopefully already be able to support that and so once style maintainers update their styles to mapnik 2.2, we should be able to activate them in the labs infrastructure.

Tile expiry and updating is also not yet enabled, but that is probably next on the todo list.

Overall, not everything has been finalized yet, and things are still subject to change.


