Last modified: 2013-07-22 08:29:19 UTC
Currently the only page titles available separately are those in namespace 0: all-titles-in-ns0.gz

Apart from this, most other titles are available in pages-articles.xml.bz2, except for User pages and Talk pages, which are available in pages-meta-current.xml.bz2

The articles and meta-current dumps are typically a couple of orders of magnitude larger than the all-titles-in-ns0 dump. The only ways to get complete lists of page titles are to download and process these two enormous dump files or to make excessive use of the API.

* We could dump a page title list to accompany each of pages-articles.xml.bz2 and pages-meta-current.xml.bz2
* We could dump a page title list for all namespaces.
* We could dump a page title list for all pages not already covered by all-titles-in-ns0.gz
* We could dump a page title list for each namespace.

For my current purpose I already need to process pages-articles.xml.bz2, so I only lack page titles for User and Talk pages. A dump of the titles for those namespaces would be enough for me, but might not be the best for other potential users of the data.
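For anyone who needs titles before a dedicated dump exists, the "download and process" route can be sketched roughly as below. This is only an illustration, not what the dump scripts do: the function names `local_name`, `iter_titles` and `iter_dump_titles` are made up for this example, and it assumes the export XML carries `<title>` and `<ns>` children inside each `<page>` element.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream):
    """Yield (namespace_number, title) for each <page> in an export XML stream.

    Assumes every <page> has <title> and <ns> children (an assumption about
    the export schema version in use).
    """
    title, ns = None, None
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = int(elem.text)
        elif name == "page":
            yield ns, title
            title, ns = None, None
            # Drop the page's children (including revision text) so the
            # whole dump is never held in memory at once.
            elem.clear()


def iter_dump_titles(path):
    """Stream titles from a bz2-compressed dump such as pages-articles.xml.bz2."""
    with bz2.open(path, "rb") as stream:
        yield from iter_titles(stream)
```

Filtering the result for, say, `ns == 1` would give Talk page titles, but the whole file still has to be downloaded and decompressed first, which is the cost this request is trying to avoid.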
Giving dump bugs to Ariel.
I'd like to hear from other users of the dumps about what would be most useful. Starting a thread on wikitech-l and xmldatadumps-l about this.
All titles are available in page.sql.gz if needed.
It's not in as convenient a format; I'm assuming that's the reason for the specific request. However I'd love to hear from the people on the bug (and from other users of the dumps).
Related URL: https://gerrit.wikimedia.org/r/66666 (Gerrit Change I7f53f1eb2f4396d6fc9f80625919a6c745bfa21f)
https://gerrit.wikimedia.org/r/66666 (Gerrit Change I7f53f1eb2f4396d6fc9f80625919a6c745bfa21f) | change APPROVED and MERGED [by ArielGlenn]
This is live for some projects and will be live for all after the next deployment. Closing.