Last modified: 2013-07-22 08:29:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T21542, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 19542 - Dump page titles for other namespaces
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ariel T. Glenn
Reported: 2009-07-06 00:58 UTC by Andrew Dunbar
Modified: 2013-07-22 08:29 UTC (History)

Description Andrew Dunbar 2009-07-06 00:58:33 UTC
Currently the only page titles available separately are namespace 0: all-titles-in-ns0.gz

Apart from this, most other titles are available in pages-articles.xml.bz2, except for User pages and Talk pages, which are available in pages-meta-current.xml.bz2.

The articles and meta-current dumps are typically a couple of orders of magnitude larger than the all-titles-in-ns0 dump.

The only ways to get complete lists of page titles are to download and process these two enormous dump files or to make excessive use of the API.

* We could dump a page title list to accompany each of pages-articles.xml.bz2 and pages-meta-current.xml.bz2
* We could dump a page title list for all namespaces.
* We could dump a page title list for all pages not already covered by all-titles-in-ns0.gz
* We could dump a page title list for each namespace.

For my current purpose I already need to process pages-articles.xml.bz2, so I only lack page titles for User and Talk pages. A dump of the titles for those namespaces would be enough for me, but might not be the best option for other potential users of the data.
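
For illustration, the "download and process" workaround described above could look roughly like the following Python sketch. It streams a bz2-compressed XML dump and yields only page titles, optionally filtered by namespace. The function name `iter_titles` and its `namespaces` parameter are hypothetical, and the sketch assumes each `<page>` element carries `<title>` and `<ns>` children, which may not hold for older export schema versions.

```python
import bz2
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_titles(stream, namespaces=None):
    """Yield (ns, title) for each <page> in a MediaWiki XML export stream.

    `namespaces` is an optional set of namespace numbers to keep
    (hypothetical parameter; None keeps every page).
    Assumes <page> has <title> and <ns> children.
    """
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "page":
            continue
        title = ns = None
        for child in elem:
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = int(child.text)
        if title is not None and (namespaces is None or ns in namespaces):
            yield ns, title
        root.clear()  # drop processed pages so memory stays bounded


def titles_from_dump(path, namespaces=None):
    """Open a .xml.bz2 dump at `path` and yield (ns, title) pairs."""
    with bz2.open(path, "rb") as f:
        yield from iter_titles(f, namespaces)
```

Calling `titles_from_dump` on a pages-meta-current dump with `namespaces={1, 2}` would keep only Talk and User pages under these assumptions, but the whole file still has to be downloaded and decompressed, which is the cost this request is about.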
Comment 1 Mark A. Hershberger 2011-05-03 18:56:53 UTC
Giving dump bugs to Ariel.
Comment 2 Ariel T. Glenn 2011-11-02 16:37:49 UTC
I'd like to hear from other users of the dumps about what would be most useful.  Starting a thread on wikitech-l and xmldatadumps-l about this.
Comment 3 Platonides 2011-11-02 23:12:29 UTC
All titles are available at page.sql.gz if needed.
Comment 4 Ariel T. Glenn 2011-11-03 17:27:13 UTC
It's not in as convenient a format; I'm assuming that's the reason for the specific request.  However I'd love to hear from the people on the bug (and from other users of the dumps).
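
To illustrate why page.sql.gz is less convenient than a plain title list, here is a rough Python sketch that pulls namespace and title out of the SQL dump's INSERT statements. It assumes the first three columns of each row are page_id, page_namespace and page_title, and that string values use backslash escapes; both are assumptions about the dump layout, and the names `TUPLE_RE` and `iter_page_rows` are hypothetical.

```python
import gzip
import io
import re

# Start of one row tuple: (page_id,page_namespace,'page_title'
# Assumed column order; the quoted title may contain backslash escapes.
TUPLE_RE = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'")


def iter_page_rows(lines):
    """Yield (namespace, title) from the lines of a page.sql dump."""
    for line in lines:
        if not line.startswith("INSERT INTO"):
            continue  # skip schema definitions, comments, lock statements
        for m in TUPLE_RE.finditer(line):
            # Undo backslash escaping, e.g. \' becomes '
            title = re.sub(r"\\(.)", r"\1", m.group(3))
            yield int(m.group(2)), title


def titles_from_sql(path):
    """Open a page.sql.gz file at `path` and yield (namespace, title)."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        yield from iter_page_rows(f)
```

Titles recovered this way are in database form (underscores instead of spaces, no namespace prefix), so a consumer still has to map namespace numbers to names, which is extra work a ready-made title list would avoid.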
Comment 5 Gerrit Notification Bot 2013-06-03 18:38:42 UTC
Related URL: https://gerrit.wikimedia.org/r/66666 (Gerrit Change I7f53f1eb2f4396d6fc9f80625919a6c745bfa21f)
Comment 6 Gerrit Notification Bot 2013-06-03 18:41:55 UTC
https://gerrit.wikimedia.org/r/66666 (Gerrit Change I7f53f1eb2f4396d6fc9f80625919a6c745bfa21f) | change APPROVED and MERGED [by ArielGlenn]
Comment 7 Ariel T. Glenn 2013-07-22 08:29:19 UTC
This is live for some projects and will be live for all after the next deployment.  Closing.
