Last modified: 2013-06-18 16:32:38 UTC

Bug 13693 - Dump the article titles lists (all-titles-in-ns0.gz) every day
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Ariel T. Glenn
URL: http://download.wikimedia.org/enwiki/...
Depends on:
Blocks:
Reported: 2008-04-11 08:28 UTC by Melancholie
Modified: 2013-06-18 16:32 UTC
CC List: 3 users

Description Melancholie 2008-04-11 08:28:14 UTC
The dump process cycles over a period of roughly 2-3 weeks for smaller wikis and roughly 1-2 months for big ones like [[en:]]!

This means that the very helpful all-titles-in-ns0 lists can be up to 2 months old, totally outdated!

For enwiki it takes less than 30 seconds (26 sec in March) to dump this list, according to http://download.wikimedia.org/enwiki/20080312/

For all wikis this would mean about 1 minute or so. Is it possible to dump the all-titles-in-ns0 lists on a daily basis?

It would be very helpful to be able to analyse db dumps with up-to-date lists! Another reason is a potential feature in analysing User:Midom's stats, see [[User_talk:Henrik#Most_popular_nonexistent_articles.3F]].
Comment 1 Mathias Schindler 2008-06-05 16:34:11 UTC
Please note that you can access this list via http://download.wikipedia.org/sitemap/
Comment 2 Melancholie 2008-06-05 16:47:36 UTC
?
The sitemaps have not even been updated since 2007-Dec-27 ;-)
Comment 3 Andrew Dunbar 2008-06-14 17:32:00 UTC
This bug was recently made dependent on Bug 14415 -- Dump the article titles lists (all-titles-in-ns0.gz) unsorted

Was this an error? Is making this file available more frequently really dependent on the sort order the file utilizes?

I'm being bold and removing the dependency but please restore it if the link was in fact intentional.
Comment 4 Tomasz Finc 2009-05-13 23:54:13 UTC
(In reply to comment #0)
Apologies for the really late pickup on this, but we're just now moving through all the data dump issues. Do you think you could elaborate a bit on what your daily use case is?

We haven't received many requests for a daily list, so we are leaning toward making this available at two-week intervals. How does that fit your use cases?
Comment 5 Melancholie 2009-05-14 12:07:47 UTC
It's just that the list for enwiki can currently be months old, which makes it not very usable when handling live content (e.g. the pywikipedia bot). A regular two-week schedule would of course be much better (if it is really regular, not intermittent). But the coolest thing would be to have up-to-date title lists, so as not to be forced to use the API for this (reverting bots, stats, missing articles etc.).
Comment 6 Mark A. Hershberger 2011-05-03 18:57:03 UTC
Giving dump bugs to Ariel.
Comment 7 Ariel T. Glenn 2012-06-20 10:12:54 UTC
http://dumps.wikimedia.org/other/pagetitles/

These will be dumped on a daily basis. We don't plan to keep them forever, maybe about 30 days' worth before they get tossed. Enjoy.

Also p.s. we might move the location around depending on whether there are more daily things that get dumped.

And a final p.s., anyone not on the xmldatadumps-l list should get on it because announcements and discussions happen there which could affect users of the dumps.
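
For anyone wanting to use the title lists for the "missing articles" use case mentioned earlier in this thread, here is a minimal Python sketch. It assumes the file is gzipped UTF-8 text with one title per line, underscores in place of spaces, and possibly a `page_title` header as its first line; none of that is confirmed in this report, so check a downloaded file first. The function names `load_titles` and `missing_titles` are made up for illustration.

```python
import gzip


def load_titles(path, header="page_title"):
    """Read a gzipped title list (one title per line) into a set.

    Assumption: the first line may be a column header equal to `header`;
    if so it is skipped. Blank lines are ignored.
    """
    titles = set()
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line_number, line in enumerate(fh):
            title = line.rstrip("\n")
            if line_number == 0 and title == header:
                continue  # skip the assumed header row
            if title:
                titles.add(title)
    return titles


def missing_titles(wanted, titles):
    """Return the wanted titles that are not in the list, sorted.

    Assumption: the list stores titles with underscores instead of spaces,
    so spaces in `wanted` are converted before the lookup.
    """
    return sorted(t for t in wanted if t.replace(" ", "_") not in titles)
```

Usage would look something like `missing_titles(["Some title"], load_titles("all-titles-in-ns0.gz"))`, with the file downloaded beforehand. Holding every title in a set costs memory on a wiki the size of enwiki, but it keeps each lookup cheap.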
