Last modified: 2013-06-18 16:32:38 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T15693, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13693 - Dump the article titles lists (all-titles-in-ns0.gz) every day
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Ariel T. Glenn
URL: http://download.wikimedia.org/enwiki/...
Keywords:
Depends on:
Blocks:
Reported: 2008-04-11 08:28 UTC by Melancholie
Modified: 2013-06-18 16:32 UTC
CC List: 3 users

Description Melancholie 2008-04-11 08:28:14 UTC
The dump process cycles over a period of roughly 2-3 weeks for smaller wikis and roughly 1-2 months for big ones like [[en:]]!

This means that the very helpful all-titles-in-ns0 lists can be up to 2 months old, totally outdated!

For enwiki it takes less than 30 seconds (26 sec in March) to dump this list, according to http://download.wikimedia.org/enwiki/20080312/

For all wikis this would mean about 1 minute or so. Is it possible to dump the all-titles-in-ns0 lists on a daily basis?

It would be very helpful to be able to analyse db dumps with up-to-date lists! Another reason is a potential feature in analysing User:Midom's stats, see [[User_talk:Henrik#Most_popular_nonexistent_articles.3F]].
Comment 1 Mathias Schindler 2008-06-05 16:34:11 UTC
Please note that you can access this list via http://download.wikipedia.org/sitemap/
Comment 2 Melancholie 2008-06-05 16:47:36 UTC
?
The sitemaps have not even been updated since 2007-Dec-27 ;-)
Comment 3 Andrew Dunbar 2008-06-14 17:32:00 UTC
This bug was recently made dependent on Bug 14415 -- Dump the article titles lists (all-titles-in-ns0.gz) unsorted

Was this an error? Is making this file available more frequently really dependent on the sort order the file utilizes?

I'm being bold and removing the dependency but please restore it if the link was in fact intentional.
Comment 4 Tomasz Finc 2009-05-13 23:54:13 UTC
(In reply to comment #0)
> The dump process cycles over a period of roughly 2-3 weeks for smaller wikis and
> roughly 1-2 months for big ones like [[en:]]!
> 
> This means that the very helpful all-titles-in-ns0 lists can be up to 2 months
> old, totally outdated!
> 
> For enwiki it takes less than 30 seconds (26 sec in March) to dump this list,
> according to http://download.wikimedia.org/enwiki/20080312/
> 
> For all wikis this would mean about 1 minute or so. Is it possible to dump the
> all-titles-in-ns0 lists on a daily basis?
> 
> It would be very helpful to be able to analyse db dumps with up-to-date lists!
> Another reason is a potential feature in analysing User:Midom's stats, see
> [[User_talk:Henrik#Most_popular_nonexistent_articles.3F]].
> 

Apologies for the really late pickup on this, but we're just now moving through all the data dump issues. Could you elaborate a bit on what your daily use case is?

We haven't received many requests for a daily list, so we are leaning toward making this available at two-week intervals. How does that fit into your use cases?
Comment 5 Melancholie 2009-05-14 12:07:47 UTC
It's just that the list for enwiki can currently be months old, which makes it not very usable when handling live content (e.g. pywikipedia bots). A regular two-week schedule would be much, much better of course (if it's really regular, i.e. not intermittent). But the coolest thing would be to have up-to-date title lists, so that we are not forced to use the API for this (reverting bots, stats, missing articles etc.).
Comment 6 Mark A. Hershberger 2011-05-03 18:57:03 UTC
Giving dump bugs to Ariel.
Comment 7 Ariel T. Glenn 2012-06-20 10:12:54 UTC
http://dumps.wikimedia.org/other/pagetitles/

These will be dumped on a daily basis. We don't plan to keep them forever, maybe about 30 days' worth before they get tossed. Enjoy.

Also p.s. we might move the location around depending on whether more daily things get dumped.

And a final p.s., anyone not on the xmldatadumps-l list should get on it because announcements and discussions happen there which could affect users of the dumps.
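
For readers who want to use a title list for the use case raised in comment 5 (checking titles locally instead of asking the API), here is a minimal sketch in Python. It is not part of the original thread. It assumes the file has already been downloaded, is gzip-compressed, and holds one title per line; whether a given file starts with a header line is also an assumption, so skipping it is optional. The names `load_titles` and `missing_titles` are hypothetical helpers chosen for illustration.

```python
import gzip


def load_titles(path, skip_header=False):
    """Read a gzip-compressed title list (one title per line) into a set.

    skip_header is an assumption hook: set it to True if the file
    begins with a column-name line rather than a title.
    """
    titles = set()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for index, line in enumerate(handle):
            if skip_header and index == 0:
                continue
            title = line.rstrip("\n")
            if title:
                titles.add(title)
    return titles


def missing_titles(candidates, titles):
    """Return the candidates that are absent from the title set, in input order."""
    # Assumption: the list stores titles with underscores instead of spaces,
    # so candidates are normalized the same way before the lookup.
    return [c for c in candidates if c.replace(" ", "_") not in titles]
```

A bot could call `load_titles` once per downloaded file and then run `missing_titles` on its own candidate list, so each lookup is a set membership test rather than an API request.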
