Last modified: 2011-11-29 03:21:01 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20414, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18414 - The all-titles-in-ns0 list for enwiki contains some weird stuff
Status: RESOLVED FIXED
Product: Datasets
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 1 vote
Target Milestone: ---
Assigned To: Tomasz Finc
URL: http://download.wikimedia.org/enwiki/...
Reported: 2009-04-09 08:53 UTC by Melancholie
Modified: 2011-11-29 03:21 UTC (History)

Description Melancholie 2009-04-09 08:53:06 UTC
The all-titles-in-ns0 list for enwiki:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz

contains some weird stuff:

AC\\DC_Lane,_Melbourne
A_ch\\'im_un_pinnara,_i_kangsan_ungum_e
Bill_Clinton\\
C:\\WINDOWS
E\\I
.
.
.

It seems to be mainly an escaping issue [e.g.: \' etc.].
Caused by maintenance script(s):

Broken//\\x2e
Broken/File\\x3a
Broken/S/\\x2e
Broken/\\xe2\\x80\\xad
Broken/\\xe2\\x80\\xae
Broken/Norsk_(bokmål)
Broken/Norsk_(nynorsk)
Comment 1 Brion Vibber 2009-04-09 15:09:51 UTC
Assigning to Tomasz for dumps stuff...
Comment 2 Tomasz Finc 2009-05-13 22:38:00 UTC
Confirmed. I can see this happening on any article that has a '\' in its title. Thus

Dreamworks\\madagascar fails while
Dreamworks\madagascar is valid

This is happening because the 'mysql' command escapes each backslash with an extra backslash. '-r' fixes this by placing mysql in raw mode, which doesn't add the extra backslash. I've checked in the fix under http://www.mediawiki.org/wiki/Special:Code/MediaWiki/50570 and updated the main files. Future dumps should no longer have this problem.
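
For anyone still holding a copy of the list generated before the fix, the escaping can in principle be undone after the fact. The following is only a minimal sketch in Python, not part of the checked-in fix. It assumes the affected file uses the escapes that the mysql client documents for non-raw batch output (\\ for a backslash, \t, \n and \0 for tab, newline and NUL); the helper name unescape_batch_title is hypothetical.

```python
import re

# Assumed mapping, taken from the escapes the mysql client documents for
# non-raw batch output. Anything else following a backslash is left alone.
_ESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "0": "\0"}
_ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def unescape_batch_title(line: str) -> str:
    """Undo batch-mode escaping on one title line (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        char = match.group(1)
        # Unknown sequences are returned unchanged rather than guessed at.
        return _ESCAPES.get(char, match.group(0))

    # The pattern is applied left to right, so an escaped backslash is
    # consumed as a pair and never re-read as the start of another escape.
    return _ESCAPE_RE.sub(replace, line)


if __name__ == "__main__":
    for raw in (r"AC\\DC_Lane,_Melbourne", r"C:\\WINDOWS", r"Bill_Clinton" + "\\\\"):
        print(raw, "->", unescape_batch_title(raw))
```

This only makes sense for files known to predate the fix: applied to a list produced in raw mode, it would wrongly collapse titles that genuinely contain two consecutive backslashes.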