Last modified: 2011-11-29 03:21:01 UTC
The all-titles-in-ns0 list for enwiki:
http://download.wikimedia.org/enwiki/latest/enwiki-latest-all-titles-in-ns0.gz
contains some weird stuff:

AC\\DC_Lane,_Melbourne
A_ch\\'im_un_pinnara,_i_kangsan_ungum_e
Bill_Clinton\\
C:\\WINDOWS
E\\I
. . .

It seems to be mainly an escaping issue (e.g. \').

Caused by maintenance script(s):

Broken//\\x2e
Broken/File\\x3a
Broken/S/\\x2e
Broken/\\xe2\\x80\\xad
Broken/\\xe2\\x80\\xae
Broken/Norsk_(bokmål)
Broken/Norsk_(nynorsk)
Assigning to Tomasz for dumps stuff...
Confirmed. I can see this happening on any article that has a '\' in its title: Dreamworks\\madagascar fails while Dreamworks\madagascar is valid. This is happening because the 'mysql' command escapes any backslash with an extra backslash. '-r' fixes this by placing mysql in raw mode, which doesn't add the extra backslash. I've checked in the fix under http://www.mediawiki.org/wiki/Special:Code/MediaWiki/50570 and updated the main files. Any future dump should no longer have this problem.
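For anyone stuck with an already-published title list, here is a rough Python sketch of the escaping described above and how to undo it. It assumes the mysql client's documented batch-mode behaviour (newline, tab, NUL and backslash written as \n, \t, \0 and \\). The function names are made up for illustration and are not part of the dump scripts.

```python
# Hypothetical helpers; not taken from the MediaWiki dump scripts.
# Assumption: mysql batch mode writes newline, tab, NUL and backslash
# as \n, \t, \0 and \\ unless raw mode (-r / --raw) is used.

_ESCAPE = {"\\": "\\\\", "\n": "\\n", "\t": "\\t", "\0": "\\0"}
_UNESCAPE = {"\\": "\\", "n": "\n", "t": "\t", "0": "\0"}


def mysql_batch_escape(value: str) -> str:
    """Mimic the escaping applied to a column value in batch mode."""
    return "".join(_ESCAPE.get(ch, ch) for ch in value)


def mysql_batch_unescape(value: str) -> str:
    """Reverse the batch-mode escaping to recover the stored title."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        # A backslash followed by a known escape code is one escaped character.
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _UNESCAPE:
            out.append(_UNESCAPE[value[i + 1]])
            i += 2
        else:
            # Anything else, including a lone trailing backslash, is kept as-is.
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    stored = r"Dreamworks\madagascar"      # title as stored in the database
    dumped = mysql_batch_escape(stored)    # what a non-raw dump would contain
    print(dumped)
    print(mysql_batch_unescape(dumped) == stored)
```

This only matters for lists generated before the fix. Dumps produced with raw mode should not be run through the unescape step, since that would collapse any genuinely doubled backslash in a title.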