Last modified: 2011-05-15 09:50:28 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from links to bug reports and their history, links might be broken. See T22741, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 20741 - Run cleanupTitles.php and cleanupImages.php on any wikis with unicode whitespace in page/file names
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal critical with 2 votes
Target Milestone: ---
Assigned To: Priyanka Dhanda
URL: http://svn.wikimedia.org/viewvc/media...
Keywords: code-update-regression, shell
Duplicates: 20703 20738 20746 20747
Depends on:
Blocks:
Reported: 2009-09-19 15:57 UTC by Splarka
Modified: 2011-05-15 09:50 UTC
CC List: 12 users

Attachments
Dry run results from maintenance/cleanupTitles.php (172.23 KB, text/plain)
2011-04-12 18:34 UTC, Priyanka Dhanda
Dry run results from maintenance/cleanupImages.php (2.88 KB, text/plain)
2011-04-12 21:21 UTC, Priyanka Dhanda
Results after running maintenance/cleanupTitles on all wikis in all.dblist (159.87 KB, text/plain)
2011-04-19 20:13 UTC, Priyanka Dhanda
Results after running maintenance/cleanupImages on all wikis in all.dblist (2.44 KB, text/plain)
2011-04-19 20:14 UTC, Priyanka Dhanda

Description Splarka 2009-09-19 15:57:16 UTC
Per r55382 (ref bug 15248 that introduced the change), many pages and files whose names contain these characters are now inaccessible on some projects.

Example: http://toolserver.org/~nikola/grep.php?pattern=%E3%80%80&lang=ja&wiki=wikipedia&ns=6

http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:110%E3%80%80PICT0001.JPG
becomes
http://ja.wikipedia.org/wiki/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%AB:110_PICT0001.JPG
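
A minimal sketch of why the old URL stops resolving, assuming title normalization now folds any run of Unicode whitespace into a single underscore (as the example above suggests). The character class and the names `normalize_title` and `is_unreachable` are illustrative assumptions, not MediaWiki's actual code:

```python
import re
from urllib.parse import unquote

# Assumed, illustrative set of whitespace-like characters; the exact list
# used by MediaWiki may differ.
WHITESPACE_RUN = re.compile(
    "[ _\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]+"
)


def normalize_title(title: str) -> str:
    """Collapse each run of whitespace-like characters into one underscore."""
    return WHITESPACE_RUN.sub("_", title)


def is_unreachable(stored_title: str) -> bool:
    """A stored title is unreachable if normalizing it changes it."""
    return normalize_title(stored_title) != stored_title


# File name from the report, containing U+3000 (IDEOGRAPHIC SPACE).
stored = unquote("110%E3%80%80PICT0001.JPG")
print(normalize_title(stored))   # 110_PICT0001.JPG
print(is_unreachable(stored))    # True
```

Every request is normalized before lookup, so a row still stored under the un-normalized name can never be matched; that is the situation the cleanup scripts are asked to repair.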
Comment 1 Splarka 2009-09-19 16:05:11 UTC
More examples: NBSP in en.wp titles (12 at this time):
http://toolserver.org/~nikola/grep.php?pattern=%C2%A0&lang=en&wiki=wikipedia&ns=0
Comment 2 Splarka 2009-09-20 08:42:26 UTC
*** Bug 20738 has been marked as a duplicate of this bug. ***
Comment 3 Victor Vasiliev 2009-09-20 08:56:02 UTC
Changing to critical, since this has made data inaccessible.
Comment 4 Chad H. 2009-09-20 14:09:21 UTC
*** Bug 20746 has been marked as a duplicate of this bug. ***
Comment 5 Chad H. 2009-09-20 14:10:49 UTC
Upping to BLOCKER. Lots of pages on frwikinews are unavailable: http://toolserver.org/~nikola/grep.php?pattern=%C2%A0&lang=fr&wiki=wikinews&ns=0

Adding Brion as CC so he can delegate someone for this.
Comment 6 Splarka 2009-09-20 14:31:50 UTC
*** Bug 20747 has been marked as a duplicate of this bug. ***
Comment 8 Melancholie 2009-09-21 08:51:35 UTC
*** Bug 20703 has been marked as a duplicate of this bug. ***
Comment 9 Andrew Garrett 2009-09-21 11:37:49 UTC
The script is running in a screen on zwinger.
Comment 10 Andrew Garrett 2009-09-21 21:02:19 UTC
Done, invalid titles can be found with Special:PrefixIndex/Broken/
Comment 11 Bertrand GRONDIN 2009-09-21 22:17:04 UTC
(In reply to comment #10)
> Done, invalid titles can be found with Special:PrefixIndex/Broken/
> 

No, the invalid titles can't be found with Special:PrefixIndex/Broken/

See fr-wikinews: five hundred pages are still unavailable; 10% of the project has been wiped out.

The histories are gone too! This is not a bug, it's a disaster.
Comment 12 Bertrand GRONDIN 2009-09-21 22:22:41 UTC
This is the list of broken pages: http://toolserver.org/~nikola/grep.php?pattern=%C2%A0&lang=en&wiki=wikipedia&ns=0

How will we restore them?
Comment 13 Andrew Garrett 2009-09-21 22:30:51 UTC
Looks like the script is stopping after fixing one title. Looking...
Comment 14 Chad H. 2009-09-22 00:49:49 UTC
(In reply to comment #13)
> Looks like the script is stopping after fixing one title. Looking...
> 

Cf bug 17479. Maybe a problem with TableCleanup itself?
Comment 15 Brion Vibber 2009-09-25 16:49:19 UTC
Pending fixes to cleanupTitles/namespaceDupes, per Tim.
Comment 16 zephyrus4 2009-09-29 18:04:25 UTC
On French wikisource this page [[Discussion:auteur:Charles Baudelaire]] has disappeared: [http://fr.wikisource.org/w/index.php?title=Discussion:Charles_Baudelaire&action=history the history gives this].

The cache in Google is http://209.85.229.132/search?q=cache:92baBBRt9lsJ:fr.wikisource.org/wiki/Discussion:Broken/Auteur%255Cx3aCharles_Baudelaire+wikisource+discussion+auteur+baudelaire&cd=1&hl=en&ct=clnk&client=firefox-a

When I ask for this: http://toolserver.org/~nikola/grep.php?pattern=Charles+Baudelaire&lang=fr&wiki=wikisource&ns=1

I get an answer

Auteur:Charles Baudelaire

but the link to Talk:Auteur:Charles Baudelaire does not work; I get this message:

Mauvais titre (Bad title)

The requested page title is invalid, empty, or an incorrectly linked inter-language or inter-project title. It may contain one or more characters that cannot be used in titles.

This page represented many hours of work; is it possible to get it back?

Zeph
Comment 17 zephyrus4 2009-10-15 01:01:38 UTC
The same thing happened to another author talk page, Émile Zola, but I was luckier this time because I was able to rename the broken page, like this:

15 October 2009 at 00:20 Zyephyrus (talk | contribs | block) m (5,265 bytes) (Discussion:Broken/Auteur\x3a\xc3\x89mile Zola moved to Discussion Auteur:Émile Zola) (rollback | undo)

Unfortunately, I still can't find the earlier broken page, Discussion auteur Charles Baudelaire. Is there some way to find it?

Zeph
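
The rename above suggests that titles moved under Broken/ have their problematic bytes written as \xNN hex escapes of the UTF-8 encoding. A minimal sketch, under that assumption, of recovering the readable name from such a title (the helper name `decode_broken_title` is hypothetical):

```python
import re

# Matches a literal backslash, "x", and two hex digits, e.g. \x3a
HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def decode_broken_title(escaped: str) -> str:
    """Turn \\xNN escapes back into bytes, then decode the result as UTF-8."""
    raw = escaped.encode("utf-8")
    unescaped = HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return unescaped.decode("utf-8")


# Title taken from the move log entry above (without the Broken/ prefix).
print(decode_broken_title(r"Auteur\x3a\xc3\x89mile Zola"))  # Auteur:Émile Zola
```

If the escaped bytes are not valid UTF-8, `decode` raises `UnicodeDecodeError`, so titles that fail to decode would need to be inspected by hand.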
Comment 19 Roan Kattouw 2009-12-13 19:10:25 UTC
I just ran cleanupTitles on frwikisource and it fixed nothing, and the linked list is empty. Please REOPEN if you still find issues.
Comment 20 Roan Kattouw 2009-12-13 19:17:36 UTC
As Natalie pointed out on IRC, this bug requests that cleanupTitles be run against all wikis, which I obviously didn't do yet. This should probably happen during office hours.
Comment 21 Mark A. Hershberger 2011-04-04 22:31:10 UTC
Pdhanda volunteered to handle this one.
Comment 22 Priyanka Dhanda 2011-04-12 18:34:45 UTC
Created attachment 8396 [details]
Dry run results from maintenance/cleanupTitles.php

This took a few hours to run. So I'll start the actual run early morning PDT on 04/14.
Comment 23 Priyanka Dhanda 2011-04-12 21:21:35 UTC
Created attachment 8401 [details]
Dry run results from maintenance/cleanupImages.php
Comment 24 Priyanka Dhanda 2011-04-19 20:13:27 UTC
Created attachment 8430 [details]
Results after running maintenance/cleanupTitles on all wikis in all.dblist
Comment 25 Priyanka Dhanda 2011-04-19 20:14:01 UTC
Created attachment 8431 [details]
Results after running maintenance/cleanupImages on all wikis in all.dblist
Comment 26 Priyanka Dhanda 2011-04-19 20:14:46 UTC
Done for all wikis in all.dblist
