Last modified: 2014-02-16 05:55:46 UTC

Bug 4312 - Username of all whitespaces in German Wikipedia dump file
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Low normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://de.wikipedia.org/w/index.php?t...
Depends on:
Blocks: 16660
Reported: 2005-12-19 05:15 UTC by Tyler Riddle
Modified: 2014-02-16 05:55 UTC
CC List: 6 users

Description Tyler Riddle 2005-12-19 05:15:30 UTC
A username consisting of all spaces made its way into the German Wikipedia dump file. The article it 
happened on is at http://de.wikipedia.org/w/index.php?title=Negativ-Positiv_Verfahren&action=history

Since the username field is not marked as space-preserving, Parse::MediaWikiDump completely ignored 
its contents in this case. I have a feeling a username consisting only of spaces is not supposed to be allowed to exist.

Tyler
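
A minimal sketch of how such entries could be located in a dump, assuming the export wraps each name in a `<username>` element inside `<contributor>`. The `SAMPLE` document and the helper `blank_usernames` are hypothetical stand-ins for illustration; this is not how Parse::MediaWikiDump works.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a dump; not real export data.
SAMPLE = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <contributor><username>Alice</username><id>1</id></contributor>
    </revision>
    <revision>
      <contributor><username>\u00a0</username><id>2</id></contributor>
    </revision>
  </page>
</mediawiki>"""


def blank_usernames(stream):
    """Yield the text of every <username> that is empty or whitespace only."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        # Compare on the local name so a namespaced export matches as well.
        if elem.tag.rsplit("}", 1)[-1] == "username":
            text = elem.text or ""
            # str.strip() removes Unicode whitespace, including U+00A0.
            if text.strip() == "":
                yield text


found = list(blank_usernames(io.BytesIO(SAMPLE.encode("utf-8"))))
print([repr(name) for name in found])
```

Printing the `repr` makes the otherwise invisible name visible as an escape sequence.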
Comment 1 lɛʁi לערי ריינהארט 2005-12-19 06:36:11 UTC
Hallo!

If you go to
http://de.wikipedia.org/w/index.php?title=Negativ-Positiv_Verfahren&action=history
and click on the "space" link, you will come to
http://de.wikipedia.org/wiki/Benutzer_Diskussion:%C2%A0
and from there to
http://de.wikipedia.org/wiki/Spezial:Contributions/%C2%A0
(no email address specified, or emails from other users disabled).

The problem has been known since August; see
http://de.wikipedia.org/wiki/Benutzer_Diskussion:%C2%A0

The user name contains
Unicode Character 'NO-BREAK SPACE' (U+00A0)
http://www.fileformat.info/info/unicode/char/00a0/index.htm
HTML Entity: &#160; (decimal), &#xA0; (hex), &nbsp; (named)
UTF-8 (hex): 0xC2 0xA0 (c2a0), URL-encoded as %c2%a0 or %C2%A0
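
The encodings listed above can be checked with a short Python sketch that uses only the standard library. The variable names are illustrative.

```python
import html
import unicodedata
from urllib.parse import quote, unquote

nbsp = "\u00a0"

char_name = unicodedata.name(nbsp)        # official Unicode character name
utf8_hex = nbsp.encode("utf-8").hex()     # UTF-8 bytes as lowercase hex
url_form = quote(nbsp)                    # percent-encoded form used in the URLs
decimal_entity = f"&#{ord(nbsp)};"        # decimal HTML entity

print(char_name)                          # NO-BREAK SPACE
print(utf8_hex)                           # c2a0
print(url_form)                           # %C2%A0
print(decimal_entity)                     # &#160;
print(unquote("%c2%a0") == nbsp)          # True
print(html.unescape("&nbsp;") == nbsp)    # True
```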

http://en.wikipedia.org/wiki/User:%C2%A0
is known already from
http://bugzilla.wikimedia.org/show_bug.cgi?id=1524#c9

Changing the name would be an administrative task, either at WP:DE or, better, at
all projects. I do not know the policy about this. Please clarify this at the
local wiki, via a mailing list such as [Wikide-l], [Wikitech-l] etc. or via IRC at
irc://irc.freenode.net/mediawiki .

Marking this bug as a duplicate of
bug 1524: usernames should use unicode whitelist

http://fr.wikipedia.org/wiki/%C2%A0 is mentioned at
bug 2173 comment 3
bug 2173: Fatal error when removing an article with an whitespace title from the
watchlist

best regards reinhardt [[user:gangleri]]

*** This bug has been marked as a duplicate of 1524 ***
Comment 2 Ævar Arnfjörð Bjarmason 2005-12-19 06:53:04 UTC
This isn't a duplicate of bug 1524. That bug deals with having a whitelist for
registered usernames, but this particular username also happens to break the XML
schema.
Comment 3 lɛʁi לערי ריינהארט 2005-12-19 06:58:23 UTC
Thanks Ævar! I did not read the second paragraph with the attention it
required. Please look at what happens at
http://en.wikipedia.org/wiki/User:%C2%A0
and
http://fr.wikipedia.org/wiki/%C2%A0

Please change the summary to reflect the new / major problem. Thanks in
advance!
Comment 4 Nemo 2012-08-23 19:08:41 UTC
I don't understand, does this really break dumps?
Comment 5 Andre Klapper 2012-11-10 15:53:53 UTC
Also wondering. How exactly can one reproduce that it "breaks dumps"?
Comment 6 Tyler Riddle 2012-11-11 15:54:45 UTC
If the XML schema indicates data is not white space preserving then white space is not significant and there is no difference between " ", "  ", "   ", "\t\n\n\n\t\t\t\t\t\t\t\t\t \n\n\n" etc.

If a user name exists where white space is significant, it becomes impossible to transmit it using a non-space-preserving data type. Thus it's not actually possible to get the user names correctly, and this is rather broken.
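
The point about indistinguishable values can be illustrated with a minimal sketch of the XML Schema `whiteSpace="collapse"` rule. The function name `xsd_collapse` is made up for this example, and nothing here is taken from the dump schema or from Parse::MediaWikiDump. Whether U+00A0 counts as white space depends on the tool: the XML Schema rule covers only space, tab, LF and CR, whereas Python's `str.strip()` also removes U+00A0.

```python
import re


def xsd_collapse(value: str) -> str:
    """Apply the XML Schema whiteSpace="collapse" rule to a string."""
    value = re.sub(r"[\t\n\r]", " ", value)   # tab, LF, CR become spaces
    value = re.sub(r" {2,}", " ", value)      # runs of spaces shrink to one
    return value.strip(" ")                   # leading/trailing spaces are dropped


samples = [" ", "  ", "   ", "\t\n\n\n\t\t\t\t\t\t\t\t\t \n\n\n"]
collapsed = {xsd_collapse(s) for s in samples}
print(collapsed)                      # {''}  (all four values become identical)

nbsp = "\u00a0"
print(xsd_collapse(nbsp) == nbsp)     # True: U+00A0 is not XML white space
print(nbsp.strip() == "")             # True: a Unicode-aware trim removes it
```

Once a value has been collapsed, a consumer has no way to recover which of the original strings was sent.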
