Last modified: 2014-02-16 05:55:46 UTC
A username consisting of all spaces made its way into the German Wikipedia dump file. The article it
happened on is at http://de.wikipedia.org/w/index.php?title=Negativ-Positiv_Verfahren&action=history
Since the username field is not marked as space-preserving, Parse::MediaWikiDump completely ignored
its contents in this case. I suspect that a username consisting only of spaces is not supposed to be allowed to exist.
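To illustrate the effect described above, here is a minimal sketch in Python (not the actual Parse::MediaWikiDump code, which is Perl). The XML fragment, the element names and the read_username helper are made up for illustration; the point is only that a reader which strips whitespace from a field ends up treating an all-spaces username as if it were absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; element names only resemble a dump and are assumptions.
FRAGMENT = "<contributor><username>   </username><id>42</id></contributor>"


def read_username(xml_text, strip_whitespace):
    """Return the username text, or None if the reader considers it empty."""
    elem = ET.fromstring(xml_text).find("username")
    text = elem.text or ""
    if strip_whitespace:
        # A reader that treats the field as not space-preserving
        # discards leading and trailing whitespace.
        text = text.strip()
    return text or None


print(repr(read_username(FRAGMENT, strip_whitespace=False)))  # '   '
print(repr(read_username(FRAGMENT, strip_whitespace=True)))   # None
```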
If you go
and click on the "space" link
you will come to
no email specified or emails from other users disabled
The problem has been known since August; see
The user name contains
Unicode Character 'NO-BREAK SPACE' (U+00A0)
HTML Entity (decimal) &#160; (hex) &#xA0; (named) &nbsp;
UTF-8 (hex) 0xC2 0xA0 (c2a0) %c2%a0 %C2%A0
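The encodings listed above can be checked with a short Python sketch using only the standard library. The variable names are mine; nothing here is specific to MediaWiki.

```python
import html
from urllib.parse import quote, unquote

NBSP = "\u00a0"  # NO-BREAK SPACE

utf8_bytes = NBSP.encode("utf-8")
percent_encoded = quote(NBSP)

print(utf8_bytes)        # b'\xc2\xa0'
print(percent_encoded)   # %C2%A0

# Percent-decoding is case-insensitive, so both URL forms give the same character.
print(unquote("%c2%a0") == unquote("%C2%A0") == NBSP)  # True

# All three HTML entity forms decode to the same character.
entities = ["&#160;", "&#xA0;", "&nbsp;"]
print(all(html.unescape(e) == NBSP for e in entities))  # True

# It counts as whitespace in Python, yet it is not an ordinary space.
print(NBSP.isspace(), NBSP == " ")  # True False
```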
is known already from
Changing the name would be an administrative task, either at WP:DE or, better, across
all projects. I do not know the policy on this. Please clarify it at the
local wiki, via a mailing list such as [Wikide-l] or [Wikitech-l], or via IRC at
Marking this bug as a duplicate of
bug 1524: usernames should use unicode whitelist
http://fr.wikipedia.org/wiki/%C2%A0 is mentioned at
bug 2173 comment 3
bug 2173: Fatal error when removing an article with an whitespace title from the
best regards reinhardt [[user:gangleri]]
*** This bug has been marked as a duplicate of 1524 ***
This isn't a duplicate of bug 1524; that bug deals with having a whitelist for
registered usernames, whereas this particular username also happens to break the XML
Thanks, Ævar! I did not read the second paragraph as carefully as I should
have. Please look at what happens at
Please change the summary to reflect the new / major problem. Thanks in
I don't understand: does this really break dumps?
Also wondering. How exactly can one reproduce that it "breaks dumps"?
If the XML schema indicates that data is not white space preserving, then white space is not significant and there is no difference between " ", " ", " ", "\t\n\n\n\t\t\t\t\t\t\t\t\t \n\n\n", etc.
If a user name exists where white space is significant, it becomes impossible to transmit it using a non-space-preserving data type. Thus it is not actually possible to get such user names correctly, which is rather broken.
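A rough Python sketch of the argument above, assuming the consumer applies something like the XML Schema "collapse" white space rule (tab, line feed and carriage return become spaces, runs of spaces shrink to one, and leading and trailing spaces are dropped). The collapse function and the sample values are illustrative, not taken from any dump or parser.

```python
import re


def collapse(value):
    """Approximate XML Schema whiteSpace="collapse" normalization."""
    # Step 1: replace tab, line feed and carriage return with a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    # Step 2: squeeze runs of spaces, then trim both ends.
    return re.sub(r" +", " ", replaced).strip(" ")


# Hypothetical whitespace-only "user names" of different shapes.
samples = [" ", "   ", "\t\n\n\n\t\t \n"]

# After normalization they are all the same empty string,
# so the receiver cannot tell them apart.
print({collapse(s) for s in samples})  # {''}
```

Once the values have gone through such a data type, there is no way for the receiving side to recover which of the original strings was sent.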