Last modified: 2010-05-15 15:37:24 UTC
Usernames should be restricted to a whitelist of characters that includes only
valid alphanumeric characters in each language, plus punctuation. Otherwise,
creating usernames (and page titles) with invalid characters will make it hard
to block vandals.
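Here is a rough sketch of what a category-based whitelist could look like in Python. It assumes the Unicode general category is an acceptable stand-in for "alphanumeric in each language". It accepts letters (L*), combining marks (M*, needed for Hebrew points, Devanagari vowel signs and the like), numbers (N*), punctuation (P*), and the ASCII space. The category list and function name are assumptions for illustration, not a proposed policy.

```python
import unicodedata

# Assumed policy for this sketch: letters, combining marks, numbers, punctuation.
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "N", "P")


def is_whitelisted(name: str) -> bool:
    """True if every character is in an allowed general category or is a plain space."""
    if not name.strip():
        return False  # reject empty or all-space names
    return all(
        ch == " " or unicodedata.category(ch).startswith(ALLOWED_CATEGORY_PREFIXES)
        for ch in name
    )


print(is_whitelisted("Brion V."))        # True
print(is_whitelisted("user\u200bname"))  # False: U+200B ZERO WIDTH SPACE is Cf
print(is_whitelisted("\u2591"))          # False: U+2591 LIGHT SHADE is So
```

A check like this keeps out symbols, control characters and format characters. It does nothing about confusable or mixed-script names.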
*Invalid* characters (those that are illegal in XML or don't reliably cut and paste) need to be outright
blocked in titles.
Characters that some people simply can't type should not be a real problem, as there should
either be a direct 'block' link, or cut-and-paste will always be available.
I'm not really inclined to proclaim which characters are appropriate for each language, as this will make
interoperability, writing on foreign topics, shared data, shared user accounts, global user accounts, etc.
very hard, and will require a lot of manual mucking about as people whine for whitelists to be updated.
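The "illegal in XML" part is easy to check directly, because the XML 1.0 Char production is short. This minimal sketch assumes titles should also reject tab, LF and CR, even though XML itself permits them. The helper name is made up for illustration.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 (Fifth Edition) Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def bad_title_chars(title: str) -> list:
    """Return the code points that make a title invalid (hypothetical helper)."""
    bad = []
    for ch in title:
        cp = ord(ch)
        # Tab/LF/CR are legal XML, but this sketch assumes they are unwanted in titles.
        if cp < 0x20 or not is_xml_char(cp):
            bad.append(f"U+{cp:04X}")
    return bad


print(bad_title_chars("Main Page"))       # []
print(bad_title_chars("A\x00B\uFFFE"))    # ['U+0000', 'U+FFFE']
```

Lone surrogates (U+D800 to U+DFFF), which a Python string can technically hold, also fall outside the Char ranges and get reported.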
Agreed. There are to my knowledge no legit users on the English wiki that use
non-ASCII characters in their name, but it's a favorite trick of vandals and
*** Bug 2290 has been marked as a duplicate of this bug. ***
(In reply to comment #0)
> usernames should be restricted to a whitelist of characters which includes only
> valid alphanumeric characters in each language, and punctuation.
This requirement and single user login will conflict with the wish to use
*native* (non-Latin) alphabets in user names.
(In reply to comment #4)
> (In reply to comment #0)
> > usernames should be restricted to a whitelist of characters which includes only
> > valid alphanumeric characters in each language, and punctuation.
> This requirement and single user login will conflict with the wish to use
> *natives* (non latin) alphabets in user names.
Usernames shouldn't be stored in a normalised form, however, users should not be
permitted to register names which would conflict with existing usernames, when
Perhaps this could be achieved by adding a new field to the user table -
'username_normal' - and storing the normalised username there. Add a unique
constraint to the field, and then attempts to register a username which will
result in a collision when normalised will... well, result in a database error.
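You can try out the unique-constraint idea with SQLite from Python's standard library. This sketch rests on a few assumptions. The table layout is simplified and hypothetical, not MediaWiki's real schema. normalise() only does NFKC plus case folding, as a placeholder for whatever confusable mapping is eventually chosen. The collision shows up as sqlite3.IntegrityError, which is the database error described above.

```python
import sqlite3
import unicodedata


def normalise(name: str) -> str:
    # Placeholder normaliser (assumption): NFKC plus case folding only.
    # A real deployment would also fold confusable characters.
    return unicodedata.normalize("NFKC", name).casefold()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user ("
    " user_id INTEGER PRIMARY KEY,"
    " user_name TEXT NOT NULL UNIQUE,"        # stored exactly as typed
    " username_normal TEXT NOT NULL UNIQUE"   # derived collision key
    ")"
)


def register(name: str) -> bool:
    """Try to create an account; return False if the normalised form is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user (user_name, username_normal) VALUES (?, ?)",
                (name, normalise(name)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(register("Gangleri"))       # True
print(register("GANGLERI"))       # False: same normalised form
print(register("\uff27angleri"))  # False: FULLWIDTH G becomes "g" under NFKC + casefold
```

The original spelling stays in user_name, and the constraint applies only to the derived column. That way, display names are never altered.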
Now the question is: where do we get a reasonable map of confusable characters?
particularly extensive, but should work for most malicious cases. Perhaps we
should try to get a copy of the IDN normalisation map. The Unicode Consortium
has a long document about visual spoofing:
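Wherever the map ends up coming from (UTS #39 publishes a confusables data file), applying it is a simple per-character lookup. The sketch below uses a deliberately tiny, hand-picked table as a stand-in. The entries and the skeleton() name are illustrative assumptions, not the Unicode data. Its output is the kind of value that could go into the normalised username field suggested earlier in this comment.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table (not the UTS #39 data).
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "0": "o",
    "1": "l",
})


def skeleton(name: str) -> str:
    """Fold compatibility forms and case, then map look-alikes to one representative."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return folded.translate(CONFUSABLES)


print(skeleton("Jimb\u043e"))  # jimbo (last letter was Cyrillic)
print(skeleton("G0d"))         # god
```

Two names count as colliding when their skeletons are equal. The skeleton is only ever compared, never displayed.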
(In reply to comment #5)
There are many opinions about the restriction of usernames:
"Since this is the English Wikipedia, usernames ought to be constructed using
English characters, with allowances for scripts from other languages ..." from
Nevertheless, the community's decision about this should be more tolerant. With
regard to single user login, it should be allowed to use Arabic, Cyrillic, Hebrew,
Hindi (Devanagari), Georgian, or any other alphabet.
I would not object to usernames such as [[user:۞]], [[user:░]], [[User:–]], etc.
Usernames are part of a user's personality and creativity. Whatever opinion we have on
this, and however we deal with it, it is *reality* that there are also usernames like
[[en:user:god]] - see [[en:user talk:god]], [[en:user:satan]],
some examples related to
bug 337: inconsistent treatment of character entities and illegal characters in
is a "construct" based on
bug 2173: Fatal error when removing an article with a whitespace title from the
bug 3696: Unicode Control Characters should be restricted in title text
bug 2593: Non-printing characters allowed in registration
(In reply to comment #6)
> Usernames shouldn't be stored in a normalised form, however, users should not be
> permitted to register names which would conflict with existing usernames, when
Depending on the font used, two "ו" characters can look like one "װ" character:
[[yi:user:גאַװיאַל]] and [[yi:user:גאַוויאַל]]
Hmm, you could say similar things about vv and w (though generally w is
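A one-to-one character table cannot handle the װ / וו case (or vv / w), because one character has to match two. One way to deal with this is to expand such ligatures into the letter sequence they resemble before comparing. The fold list below is an assumption for illustration, not an official table. In particular, folding "vv" to "w" will also fold innocent names such as "Savvy".

```python
import unicodedata

# Hypothetical multi-character folds (assumption, not an official table).
MULTI_CHAR_FOLDS = [
    ("\u05f0", "\u05d5\u05d5"),  # HEBREW LIGATURE YIDDISH DOUBLE VAV -> VAV VAV
    ("\u05f1", "\u05d5\u05d9"),  # HEBREW LIGATURE YIDDISH VAV YOD -> VAV YOD
    ("\u05f2", "\u05d9\u05d9"),  # HEBREW LIGATURE YIDDISH DOUBLE YOD -> YOD YOD
    ("vv", "w"),                 # Latin double v -> w (applied after case folding)
]


def fold(name: str) -> str:
    """Normalise, case-fold, then expand/contract the look-alike sequences above."""
    s = unicodedata.normalize("NFKC", name).casefold()
    for src, dst in MULTI_CHAR_FOLDS:
        s = s.replace(src, dst)
    return s


with_ligature = "\u05d2\u05d0\u05b7\u05f0\u05d9\u05d0\u05b7\u05dc"  # גאַװיאַל
two_vavs = "\u05d2\u05d0\u05b7\u05d5\u05d5\u05d9\u05d0\u05b7\u05dc"  # גאַוויאַל
print(fold(with_ligature) == fold(two_vavs))  # True
```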
bug 3982: Maybe...
*** Bug 4312 has been marked as a duplicate of this bug. ***
Is this FIXED already?
I could create a user page
but I could not create such an *account*.
User.php,1.212,1.213 by Brion
"Blocking some Unicode whitespace characters in usernames. Should check if some
or all should be blocked from all page titles."
A block list is equivalent to a whitelist.
It might be a good idea to give feedback explaining why the username entered when creating a new
account is invalid, or to show which Unicode character caused the problem.
For "transparency" of wiki configuration the list of blocked characters should
best regards reinhardt [[user:gangleri]]
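On the feedback suggestion: reporting the offending character by code point and Unicode name is cheap. The sketch below assumes a hypothetical blocked set. It covers control, format, private-use, unassigned, and line/paragraph separator characters, plus any space separator other than the ASCII space. This is not the list in User.php.

```python
import unicodedata

# Hypothetical blocked general categories (not MediaWiki's actual list).
BLOCKED_CATEGORIES = {"Cc", "Cf", "Cn", "Co", "Zl", "Zp"}


def explain_invalid(name: str) -> list:
    """Describe each blocked character so the sign-up form can show it to the user."""
    problems = []
    for ch in name:
        cat = unicodedata.category(ch)
        if cat in BLOCKED_CATEGORIES or (cat == "Zs" and ch != " "):
            label = unicodedata.name(ch, "<no name>")  # control chars have no name
            problems.append(f"U+{ord(ch):04X} {label} ({cat})")
    return problems


print(explain_invalid("Foo\u00a0Bar"))  # ['U+00A0 NO-BREAK SPACE (Zs)']
```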
Blocking != Whitelisting
The list of blocked characters is available in the code, and also in the
relevant commit message in the mediawiki-cvs archives.
Here's a good way of filtering names:
1) first, do Nameprep
2) only allow the use of characters specific to one particular writing system in
the resulting string, and a few carefully selected non-alphabetic characters
(such as space, apostrophe, and any others you want to add to the whitelist).
This is being used in IDN at the moment, and it's very successful at preventing
a very wide variety of potential abuses, such as mixed-script spoofing and the
use of exotic Unicode characters to break rendering engines.
I happen to have some nice compact table-driven C code for doing this: mail me
if you want it.
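The two-step recipe above can be approximated with nothing but Python's standard library. encodings.idna.nameprep implements RFC 3491 Nameprep (based on Unicode 3.2). unicodedata exposes no Script property, so the script check below guesses the script from the first word of each letter's character name. That heuristic, the EXTRA_ALLOWED set and the function names are assumptions for this sketch. Real code would use proper Script data. As written, it would also refuse legitimate mixes such as Japanese kanji plus kana.

```python
import unicodedata
from encodings.idna import nameprep  # RFC 3491 Nameprep (IDNA2003), CPython stdlib
from typing import Optional

# Hypothetical script-neutral characters permitted in any name.
EXTRA_ALLOWED = {" ", "'", "-", "."}


def script_of(ch: str) -> Optional[str]:
    """Crude script guess from the first word of the character name (heuristic)."""
    cat = unicodedata.category(ch)
    if cat.startswith("L"):
        return unicodedata.name(ch, "UNKNOWN").split()[0]  # e.g. "LATIN", "CYRILLIC"
    if cat.startswith(("N", "M")) or ch in EXTRA_ALLOWED:
        return None  # numbers, combining marks and whitelisted punctuation are neutral
    raise ValueError(f"character U+{ord(ch):04X} is not allowed")


def check_name(name: str) -> str:
    """Return the Nameprep'd name, or raise ValueError if it should be refused."""
    try:
        prepped = nameprep(name)  # maps/case-folds, NFKC-normalises, prohibits
    except UnicodeError as exc:
        raise ValueError(f"Nameprep rejected the name: {exc}") from exc
    if not prepped.strip():
        raise ValueError("empty name")
    scripts = {s for s in map(script_of, prepped) if s is not None}
    if len(scripts) > 1:
        raise ValueError(f"mixed scripts: {sorted(scripts)}")
    return prepped


print(check_name("Gangleri"))  # gangleri
```

Nameprep silently drops some invisible characters (such as U+200B) rather than rejecting them. Whether that is acceptable for usernames is a policy question.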
We should file the within-script character spoofing problem as a separate bug:
as stated above, this is easily dealt with by storing a normalized form of each
name alongside the real name, and checking that no normalized form is ever
duplicated: given this, the only problem is working out the ruleset for
normalizing these strings.
*** Bug 7463 has been marked as a duplicate of this bug. ***
I emailed Neil, and he told me that there is a MediaWiki extension out to block Unicode in usernames.
Can anyone confirm or deny this?
We will never "block out Unicode" as that doesn't make sense.
*Every* username is Unicode, with *no exceptions*.
What we will do is enforce restrictions on some characters
and mixed-script names. Please see the code in AntiSpoof extension.
I downloaded the files, and AntiSpoof has no documentation or explanation that I could find on MediaWiki, Google, or
in the code. I had to read through the code of all six files to determine which one to include.
First, is AntiSpoof still in testing and not yet working correctly?
Also, is patch-antispoof.sql.txt needed, or does some SQL work need to be done first before using
And is its log file saved somewhere like debug.log, kept only in MySQL, or
viewable in MediaWiki itself?
This bug entry is not a discussion forum. If you want to ask general
questions about how to operate software, please do it separately.
Done reasonably with AntiSpoof