Last modified: 2010-05-15 15:37:24 UTC
Usernames should be restricted to a whitelist of characters that includes only valid alphanumeric characters in each language, plus punctuation. Otherwise, creating usernames (and page titles) with invalid characters will make it hard to block vandals.
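Here is a rough illustration of the proposed whitelist. The sketch accepts a name only if every character is a letter, combining mark, decimal digit, punctuation, or the plain ASCII space. Using Unicode general categories as a stand-in for "valid alphanumeric characters in each language" is an assumption, and `is_whitelisted` is a hypothetical helper name.

```python
import unicodedata

# Assumption: Unicode general categories stand in for "valid alphanumeric
# characters in each language": letters (L*), combining marks (M*, needed for
# scripts such as Devanagari or pointed Hebrew), decimal digits (Nd) and
# punctuation (P*). The plain ASCII space is the only separator allowed.
ALLOWED_CATEGORY_PREFIXES = ("L", "M", "P")

def is_whitelisted(name: str) -> bool:
    """Return True if every character of name falls in the whitelist."""
    if not name or name != name.strip(" "):
        return False  # reject empty names and leading/trailing spaces
    for ch in name:
        category = unicodedata.category(ch)
        if ch == " " or category == "Nd" or category.startswith(ALLOWED_CATEGORY_PREFIXES):
            continue
        return False  # symbols (So), format controls (Cf), other spaces (Zs), ...
    return True
```

A category whitelist alone still admits mixed-script lookalikes, such as a Cyrillic "а" inside an otherwise Latin name.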
*Invalid* characters (those that are illegal in XML or don't reliably survive cut and paste) need to be blocked outright in titles. Characters that some people simply cannot type should not be a real problem, because either there should be a direct 'block' link or cut and paste will always be available. I'm not really inclined to proclaim which characters are appropriate for each language. Doing so would make interoperability, writing on foreign topics, shared data, shared user accounts, global user accounts, etc. very hard, and would require a lot of manual mucking about as people whine for whitelists to be updated.
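Here is a minimal sketch of the "outright block" for characters that are illegal in XML, using the XML 1.0 `Char` production as the reference. The extra rejection of TAB, LF and CR, which XML permits but which are unsuitable in a single-line title, is an assumption. The helper names are hypothetical.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 'Char' production: the code points an XML document may contain."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Assumption: titles are single-line, so TAB, LF and CR are rejected as well,
# even though XML itself permits them.
LINE_CONTROLS = {"\t", "\n", "\r"}

def invalid_title_chars(title: str) -> list[str]:
    """Return the characters in title that should be blocked outright."""
    return [ch for ch in title if not is_xml_char(ord(ch)) or ch in LINE_CONTROLS]
```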
Agreed. To my knowledge there are no legit users on the English wiki who use non-ASCII characters in their names, but it's a favorite trick of vandals and impersonators.
*** Bug 2290 has been marked as a duplicate of this bug. ***
(In reply to comment #0)
> usernames should be restricted to a whitelist of characters which includes only
> valid alphanumeric characters in each language, and punctuation.
This requirement and single user login will conflict with the wish to use *native* (non-Latin) alphabets in user names.
(In reply to comment #4)
> (In reply to comment #0)
> > usernames should be restricted to a whitelist of characters which includes only
> > valid alphanumeric characters in each language, and punctuation.
> This requirement and single user login will conflict with the wish to use
> *natives* (non latin) alphabets in user names.
Why?
Usernames shouldn't be stored in a normalised form; however, users should not be permitted to register names that would conflict with existing usernames when normalised. Perhaps this could be achieved by adding a new field to the user table, 'username_normal', and storing the normalised username there. Add a unique constraint to the field, and then attempts to register a username that collides with an existing one when normalised will... well, result in a database error. Now the question is: where do we get a reasonable map of confusable characters? http://www.unicode.org/draft/reports/tr36/Attic/confusables.txt isn't particularly extensive, but should work for most malicious cases. Perhaps we should try to get a copy of the IDN normalisation map. The Unicode Consortium has a long document about visual spoofing: http://www.unicode.org/draft/reports/tr36/tr36.html
(In reply to comment #5)
> why?
There are many opinions about the restriction of usernames: "Since this is the English Wikipedia, usernames ought to be constructed using English characters, with allowances for scripts from other languages ..." from [[en:Wikipedia_talk:Username#On_Unicode_and_other_odd_characters_in_usernames]]. Nevertheless, the community's decision about this should be more tolerant. With regard to single user login, it should be allowed to use Arabic, Cyrillic, Hebrew, Hindi (Devanagari), Georgian, or any other alphabet. I would not object to usernames such as [[user:۞]], [[user:░]], [[User:–]], etc. Usernames are part of personality and creativity. Whatever opinion we have on this, and however we deal with it, it is a *reality* that there are also usernames like [[en:user:god]] (see [[en:user talk:god]]), [[en:user:satan]], [[en:user:antichrist]], etc.
Some examples related to bug 337: inconsistent treatment of character entities and illegal characters in titles/links
http://en.wikipedia.org/wiki/User:%E2%80%8F
http://en.wikipedia.org/wiki/Special:Contributions/%E2%80%8F
http://en.wikipedia.org/wiki/User:Gangleri/tests/bugzilla:00337#User:.26rlm.3B
http://en.wikipedia.org/wiki/User:%C2%A0 is a "construct" based on bug 2173: Fatal error when removing an article with a whitespace title from the watchlist
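The characters hidden in the example titles above can be made visible by percent-decoding them as UTF-8 and looking up each code point's Unicode name. `describe_title` is a hypothetical helper built on the standard `urllib.parse.unquote` and `unicodedata.name` functions.

```python
import unicodedata
from urllib.parse import unquote

def describe_title(encoded: str) -> list[str]:
    """Percent-decode a title (as UTF-8) and name every code point in it."""
    title = unquote(encoded)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in title]
```

`describe_title("%E2%80%8F")` gives `["U+200F RIGHT-TO-LEFT MARK"]`, and `describe_title("%C2%A0")` gives `["U+00A0 NO-BREAK SPACE"]`. Neither character shows up visibly when rendered as a username.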
compare with bug 3696: Unicode Control Characters should be restricted in title text
see also bug 2593: Non-printing characters allowed in registration
(In reply to comment #6)
> Usernames shouldn't be stored in a normalised form, however, users should not be
> permitted to register names which would conflict with existing usernames, when
> normalised.
Depending on the font used, two "ו" characters can look like one "װ" character: [[yi:user:גאַװיאַל]] and [[yi:user:גאַוויאַל]]
Hmm, you could say similar things about vv and w (though generally w is narrower)...
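Plain Unicode normalization does not catch the "װ" versus "וו" case, because U+05F0 HEBREW LIGATURE YIDDISH DOUBLE VAV has no decomposition mapping. The "skeleton" comparison described in UTS #39 does catch it, provided the confusables table contains the mapping. That comparison runs NFD, maps each character through a confusables table, then runs NFD again. The two-entry table below is hypothetical and covers only the examples in this thread, including "w" versus "vv".

```python
import unicodedata

# Hypothetical confusables table covering only the examples in this thread.
# Each entry maps a character to the "prototype" sequence it resembles.
CONFUSABLE_PROTOTYPES = {
    "\u05f0": "\u05d5\u05d5",  # HEBREW LIGATURE YIDDISH DOUBLE VAV -> two VAVs
    "w": "vv",                 # LATIN SMALL LETTER W -> "vv"
}

def skeleton(name: str) -> str:
    """UTS #39-style skeleton: NFD, map through the table, NFD again."""
    decomposed = unicodedata.normalize("NFD", name)
    mapped = "".join(CONFUSABLE_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)
```

Two names would then count as conflicting when their skeletons are equal. The skeleton is the kind of value that could go into a unique normalised-name column.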
compare with bug 3982: Maybe...
*** Bug 4312 has been marked as a duplicate of this bug. ***
Is this FIXED already? I could create a user page http://test.wikipedia.org/wiki/User:%E2%80%AEresu_ladnav_%E2%80%AD%E2%80%AC but I could not create such an *account*. Please see http://mail.wikipedia.org/pipermail/mediawiki-cvs/2006-February/013973.html (User.php,1.212,1.213 by Brion: "Blocking some Unicode whitespace characters in usernames. Should check if some or all should be blocked from all page titles."). A block list is equivalent to a whitelist. It might be a good idea to give feedback on why the username entered during account creation is invalid, and to show which Unicode character caused the rejection. For "transparency" of the wiki configuration, the list of blocked characters should be displayed. Best regards, reinhardt [[user:gangleri]]
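The suggested feedback could look roughly like this. Instead of a bare "invalid username", it lists each offending character with its position, code point and Unicode name. The blocking rule here is an assumption for illustration, not the list in User.php. It blocks control and format characters, line and paragraph separators, and any space other than U+0020.

```python
import unicodedata

def explain_invalid(name: str) -> list[str]:
    """List each rejected character with its position, code point and name.

    Assumed rule (illustrative only): reject control (Cc) and format (Cf)
    characters, line/paragraph separators (Zl, Zp), and every space
    separator (Zs) other than the plain ASCII space.
    """
    problems = []
    for pos, ch in enumerate(name):
        category = unicodedata.category(ch)
        if category in ("Cc", "Cf", "Zl", "Zp") or (category == "Zs" and ch != " "):
            label = unicodedata.name(ch, "<unnamed>")
            problems.append(f"position {pos}: U+{ord(ch):04X} {label}")
    return problems
```

Consider the test.wikipedia example above. The decoded name starts with U+202E RIGHT-TO-LEFT OVERRIDE and ends with U+202D LEFT-TO-RIGHT OVERRIDE followed by U+202C POP DIRECTIONAL FORMATTING, so the message would name all three.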
(sigh) Blocking != whitelisting. The list of blocked characters is available if you look at the code, and also in the relevant commit message in the mediawiki-cvs archives.
Here's a good way of filtering names:
1) first, do Nameprep;
2) in the resulting string, only allow characters specific to one particular writing system, plus a few carefully selected non-alphabetic characters (such as space, apostrophe, and any others you want to add to the whitelist).
This is being used in IDN at the moment, and it's very successful at preventing a wide variety of potential abuses, such as mixed-script spoofing and the use of exotic Unicode characters to break rendering engines. I happen to have some nice compact table-driven C code for doing this: mail me if you want it. We should file the within-script character spoofing problem as a separate bug. As stated above, this is easily dealt with by storing a normalized form of each name alongside the real name and checking that no normalized form is ever duplicated; given this, the only problem is working out the ruleset for normalizing these strings.
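The two-step filter can be approximated with Python's standard library, which exposes the Nameprep profile as `encodings.idna.nameprep`. That function raises `UnicodeError` for prohibited characters such as bidi overrides. Python has no built-in script property, so this sketch guesses a letter's script from the first word of its Unicode name (LATIN, CYRILLIC, HEBREW, ...). That heuristic, the set of allowed extra characters, and the function names are assumptions. This is neither the IDN registries' rule set nor the C code mentioned.

```python
import unicodedata
from encodings.idna import nameprep

# Assumption: the few non-letter characters allowed in any name.
ALLOWED_COMMON = set(" '0123456789")

def guess_script(ch: str) -> str:
    """Heuristic: the first word of the Unicode name, e.g. LATIN or CYRILLIC."""
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]

def check_username(name: str) -> bool:
    # Step 1: Nameprep (case folding, NFKC, prohibited characters, bidi rules).
    try:
        prepared = nameprep(name)
    except UnicodeError:
        return False
    if not prepared.strip():
        return False
    # Step 2: every letter must come from a single writing system.
    scripts = set()
    for ch in prepared:
        if ch in ALLOWED_COMMON:
            continue
        category = unicodedata.category(ch)
        if category.startswith("M"):
            continue  # simplification: combining marks follow their base letter
        if not category.startswith("L"):
            return False  # symbols and unlisted punctuation are rejected
        scripts.add(guess_script(ch))
    return len(scripts) <= 1  # at most one writing system
```

One weakness of the name-prefix heuristic is that scripts which are normally mixed, such as Japanese kanji with kana, would be rejected. A fuller implementation would use the Unicode Script property with per-script allowances.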
*** Bug 7463 has been marked as a duplicate of this bug. ***
I emailed Neil and he told me that there is a MediaWiki extension out to block Unicode in usernames. Can anyone confirm or deny this?
We will never "block out Unicode" as that doesn't make sense. *Every* username is Unicode, with *no exceptions*. What we will do is enforce restrictions on some characters and on mixed-script names. Please see the code in the AntiSpoof extension.
I downloaded the files, but I could not find any docs or explanations for AntiSpoof on MediaWiki, on Google, or in the code. I had to read through the code of the six files to determine which one to include. First, is AntiSpoof still in testing and not yet working correctly? Also, is patch-antispoof.sql.txt needed, or does some SQL work need to be done before using AntiSpoof? And is its log file saved like debug.log, kept only in MySQL, or viewable in MediaWiki itself?
This bug entry is not a discussion forum. If you want to ask general questions about how to operate software, please do it separately.
Done reasonably with AntiSpoof