Last modified: 2012-03-21 20:42:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 12500 - [[MediaWiki:Antispoof-name-illegal]] should provide information about the illegal and / or blocked characters in / at the wiki
[[MediaWiki:Antispoof-name-illegal]] should provide information about the ill...
Product: MediaWiki extensions
Classification: Unclassified
AntiSpoof (Other open bugs)
All All
: Low enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
: patch, patch-need-review
Depends on:
  Show dependency treegraph
Reported: 2008-01-04 08:13 UTC by lɛʁi לערי ריינהארט
Modified: 2012-03-21 20:42 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

Patch for better messages. (7.21 KB, patch)
2011-09-28 22:24 UTC, Van de Bugger
Next attempt. (6.26 KB, patch)
2011-12-13 20:21 UTC, Van de Bugger
Next attempt. (6.81 KB, patch)
2012-03-05 19:07 UTC, Van de Bugger

Description lɛʁi לערי ריינהארט 2008-01-04 08:13:12 UTC
Dear friends;

[[MediaWiki:Antispoof-name-illegal]] contains:

  The name "$1" is not allowed to prevent confusing or spoofed usernames: $2. Please choose another name.


The [[MediaWiki:Antispoof-name-illegal]] message should provide information about illegal and / or blocked characters.

Beside the characters from the Basic Latin block as controll characters in the range of � to  as well as the « % » % , « & » & , « + » &#2B; ¿etc.? characters this should include also all other UTF-8 characters as those from the General Punctuation block, the various whitespace characters etc.

Thanks for implementing this in advanced.

Best regards Reinhardt [[user:Gangleri]]
Comment 1 lɛʁi לערי ריינהארט 2008-01-04 23:05:43 UTC
(In reply to comment #0)
> Beside the characters from ...
> ... this should include also all other UTF-8 characters
> as those from the General Punctuation block, the various whitespace characters
> etc.

which are disallowed in all namespaces

an additional example is « � » which is
Comment 2 Van de Bugger 2011-09-28 22:24:38 UTC
Created attachment 9122 [details]
Patch for better messages.

If username contains a bad (blacklisted, deprecated, unassigned) character, the error message includes the abusing character and its Unicode notation.
Comment 3 Sumana Harihareswara 2011-09-30 16:00:41 UTC
Added the "patch" and "need-review" keywords; Mark hopes to get someone to review the patch soon.
Comment 4 Van de Bugger 2011-10-21 19:53:04 UTC
Comment 5 Platonides 2011-11-22 20:53:20 UTC
Looks good. Although rendering some codepoints could give layout problems (eg. a RTL mark).
Comment 6 Van de Bugger 2011-11-23 18:14:54 UTC
Non-printable characters can be handled in badCharErr:

> private static function badCharErr( $msgId, $point ) {
>     $char = codepointToUtf8( $point );
>     $code = sprintf( 'U+%04X', $point );
>     if ( preg_match( '/^[^:print:]/', $char ) ) {
>         $char = '';
>     }
>     return array( "ERROR", wfMsg( $msgId, $char, $code ) );
> }


> private static function badCharErr( $msgId, $point ) {
>     $char = codepointToUtf8( $point );
>     $code = sprintf( 'U+%04X', $point );
>     if ( preg_match( '/^[^:print:]/', $char ) ) {
>         return array( "ERROR", wfMsg( $msgId . '-np', $code ) );
>     }
>     return array( "ERROR", wfMsg( $msgId, $char, $code ) );
> }

(The latter variant will require some more error messages.)

Which variant do you prefer? I will prepare a new patch.
Comment 7 Van de Bugger 2011-12-12 21:44:49 UTC
Comment 8 Platonides 2011-12-12 22:06:39 UTC
Sorry, I hadn't seen it.
Instead of outputting empty backets (option 1) or having to duplicate all messages (option 2), I would use a single $1 in the messages:
 'antispoof-unassigned'    => 'Contains unassigned character $1',
and then have that $1 replaced by a couple of messages
one expanding to "$1" ($2)  if it's printable or another which is just the U+XX form,
Comment 9 Van de Bugger 2011-12-13 20:21:14 UTC
Created attachment 9674 [details]
Next attempt.

> 'antispoof-unassigned'    => 'Contains unassigned character $1',

That's is not very good decision, because in different languages it may be written in different ways. For example, 

> "$1" ($2) 

(where $1 denotes character itself, $2 -- it's Unicode code) looks suitable for English language.

> «$1» ($2)

would be preferred in Russian. Have no idea about languages with right-to-left direction… May be

> ($2) "$1" 


So I implemented another approach. First, I format character representation from character (if it is printable) and character code:

> if ( mb_ereg( '^[[:print:]]', $symbol ) ) {
>     $char = wfMsg( 'antispoof-bad-char', $symbol, $code );
> } else {
>     $char = wfMsg( 'antispoof-bad-char-np', $code );
> }

So it can be localized for different quotation marks, parenthesis, order, etc. Then, character representation is used to get final message:

> return array( "ERROR", wfMsg( $msgId, $char ) );

It seems it works.
Comment 10 Van de Bugger 2012-03-01 20:20:11 UTC
Comment 11 Van de Bugger 2012-03-01 20:35:23 UTC
Hi, why do you think it is low priority?? It is usability issue. A user creating an account may be very frustrated when you just say "your login name contains unacceptable characters" but do not specify which exactly characters are unacceptable. Please consider it seriously. 

I prepared the patch, the patch was not rejected — you said "looks good", but the patch was not applied. Now, 3 months later, the patch is outdated and cannot be applied cleanly. It require efforts (again!) to revive the patch. Why do you waste your and my time?
Comment 12 Platonides 2012-03-03 23:51:34 UTC
Is mb_ereg the only way to detect printability? Isn't that deprecated? (I don't see it explicitely  marked as such, but I think it would be, as member of the ereg family). I think you can probably detect non-printable characters in PCRE with '/\p{C}/u'

Also, I'd call the message «antispoof-bad-char-non-printable» instead of the more cryptic «antispoof-bad-char-np»

I'm sorry you feel like that, you are obviously right in that this isn't a proper way to treat contributions. Note that I'm not payed for reviewing bugs (or doing any MW development), I just got more or less asssigned to it because I was the only one that took a look at it (and funnily, always after a ping comment). What can I say? I probably missed that it was replying me. Feel free to drop me a line reminder in a direct mail in the future. It's much less likely to get unnoticed as bugzilla mail.
I'd like to put up to date your patch myself, but I'm quite busy in real life now, and I don't think I could get time to dedicate for it.
Comment 13 Van de Bugger 2012-03-05 19:07:34 UTC
Created attachment 10176 [details]
Next attempt.

Hi, I am not payed for working on MW too, and also have some real life. I do not ask for any reward, but just want to be sure my (and your) time is not wasted. I do not ask for immediate response (I am completely ok to wait for a week or so), but longer periods are not so good — patches become outdated, I am loosing context and getting back to details requires more to time so on. So let us finish this tiny work reasonable quickly, ok?

Problems you mentioned last time:

1. mb_ereg — replaced with preg_match( '/\A\p{C}\z/u', $symbol ).
2. Cryptic message key — fixed.


1. Added message documentation ("qqq" pseudo-language).
2. Russian translation dropped, as requested here:
Comment 14 Van de Bugger 2012-03-12 19:13:20 UTC
Comment 15 Van de Bugger 2012-03-19 17:17:23 UTC
Ok, you do not have time to review the patch. Let me see if you have time to roll it back...

Committed r114156.
Comment 16 Platonides 2012-03-20 21:23:30 UTC
> Let me see if you have time to roll it back...

That's not too kind of you.

The patch doesn't look bad, but I'll let someone else review it.
I'm happy to strike that mail from my inbox, though.
Comment 17 Van de Bugger 2012-03-21 20:36:07 UTC
> That's not too kind of you.

I am really sorry.

However, recently I have committed couple of my own extensions, and found there is a plenty of people reviewing my code and criticized:

1. License (AGPLv3),
2. Semicolon after {} block (it is not required),
3. My comment at the end of file (it is useless), 
4. "Useless" declarations of global variables (from their point of view),

and other similar "defects" in my code. Some guys even fixed the code, and one fix really breaks it.

At the same time nobody has time to review this patch!

I am disappointed. I am thinking (1) there are other sites for hosting free projects, where the code will be under my control without annoying semicolon style maintainers, (2) forking AntiSpoof is not a big deal. 

Sorry again.
Comment 18 Sumana Harihareswara 2012-03-21 20:42:02 UTC
Van de Bugger:

I'm sorry for the wait.  However, you know that in the long run it benefits everyone to work together in the same branch even though it can be frustrating in the short term; please don't fork.

Also, I encourage you to get a Git/Gerrit account so that your code will be reviewed faster in the future.  There are about 200 patches awaiting review right now in Bugzilla and moving more of those to the unified code review dashboard will help get reviews faster.

Note You need to log in before you can comment on or make changes to this bug.