Last modified: 2012-03-21 20:42:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T14500, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 12500 - [[MediaWiki:Antispoof-name-illegal]] should provide information about the illegal and / or blocked characters in / at the wiki
[[MediaWiki:Antispoof-name-illegal]] should provide information about the ill...
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
AntiSpoof (Other open bugs)
unspecified
All All
: Low enhancement (vote)
: ---
Assigned To: Nobody - You can work on this!
http://en.wikipedia.org/wiki/MediaWik...
: patch, patch-need-review
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-01-04 08:13 UTC by lɛʁi לערי ריינהארט
Modified: 2012-03-21 20:42 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Patch for better messages. (7.21 KB, patch)
2011-09-28 22:24 UTC, Van de Bugger
Details
Next attempt. (6.26 KB, patch)
2011-12-13 20:21 UTC, Van de Bugger
Details
Next attempt. (6.81 KB, patch)
2012-03-05 19:07 UTC, Van de Bugger
Details

Description lɛʁi לערי ריינהארט 2008-01-04 08:13:12 UTC
Dear friends;

[[MediaWiki:Antispoof-name-illegal]] contains:

  The name "$1" is not allowed to prevent confusing or spoofed usernames: $2. Please choose another name.

request:

The [[MediaWiki:Antispoof-name-illegal]] message should provide information about illegal and / or blocked characters.

Beside the characters from the Basic Latin block as controll characters in the range of � to  as well as the « % » % , « & » & , « + » &#2B; ¿etc.? characters this should include also all other UTF-8 characters as those from the General Punctuation block, the various whitespace characters etc.

Thanks for implementing this in advanced.

Best regards Reinhardt [[user:Gangleri]]
Comment 1 lɛʁi לערי ריינהארט 2008-01-04 23:05:43 UTC
(In reply to comment #0)
...
> Beside the characters from ...
> ... this should include also all other UTF-8 characters
> as those from the General Punctuation block, the various whitespace characters
> etc.

which are disallowed in all namespaces

an additional example is « � » which is
Unicode Character 'REPLACEMENT CHARACTER' - U+FFFD
http://www.fileformat.info/info/unicode/char/fffd/index.htm
Comment 2 Van de Bugger 2011-09-28 22:24:38 UTC
Created attachment 9122 [details]
Patch for better messages.

If username contains a bad (blacklisted, deprecated, unassigned) character, the error message includes the abusing character and its Unicode notation.
Comment 3 Sumana Harihareswara 2011-09-30 16:00:41 UTC
Added the "patch" and "need-review" keywords; Mark hopes to get someone to review the patch soon.
Comment 4 Van de Bugger 2011-10-21 19:53:04 UTC
Ping.
Comment 5 Platonides 2011-11-22 20:53:20 UTC
Looks good. Although rendering some codepoints could give layout problems (eg. a RTL mark).
Comment 6 Van de Bugger 2011-11-23 18:14:54 UTC
Non-printable characters can be handled in badCharErr:

> private static function badCharErr( $msgId, $point ) {
>     $char = codepointToUtf8( $point );
>     $code = sprintf( 'U+%04X', $point );
>     if ( preg_match( '/^[^:print:]/', $char ) ) {
>         $char = '';
>     }
>     return array( "ERROR", wfMsg( $msgId, $char, $code ) );
> }

Or:

> private static function badCharErr( $msgId, $point ) {
>     $char = codepointToUtf8( $point );
>     $code = sprintf( 'U+%04X', $point );
>     if ( preg_match( '/^[^:print:]/', $char ) ) {
>         return array( "ERROR", wfMsg( $msgId . '-np', $code ) );
>     }
>     return array( "ERROR", wfMsg( $msgId, $char, $code ) );
> }

(The latter variant will require some more error messages.)

Which variant do you prefer? I will prepare a new patch.
Comment 7 Van de Bugger 2011-12-12 21:44:49 UTC
Ping.
Comment 8 Platonides 2011-12-12 22:06:39 UTC
Sorry, I hadn't seen it.
Instead of outputting empty backets (option 1) or having to duplicate all messages (option 2), I would use a single $1 in the messages:
 'antispoof-unassigned'    => 'Contains unassigned character $1',
and then have that $1 replaced by a couple of messages
one expanding to "$1" ($2)  if it's printable or another which is just the U+XX form,
Comment 9 Van de Bugger 2011-12-13 20:21:14 UTC
Created attachment 9674 [details]
Next attempt.

> 'antispoof-unassigned'    => 'Contains unassigned character $1',

That's is not very good decision, because in different languages it may be written in different ways. For example, 

> "$1" ($2) 

(where $1 denotes character itself, $2 -- it's Unicode code) looks suitable for English language.

> «$1» ($2)

would be preferred in Russian. Have no idea about languages with right-to-left direction… May be

> ($2) "$1" 

?

So I implemented another approach. First, I format character representation from character (if it is printable) and character code:

> if ( mb_ereg( '^[[:print:]]', $symbol ) ) {
>     $char = wfMsg( 'antispoof-bad-char', $symbol, $code );
> } else {
>     $char = wfMsg( 'antispoof-bad-char-np', $code );
> }

So it can be localized for different quotation marks, parenthesis, order, etc. Then, character representation is used to get final message:

> return array( "ERROR", wfMsg( $msgId, $char ) );

It seems it works.
Comment 10 Van de Bugger 2012-03-01 20:20:11 UTC
Ping.
Comment 11 Van de Bugger 2012-03-01 20:35:23 UTC
Hi, why do you think it is low priority?? It is usability issue. A user creating an account may be very frustrated when you just say "your login name contains unacceptable characters" but do not specify which exactly characters are unacceptable. Please consider it seriously. 

I prepared the patch, the patch was not rejected — you said "looks good", but the patch was not applied. Now, 3 months later, the patch is outdated and cannot be applied cleanly. It require efforts (again!) to revive the patch. Why do you waste your and my time?
Comment 12 Platonides 2012-03-03 23:51:34 UTC
Is mb_ereg the only way to detect printability? Isn't that deprecated? (I don't see it explicitely  marked as such, but I think it would be, as member of the ereg family). I think you can probably detect non-printable characters in PCRE with '/\p{C}/u'

Also, I'd call the message «antispoof-bad-char-non-printable» instead of the more cryptic «antispoof-bad-char-np»

I'm sorry you feel like that, you are obviously right in that this isn't a proper way to treat contributions. Note that I'm not payed for reviewing bugs (or doing any MW development), I just got more or less asssigned to it because I was the only one that took a look at it (and funnily, always after a ping comment). What can I say? I probably missed that it was replying me. Feel free to drop me a line reminder in a direct mail in the future. It's much less likely to get unnoticed as bugzilla mail.
I'd like to put up to date your patch myself, but I'm quite busy in real life now, and I don't think I could get time to dedicate for it.
Comment 13 Van de Bugger 2012-03-05 19:07:34 UTC
Created attachment 10176 [details]
Next attempt.

Hi, I am not payed for working on MW too, and also have some real life. I do not ask for any reward, but just want to be sure my (and your) time is not wasted. I do not ask for immediate response (I am completely ok to wait for a week or so), but longer periods are not so good — patches become outdated, I am loosing context and getting back to details requires more to time so on. So let us finish this tiny work reasonable quickly, ok?

Problems you mentioned last time:

1. mb_ereg — replaced with preg_match( '/\A\p{C}\z/u', $symbol ).
2. Cryptic message key — fixed.

Additionally:

1. Added message documentation ("qqq" pseudo-language).
2. Russian translation dropped, as requested here: http://www.mediawiki.org/wiki/Localization#Changing_existing_messages
Comment 14 Van de Bugger 2012-03-12 19:13:20 UTC
Ping.
Comment 15 Van de Bugger 2012-03-19 17:17:23 UTC
Ok, you do not have time to review the patch. Let me see if you have time to roll it back...

Committed r114156.
Comment 16 Platonides 2012-03-20 21:23:30 UTC
> Let me see if you have time to roll it back...

That's not too kind of you.

The patch doesn't look bad, but I'll let someone else review it.
I'm happy to strike that mail from my inbox, though.
Comment 17 Van de Bugger 2012-03-21 20:36:07 UTC
> That's not too kind of you.

I am really sorry.

However, recently I have committed couple of my own extensions, and found there is a plenty of people reviewing my code and criticized:

1. License (AGPLv3),
2. Semicolon after {} block (it is not required),
3. My comment at the end of file (it is useless), 
4. "Useless" declarations of global variables (from their point of view),

and other similar "defects" in my code. Some guys even fixed the code, and one fix really breaks it.

At the same time nobody has time to review this patch!

I am disappointed. I am thinking (1) there are other sites for hosting free projects, where the code will be under my control without annoying semicolon style maintainers, (2) forking AntiSpoof is not a big deal. 

Sorry again.
Comment 18 Sumana Harihareswara 2012-03-21 20:42:02 UTC
Van de Bugger:

I'm sorry for the wait.  However, you know that in the long run it benefits everyone to work together in the same branch even though it can be frustrating in the short term; please don't fork.

Also, I encourage you to get a Git/Gerrit account https://labsconsole.wikimedia.org/wiki/Help:Access#Access_FAQ so that your code will be reviewed faster in the future.  There are about 200 patches awaiting review right now in Bugzilla and moving more of those to the unified code review dashboard will help get reviews faster.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links