Last modified: 2008-07-12 20:52:39 UTC
Hello! This request proposes a combined solution for several bugs:
a) Bug 1414: Unicode whitespaces allowed in article title
b) Bug 1524: usernames should use unicode whitelist
c) Bug 2593: Non-printing characters allowed in registration
d) Bug 3819: strip phantom general punctuation characters from page titles

Requests and solutions can be "restrictive", but restrictive solutions would make it impossible to use these characters at all. Personally I do not like restrictive solutions.

The solution proposed here is to implement a notification for "action=submit" (preview or save) indicating that saving would generate "irregular links", i.e. links containing "irregular characters". The notification should list *all* "irregular links" individually (what counts as an irregular link should be defined in a .php include file) and offer a "save anyway" button.

*Notifications* are not new in MediaWiki:
- Special:Upload notifies if the size of a file to be uploaded is above a limit.
- Special:Upload notifies if a file would be uploaded with a title that already exists.
Both notifications use [[MediaWiki:Uploadwarning]]; button: [[MediaWiki:Savefile]]; text: [[MediaWiki:Ignorewarning]]; etc.

The proposed solution would meet the main goals:
- generate a warning if something could happen that causes trouble
- if the result is intended, it is up to the user to create the link anyway

Benefit: the warning should prevent the creation of "unintended" "irregular links".

The list of "irregular links" should display the "irregular characters" as HTML entities where such entities exist, otherwise in &#nnnn; notation, and *not* as UTF-8, because many of them cannot be seen or distinguished when displayed as UTF-8.

*Main* "irregular characters" identified so far:
- whitespace / non-printing characters
- general punctuation characters

The notification should support all encodings of the "irregular characters": UTF-8, HTML entities (&lrm;, &rlm;, ...), &#nnnn;, &#xnnnn; and %XX%YY%ZZ, in links or their parameters (also inside {{localurl}}, {{fullurl}} ...).

The proposed solution would make it easy to identify such vandalism, or mistakes caused by copy and paste or by editing that inserts or deletes such characters. Detecting and fixing them now is very time consuming.

----
*Other* "irregular characters"

It should be evaluated whether this function can also be used for "Unicode character normalisation". This concerns MediaWiki's conversion of Unicode precomposed characters to a sequence of Unicode characters.

An optimal achievement would be to generate "proposals" of "what to replace with what", offering checkboxes beside the links.

Example: the Unicode character HEBREW LETTER ALEF WITH PATAH, U+FB2E, is replaced by MediaWiki anyway with the two characters HEBREW LETTER ALEF (U+05D0) and HEBREW POINT PATAH (U+05B7). So if we change the characters in the built-in title normalisation, why not also be able to change, in the source of the page:
- the &#nnnn; representation &#64302; to &#1488;&#1463;
- the &#xnnnn; representation &#xFB2E; to &#x05D0;&#x05B7;
- the %EF%AC%AE to %D7%90%D6%B7
Keeping these only causes trouble. See Bug 3860: links generated with precombined characters show red despite the fact that the normalised links exist; testcase: [[wiktionary:yi:bugzilla/03860]]

Because changes would be controlled by checkboxes, it would still be possible to keep precomposed characters for documentation, testing ... However, fixing / "converting to the standard" would be achieved with a built-in "knowledge tool" and could save much time.

Some bugs dealing with Unicode normalization:
- Bug 1375: Unicode normalization leaves red links
- Bug 1527: problem on URL with Devanagari characters
- Bug 2399: Unicode normalization interferes with Hebrew and Arabic with vowels

Best regards
reinhardt [[user:gangleri]]
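To make the proposal more concrete, here is a minimal Python sketch of the submit-time check. It is not MediaWiki code. It assumes a simplified link syntax, and all function names (`is_irregular`, `decode_target`, `show`, `irregular_links`, `reencode`) are hypothetical. The irregular-character rule (Unicode categories Zs, Zl, Zp, Cc and Cf, except the ASCII space) is only an assumed stand-in for the configurable list the request wants in a .php include file. The sketch decodes %XX%YY (as UTF-8), named entities, &#nnnn; and &#xnnnn; in [[links]] and {{localurl:}}/{{fullurl:}} targets. It then shows irregular characters as named entities or &#nnnn;, and uses NFC to propose a normalised form.

```python
import html
import re
import unicodedata
from html.entities import codepoint2name
from urllib.parse import quote, unquote

# [[target]] or [[target|label]]; deliberately simplified wikitext syntax.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# {{localurl:target|...}} and {{fullurl:target|...}}
URL_FUNC_RE = re.compile(r"\{\{\s*(?:localurl|fullurl)\s*:([^|}]+)", re.IGNORECASE)


def is_irregular(ch: str) -> bool:
    """Hypothetical rule: whitespace, control and format characters except ' '."""
    if ch == " ":
        return False
    return unicodedata.category(ch) in {"Zs", "Zl", "Zp", "Cc", "Cf"}


def decode_target(raw: str) -> str:
    """Resolve %XX%YY (as UTF-8), then &name;, &#nnnn; and &#xnnnn;."""
    return html.unescape(unquote(raw))


def show(text: str) -> str:
    """Make irregular characters visible: named entity if one exists, else &#nnnn;."""
    out = []
    for ch in text:
        if is_irregular(ch):
            name = codepoint2name.get(ord(ch))
            out.append(f"&{name};" if name else f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)


def irregular_links(wikitext: str):
    """Yield (raw target, visible target, NFC proposal or None) per problem link."""
    raws = LINK_RE.findall(wikitext) + URL_FUNC_RE.findall(wikitext)
    for raw in raws:
        target = decode_target(raw)
        nfc = unicodedata.normalize("NFC", target)
        if nfc != target or any(is_irregular(ch) for ch in target):
            yield raw, show(target), (nfc if nfc != target else None)


def reencode(text: str, style: str) -> str:
    """Spell a proposal in the notation the page source used."""
    if style == "dec":
        return "".join(f"&#{ord(ch)};" for ch in text)
    if style == "hex":
        return "".join(f"&#x{ord(ch):04X};" for ch in text)
    if style == "pct":
        return quote(text)
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    page = "See [[Foo\u200ebar]], [[%EF%AC%AE]] and [[Ok page|text]]."
    for raw, visible, proposal in irregular_links(page):
        # ascii() keeps invisible characters readable in the console
        print(ascii(raw), visible, ascii(proposal))
```

With this sketch, `[[Foo\u200ebar]]` is reported as `Foo&lrm;bar`. `[[%EF%AC%AE]]` gets the NFC proposal U+05D0 U+05B7, which `reencode(..., "pct")` spells as %D7%90%D6%B7, matching the ALEF WITH PATAH example in the request. A real implementation would feed these tuples into the "save anyway" notification and the per-link checkboxes rather than print them.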
(In reply to comment #0)
> An optimal achievement would be to generate "proposals" "what to replace with what" offering checkboxes beside the links.

This handles "character conversion". Adding: blocks Bug 3985: character conversion (tracking)
*note* This request handles only the occurrence of "irregular characters" in links. For their handling in the rest of the page source, see Bug 4012: feature request: add a flexible magic character conversion to the built-in editor
*note* Because this request is related to action=submit, it should also analyse {{PAGENAME}}. This will prevent the creation of such pages and warn editors about the problem. However, this request does not ask for an analysis of {{PAGENAME}} for other actions such as view, watch, history, move, delete, validate etc.
Problem characters would simply be forbidden. "Notification" is unnecessary.
REOPENing this bug and changing the title to "feature request: provide a notification for irregular Unicode characters".

Dear friends,
http://test.wikipedia.org/wiki/Bugzilla_003696 describes how persistent and irritating *invisible* Unicode characters (such as the General Unicode Punctuation characters) can be.

When documentation text was copied and pasted from the page http://aleph1.libnet.ac.il/F/?func=find-b&find_code=WSB&request=9657318130, General Unicode Punctuation characters *infected*
1. http://test.wikipedia.org/w/index.php?diff=prev&oldid=43229
2. http://test.wikipedia.org/w/index.php?diff=prev&oldid=43230
3. http://test.wikipedia.org/w/index.php?diff=prev&oldid=43231
and whatever other pages, emails etc. used these pages as a source.

[[user:Splarka]] made http://test.wikipedia.org/wiki/MediaWiki:Gadget-EvilUnicodeConverter, which is available for testing at http://test.wikipedia.org/wiki/Special:Gadgets. With this tool it is possible to identify a configurable set of "''Evil Unicode characters''". The source of the page content is displayed as
# Author Title Year Library Sysno
1 ‫ לנסקי, אהרן,1955- ‬ ‫ נגד כיוון ההיסטוריה :הרפתקאותיו המופלאות של האיש שהצ ‬ 2005 HAI Haifa U. 006639172
2
This is a very convenient way to eliminate all unwanted "''Evil Unicode characters''".

Please reconsider including this or similar code as a standard function in MediaWiki. Thanks in advance for all your efforts.

Best regards
Reinhardt [[user:Gangleri]]
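The converter idea can be illustrated outside MediaWiki with a small Python sketch. It is not Splarka's gadget and does not reproduce its code or its character list. `DEFAULT_EVIL`, `mark_evil`, `strip_evil` and `count_evil` are hypothetical names, and the default set is an assumption: soft hyphen, zero-width characters, directional marks and embeddings, word joiners, and the byte order mark. A caller can pass a different set, mirroring the gadget's configurability. The functions show each listed character as a visible &#xHHHH; escape, strip it, or count occurrences by Unicode name.

```python
import unicodedata
from collections import Counter

# Hypothetical default set of "evil" code points; the real gadget is
# user-configurable and its actual list may differ.
DEFAULT_EVIL = frozenset(
    [0x00AD, 0x200B, 0x200C, 0x200D, 0x200E, 0x200F, 0xFEFF]
    + list(range(0x202A, 0x202F))  # LRE, RLE, PDF, LRO, RLO
    + list(range(0x2060, 0x2065))  # WORD JOINER .. INVISIBLE PLUS
)


def mark_evil(text: str, evil=DEFAULT_EVIL) -> str:
    """Show each listed character as a visible &#xHHHH; escape."""
    return "".join(f"&#x{ord(ch):04X};" if ord(ch) in evil else ch for ch in text)


def strip_evil(text: str, evil=DEFAULT_EVIL) -> str:
    """Remove every listed character from the text."""
    return "".join(ch for ch in text if ord(ch) not in evil)


def count_evil(text: str, evil=DEFAULT_EVIL) -> Counter:
    """Count listed characters, keyed by code point and Unicode name."""
    return Counter(
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if ord(ch) in evil
    )


if __name__ == "__main__":
    sample = "\u202bexample title\u202c 2005"
    print(mark_evil(sample))
    print(count_evil(sample))
```

Marking before stripping leaves the final decision with the editor, in the same spirit as the "save anyway" notification proposed in comment #0.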
Sounds like a job for an extension or a gadget, which already seems to exist.