Last modified: 2012-04-16 09:15:43 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T30460, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 28460 - invisible character in url leads to "no article" page
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Redirects
Version: unspecified
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2011-04-07 20:27 UTC by fx12345
Modified: 2012-04-16 09:15 UTC
CC List: 2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
example of invisible character in wikipedia url (44.75 KB, image/jpeg)
2011-04-07 20:27 UTC, fx12345

Description fx12345 2011-04-07 20:27:47 UTC
Created attachment 8387
example of invisible character in wikipedia url

An example:
http://en.wikipedia.org/wiki/Horsesho%E2%80%8Be_orbit

If this link is pasted into Wikipedia, it fails and leads to a "no article" page. The bytes that decode here as "E2 80 8B" form a character that is normally invisible: it appears to be the Unicode character U+200B ZERO WIDTH SPACE, encoded in UTF-8. Since it has no syntactic value, shouldn't such characters simply be removed from pasted URLs? I don't know how the spurious character got into the URL in the first place, but surely any invisible characters ought to be removed by the parser, right?
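A minimal Python sketch of how the hidden character in such a URL can be spotted: percent-decode the path with the standard library and report every Unicode "format" (category Cf) character in it. The function name `invisible_chars` is hypothetical, and equating "invisible" with category Cf is an assumption; it catches ZERO WIDTH SPACE, ZWNJ and the bidi marks, but not every character that renders with no width.

```python
import unicodedata
from urllib.parse import unquote


def invisible_chars(url_path: str) -> list[tuple[str, str]]:
    """Hypothetical helper: percent-decode a URL path (as UTF-8) and list
    every Unicode format (Cf) character it contains, as (code point, name)."""
    decoded = unquote(url_path, encoding="utf-8")
    found = []
    for ch in decoded:
        # Assumption: "invisible" means general category Cf, which includes
        # U+200B ZERO WIDTH SPACE, U+200C ZWNJ, U+200E/U+200F bidi marks, etc.
        if unicodedata.category(ch) == "Cf":
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


if __name__ == "__main__":
    # The three bytes E2 80 8B are the UTF-8 encoding of U+200B.
    print("\u200b".encode("utf-8"))
    print(invisible_chars("/wiki/Horsesho%E2%80%8Be_orbit"))
```

For the URL in this report the sketch reports a single ZERO WIDTH SPACE between "Horsesho" and "e", which is why the decoded title no longer matches "Horseshoe orbit".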
Comment 1 Mark A. Hershberger 2011-04-07 21:47:20 UTC
We already make it impossible to create pages with invisible spaces: http://en.wikipedia.org/w/index.php?title=Horsesho%E2%80%8Be_orbit&action=edit

Is there any reason to do more than this?
Comment 2 Bawolff (Brian Wolff) 2011-04-07 22:07:08 UTC
(In reply to comment #1)
> We already make it impossible to create pages with invisible spaces:
> http://en.wikipedia.org/w/index.php?title=Horsesho%E2%80%8Be_orbit&action=edit
> 
> Is there any reason to do more than this?

That is some customization on the Wikipedia side (abuse filter presumably). We allow creating such pages in general. For example: http://test.wikipedia.org/wiki/Horsesho%E2%80%8Be_orbit

"Invisible" characters are sometimes needed for typographical reasons. I'm not sure whether that is the case for zero width space, but it certainly is for other zero-width characters (ZWNJ, etc.), so we should be careful before excluding any such characters.

From the spec - "Zero width space indicates a word break or line break opportunity, except that it has no width. Zero-width space characters are intended to be used in languages that have no visible word spacing to represent word break or line break opportunities, such as Thai, Myanmar, Khmer, and Japanese."

Thus, such characters might be useful in a very long title, in a language that doesn't use spaces, to indicate where a line break may be placed. (Of course, I don't speak any such language, so I could be wrong about that.)

Anyway, I don't think we should disallow it without being very sure it's unneeded. (On the other hand, we do disallow left-to-right mark characters, which are similar, and we normalize characters such as the narrow no-break space to normal spaces, so we already do things similar to what is proposed here.)
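The distinction drawn in this comment can be sketched as a title-normalization policy: strip bidi marks, fold a few space variants into an ordinary space, and leave zero-width characters such as ZWSP and ZWNJ untouched. This is a hypothetical Python illustration, not MediaWiki's actual title-handling code; the function name `normalize_title` and the exact character sets are assumptions.

```python
import re

# Hypothetical policy for illustration only; NOT MediaWiki's real implementation.
# Characters removed outright: U+200E LEFT-TO-RIGHT MARK, U+200F RIGHT-TO-LEFT MARK.
STRIP = re.compile("[\u200e\u200f]")
# Assumed space variants folded to a plain space: ordinary space,
# U+00A0 NO-BREAK SPACE and U+202F NARROW NO-BREAK SPACE.
SPACES = re.compile("[ \u00a0\u202f]+")


def normalize_title(title: str) -> str:
    """Strip bidi marks and collapse space variants, but deliberately keep
    zero-width characters such as U+200B (ZWSP) and U+200C (ZWNJ)."""
    title = STRIP.sub("", title)
    title = SPACES.sub(" ", title)
    return title.strip(" ")


if __name__ == "__main__":
    # The ZWSP survives; the narrow no-break space becomes a normal space.
    print(repr(normalize_title("Horsesho\u200be\u202forbit\u200e")))
```

Under this policy the title from the report still contains its zero width space, matching the outcome of this bug: such characters are kept rather than silently removed.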
Comment 3 Bawolff (Brian Wolff) 2011-04-13 00:34:57 UTC
This seems to have been intentionally allowed back in titles in r56918. The commit summary on that revision gives a pretty good reason not to strip such characters, so I'm going to go ahead and mark this WONTFIX.
