Last modified: 2011-11-30 16:42:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T10876, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 8876 - Non-ASCII characters should be unescaped in fullurl, as in other URLs
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All, OS: All
Importance: Lowest minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: utf8
Reported: 2007-02-04 16:50 UTC by Dan Jacobson
Modified: 2011-11-30 16:42 UTC
CC List: 1 user



Description Dan Jacobson 2007-02-04 16:50:12 UTC
Normally, UTF-8 characters in URLs display correctly with printable=yes, e.g.,

But not when {{fullurl}} is involved, e.g.,

There, they are printed as %-escapes instead of UTF-8.

(I use {{fullurl}} to discourage editing categories, as I
discussed elsewhere.)
Comment 1 Brion Vibber 2007-02-04 17:06:22 UTC
Please describe "with printable=yes" and "when {{fullurl}} is involved".
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-04 17:37:53 UTC
The issue appears to be that a link like 免

has "免" printed in the source href attribute as a Chinese character, but a link like


has 免 mangled to the escaped form, "%E5%85%8D".  (The printable display aspect is just a symptom.)  
I've confirmed this is true in trunk.
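To make the escaped form concrete: "%E5%85%8D" is just the three UTF-8 bytes of 免 (E5 85 8D) written as %XX escapes. The minimal Python sketch below shows that encoding step. `percent_encode_utf8` is a hypothetical helper for illustration only, not MediaWiki's actual encoder, and it deliberately leaves every ASCII character alone.

```python
def percent_encode_utf8(text: str) -> str:
    """Replace each non-ASCII character with %XX escapes of its UTF-8 bytes.

    Hypothetical illustration: ASCII characters (including spaces and
    reserved delimiters) are passed through unchanged.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII is copied as-is in this sketch
        else:
            # One uppercase %XX escape per UTF-8 byte, as in the bug report
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)


print(percent_encode_utf8("免"))  # %E5%85%8D
```

The same function turns Wikipédia into Wikip%C3%A9dia and Füße into F%C3%BC%C3%9Fe, the forms quoted later in this thread.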
Comment 3 Rob Church 2007-02-04 17:45:14 UTC
Er, there's a reason we escape these things.

The reason we *don't* do it in the printable form of pages is that it's
usually safe enough for the user to type the proper character as a URL directly.
It's also a damn sight prettier.

In all likelihood, the reason it doesn't happen with {{fullurl}} et al. is
that those operations run before whatever code un-escapes certain URL
components.
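The "un-escape certain URL components" step described here can be sketched as a display-time pass that decodes only escapes forming non-ASCII UTF-8 characters, while keeping escapes of ASCII bytes (such as %20, %3F, %26) so that delimiters keep their meaning. This is a minimal Python sketch under that assumption; `unescape_non_ascii` and the example.org URL are hypothetical and do not reproduce MediaWiki's real code. If such a pass only sees ordinary link targets and not the text produced by {{fullurl}}, the fullurl escapes survive, which matches the behaviour reported above.

```python
import re

# A run of one or more consecutive %XX escapes
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def unescape_non_ascii(url: str) -> str:
    """Decode %XX runs that spell non-ASCII UTF-8 characters.

    Escapes of ASCII bytes are re-emitted as uppercase %XX, and a run that
    is not valid UTF-8 is left exactly as it was.
    """
    def decode_run(match: re.Match) -> str:
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return run  # not valid UTF-8: keep the original escapes
        # Keep ASCII escaped so reserved characters like ? and & stay inert
        return "".join(ch if ord(ch) >= 0x80 else f"%{ord(ch):02X}" for ch in text)

    return _ESCAPE_RUN.sub(decode_run, url)


url = "https://example.org/w/index.php?title=Category:%E5%85%8D&action=edit"
print(unescape_non_ascii(url))
# https://example.org/w/index.php?title=Category:免&action=edit
```

Decoding only non-ASCII sequences is the conservative choice: it yields the readable form without turning an escaped "?" or "&" inside a title into a real query delimiter.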
Comment 4 Brion Vibber 2007-02-04 17:48:04 UTC
This is compatible URL/URI encoding of a UTF-8 IRI.

Some day, when everyone's using fully IRI-compatible browsers,
we may make all URLs display in pretty UTF-8 (but keep in mind
that this can make many URLs impossible to type).
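The IRI-to-URI mapping referred to here (RFC 3987, section 3.1) encodes the IRI as UTF-8 and percent-encodes the non-ASCII bytes, leaving the URI's ASCII structure intact. A minimal Python sketch using the standard library's `urllib.parse.quote` follows; `iri_to_uri` is a hypothetical name, and the sketch ignores internationalized host names, which use a separate (IDNA) mechanism.

```python
from urllib.parse import quote

# RFC 3986 reserved characters plus '%' (so existing escapes are not
# double-encoded). Unreserved characters (letters, digits, "-._~") are
# never quoted by quote().
_KEEP = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri: str) -> str:
    """Convert non-ASCII characters to %XX escapes of their UTF-8 bytes.

    quote() also escapes ASCII characters that are not allowed in a URI,
    such as spaces.
    """
    return quote(iri, safe=_KEEP, encoding="utf-8")


print(iri_to_uri("https://example.org/wiki/Wikipédia"))
# https://example.org/wiki/Wikip%C3%A9dia
```

Both forms identify the same resource; the escaped one is what "compatible URL/URI encoding" means here, and it is the form a user can type on an ASCII-only keyboard.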
Comment 5 Schnargel 2007-02-04 22:21:48 UTC
As I understand it, this is only about the readability of the generated output; typing URLs is not the issue at all.

What is sometimes hard to communicate to people whose language uses only 7-bit ASCII characters is that people whose language
uses an extended character set are perfectly capable of entering 免, Wikipédia, or Füße, and that 免, Wikipédia, or Füße is far more
readable than %E5%85%8D, Wikip%C3%A9dia, or F%C3%BC%C3%9Fe. Since these sinister characters work fine in normal links, there is no
technical reason for fullurl to return them escaped. And for browser usage, IRI or not, there is fullurle, which returns properly
escaped characters, if I remember the docs correctly.
Comment 6 Diederik van Liere 2011-11-30 16:42:47 UTC
I just tested this, and it seems to me that this issue has been FIXED.
