Last modified: 2011-11-30 16:42:47 UTC

Bug 8876 - Non-ASCII characters should be unescaped in fullurl, as in other URLs
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All All
Importance: Lowest minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: utf8
Reported: 2007-02-04 16:50 UTC by Dan Jacobson
Modified: 2011-11-30 16:42 UTC


Description Dan Jacobson 2007-02-04 16:50:12 UTC
Normally UTF-8 in URLs looks good with printable=yes, e.g.,

But not when {{fullurl}} is involved, e.g.,

There they are printed as % escapes instead of UTF-8.

(I use {{fullurl}} to discourage editing of categories, as I
discussed elsewhere.)
Comment 1 Brion Vibber 2007-02-04 17:06:22 UTC
Please describe "with printable=yes" and "when {{fullurl}} is involved".
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-04 17:37:53 UTC
The issue appears to be that a link like 免

has "免" printed in the source href attribute as a Chinese character, but a link like


has 免 mangled to the escaped form, "%E5%85%8D".  (The printable display aspect is just a symptom.)  
I've confirmed this is true in trunk.
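
For illustration, here is a minimal Python sketch of where the escaped form comes from: the character is encoded as UTF-8 and each resulting byte is written as %XX. The helper name percent_encode_utf8 is hypothetical and is not taken from MediaWiki; the sketch only reproduces the escape quoted above and is not the parser's actual code.

```python
from urllib.parse import quote


def percent_encode_utf8(text: str) -> str:
    """Write every UTF-8 byte of `text` as %XX.

    Hypothetical helper for illustration only. Unlike a real URL encoder,
    it escapes ASCII bytes too, so use it on non-ASCII input.
    """
    return "".join(f"%{byte:02X}" for byte in text.encode("utf-8"))


escaped = percent_encode_utf8("免")
print(escaped)  # %E5%85%8D

# The standard library's quote() produces the same escape for this character.
print(quote("免") == escaped)  # True
```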
Comment 3 Rob Church 2007-02-04 17:45:14 UTC
Er, there's a reason we escape these things.

The reason we *don't* do it in the printable form of pages is because it's
usually safe enough for the user to type the proper character as a URL directly.
It's also a damn sight prettier.

In all likelihood, the reason it doesn't happen with {{fullurl}} et al. is
because those operations are run before whatever code it is that un-escapes
certain URL components.
Comment 4 Brion Vibber 2007-02-04 17:48:04 UTC
This is compatible URL/URI encoding of a UTF-8 IRI.

Some day when everyone's using fully IRI-compatible browsers,
we may make all URLs display in pretty UTF-8 (but keep in mind
that can make many URLs impossible to type).
Comment 5 Schnargel 2007-02-04 22:21:48 UTC
I understand this is just about the readability of the generated output, so typing URLs isn't at all what this is about.

What is sometimes hard to communicate to people whose language uses only 7-bit ASCII characters is that people whose language uses an extended set of characters are perfectly capable of entering 免, Wikipédia, or Füße, and that 免, Wikipédia, or Füße is far more readable than %E5%85%8D, Wikip%C3%A9dia, or F%C3%BC%C3%9Fe. Since these sinister characters work fine in normal links, there is no technical reason why fullurl would need to return them escaped. And for browser usage, IRI or not, there is fullurle with properly escaped characters, if I remember the docs correctly.
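
The readability point can be checked with a short Python sketch that decodes the three escaped strings from this comment back to their readable forms and then re-escapes them. It uses only the standard library's urllib.parse and assumes nothing about how MediaWiki itself handles these URLs; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Escaped forms quoted in the comment above.
escaped_titles = ["%E5%85%8D", "Wikip%C3%A9dia", "F%C3%BC%C3%9Fe"]

# unquote() decodes the %XX bytes as UTF-8 by default.
readable_titles = [unquote(title) for title in escaped_titles]
print(readable_titles)  # ['免', 'Wikipédia', 'Füße']

# Re-escaping the readable forms gives back the original strings,
# so nothing is lost by displaying the unescaped version.
round_trip = [quote(title) for title in readable_titles]
print(round_trip == escaped_titles)  # True
```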
Comment 6 Diederik van Liere 2011-11-30 16:42:47 UTC
I just tested this, and it seems to me that this issue has been FIXED.
