Last modified: 2011-03-13 18:05:39 UTC

Bug 13016 - Anchor encoding is not one-to-one
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All
OS: All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: html
Reported: 2008-02-14 02:12 UTC by Aryeh Gregor (not reading bugmail, please e-mail directly)
Modified: 2011-03-13 18:05 UTC
CC List: 1 user

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-02-14 02:12:58 UTC
Due to the way we do anchor encoding, there's no way to reliably reverse it. ".3F", for instance, is left as ".3F", but "?" is also encoded as ".3F". Likewise, both "_" and " " become "_". It would be nice if anchor encoding were made reversible, both to avoid unintended collisions between different headings and to make it possible to decode anchors.

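Currently we do, roughly:

// Spaces become underscores first, so "_" and " " end up identical
$id = str_replace( ' ', '_', $id );
$id = Sanitizer::decodeCharReferences( $id );
// urlencode() leaves "-", "_" and "." unencoded, so a literal ".3F" passes through unchanged
$id = urlencode( $id );
$id = str_replace( '%3A', ':', $id );
// ...while "?" became "%3F" above and turns into ".3F" here
$id = str_replace( '%', '.', $id );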
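To make the collisions concrete, here is a minimal stand-alone PHP sketch of the current scheme. The function name `currentAnchorEncode` is hypothetical, and PHP's built-in `html_entity_decode()` stands in for MediaWiki's `Sanitizer::decodeCharReferences()` (an assumption made only so the snippet runs outside MediaWiki).

```php
<?php
// Hypothetical stand-alone version of the current anchor encoding.
// html_entity_decode() is a stand-in for Sanitizer::decodeCharReferences().
function currentAnchorEncode(string $id): string
{
    $id = str_replace(' ', '_', $id);
    $id = html_entity_decode($id, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $id = urlencode($id);               // leaves "-", "_" and "." alone
    $id = str_replace('%3A', ':', $id);
    return str_replace('%', '.', $id);
}

// Distinct inputs that produce the same anchor:
var_dump(currentAnchorEncode('.3F') === currentAnchorEncode('?'));   // bool(true)
var_dump(currentAnchorEncode('a_b') === currentAnchorEncode('a b')); // bool(true)
```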

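This should be:

$id = Sanitizer::decodeCharReferences( $id );
// rawurlencode() encodes spaces as "%20"; urlencode() would emit "+",
// so the "%20" replacement below would never match
$id = rawurlencode( $id );
// Escape "_" and "." so they can later stand for " " and "%" unambiguously
$id = str_replace( '_', '%5F', $id );
$id = str_replace( '.', '%2E', $id );
$id = str_replace( '%20', '_', $id );
$id = str_replace( '%3A', ':', $id );
$id = str_replace( '%', '.', $id );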

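That could then be reversed reliably (up to character references, which decodeCharReferences() has already expanded) with:

// Each "." in the anchor stands for "%" and each "_" for a space
$id = str_replace( '.', '%', $id );
$id = str_replace( '_', ' ', $id );
$id = urldecode( $id );
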
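A minimal stand-alone PHP sketch of the proposed encoding and its inverse, showing that the inputs which collide today get distinct anchors and round-trip. The names `proposedAnchorEncode` and `proposedAnchorDecode` are hypothetical; `html_entity_decode()` again stands in for `Sanitizer::decodeCharReferences()`, so an input such as "&amp;" decodes back to "&", not to the original reference. The sketch uses `rawurlencode()` so that spaces come out as "%20".

```php
<?php
// Hypothetical stand-alone versions of the proposed reversible encoding.
// html_entity_decode() is a stand-in for Sanitizer::decodeCharReferences().
function proposedAnchorEncode(string $id): string
{
    $id = html_entity_decode($id, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    $id = rawurlencode($id);             // spaces become "%20", not "+"
    $id = str_replace('_', '%5F', $id);  // free "_" to mean " "
    $id = str_replace('.', '%2E', $id);  // free "." to mean "%"
    $id = str_replace('%20', '_', $id);
    $id = str_replace('%3A', ':', $id);
    return str_replace('%', '.', $id);
}

function proposedAnchorDecode(string $id): string
{
    // Every "." now marks an escape and every "_" a space
    $id = str_replace('.', '%', $id);
    $id = str_replace('_', ' ', $id);
    return rawurldecode($id);
}

echo proposedAnchorEncode('.3F'), "\n"; // .2E3F
echo proposedAnchorEncode('?'), "\n";   // .3F
echo proposedAnchorEncode('a_b'), "\n"; // a.5Fb
echo proposedAnchorEncode('a b'), "\n"; // a_b
echo proposedAnchorDecode(proposedAnchorEncode('Foo: bar.baz?')), "\n"; // Foo: bar.baz?
```

Because the encoder escapes "_" and "." before reusing them as markers, every "_" and "." in the output has exactly one possible origin, which is what makes the decoder unambiguous.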
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-01-05 16:04:52 UTC
The new encoding scheme we use is deliberately not one-to-one, so that anchors look nicer: invalid characters (mostly punctuation) are converted into underscores. WONTFIX.
