Last modified: 2011-03-13 18:05:39 UTC
Due to the way we do anchor encoding, there's no way to reliably reverse it. ".3F", for instance, is translated to ".3F", but so is "?". And both "_" and " " become "_". It would be nice if anchor encoding were made reversible, both to avoid unintended conflicts between anchors and to allow anchors to be decoded.

Currently we do, roughly:

    $id = str_replace( ' ', '_', $id );
    $id = Sanitizer::decodeCharReferences( $id );
    $id = urlencode( $id );
    $id = str_replace( '%3A', ':', $id );
    $id = str_replace( '%', '.', $id );

This should be:

    $id = Sanitizer::decodeCharReferences( $id );
    $id = urlencode( $id );                  // urlencode() turns a space into "+"
    $id = str_replace( '_', '%5F', $id );    // escape "_" and ".", which urlencode() leaves alone
    $id = str_replace( '.', '%2E', $id );
    $id = str_replace( '+', '_', $id );      // now only spaces become "_"
    $id = str_replace( '%3A', ':', $id );
    $id = str_replace( '%', '.', $id );

That could then be reversed reliably (up to entity encoding) with:

    $id = str_replace( '.', '%', $id );
    $id = str_replace( '_', ' ', $id );
    $id = urldecode( $id );
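A self-contained PHP sketch of both schemes, useful for checking the collision and round-trip claims above. It assumes PHP 7+ and is not MediaWiki's actual code: the function names are illustrative, and a hypothetical helper `decodeCharRefs()` built on `html_entity_decode()` stands in for `Sanitizer::decodeCharReferences()`. Because PHP's `urlencode()` encodes a space as "+", the sketch maps "+" (rather than "%20") to "_".

```php
<?php
// Hypothetical stand-in for MediaWiki's Sanitizer::decodeCharReferences():
// decodes HTML character references such as "&amp;" or "&#63;".
function decodeCharRefs(string $s): string {
    return html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// The current, lossy scheme.
function encodeAnchorCurrent(string $id): string {
    $id = str_replace(' ', '_', $id);
    $id = decodeCharRefs($id);
    $id = urlencode($id);
    $id = str_replace('%3A', ':', $id);
    return str_replace('%', '.', $id);
}

// The proposed reversible scheme.
function encodeAnchorReversible(string $id): string {
    $id = decodeCharRefs($id);
    $id = urlencode($id);               // output uses only [A-Za-z0-9._-], "%XX" and "+"
    $id = str_replace('_', '%5F', $id); // escape the two characters we reuse below
    $id = str_replace('.', '%2E', $id);
    $id = str_replace('+', '_', $id);   // "+" here can only come from a space
    $id = str_replace('%3A', ':', $id);
    return str_replace('%', '.', $id);
}

// Inverse of encodeAnchorReversible(), up to entity encoding.
function decodeAnchor(string $id): string {
    $id = str_replace('.', '%', $id);
    $id = str_replace('_', ' ', $id);
    return urldecode($id);
}

// The inputs the report says collide under the current scheme:
var_dump(encodeAnchorCurrent('.3F') === encodeAnchorCurrent('?'));       // bool(true)
var_dump(encodeAnchorReversible('.3F') === encodeAnchorReversible('?')); // bool(false)

$id = 'Foo: bar? (baz)_1.2';
echo encodeAnchorReversible($id), "\n";               // Foo:_bar.3F_.28baz.29.5F1.2E2
echo decodeAnchor(encodeAnchorReversible($id)), "\n"; // Foo: bar? (baz)_1.2
```

The round trip only holds for text without character references: "&amp;" is decoded to "&" before encoding, so decoding yields "&", not the original reference.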
The new encoding scheme we use is deliberately not one-to-one, so that anchors look nicer: invalid characters (mostly punctuation) are converted to underscores. WONTFIX.