Last modified: 2011-04-18 21:18:32 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T23027, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 21027 - Requests with utf-8 in the URL return an outdated page revision
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: General/Unknown (Other open bugs)
Version: unspecified
Hardware: All (OS: All)
Importance: Low normal (vote)
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 28602
Blocks:
Reported: 2009-10-06 19:33 UTC by Michael Holzt
Modified: 2011-04-18 21:18 UTC (History)
2 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
pcap file showing the problem (17.95 KB, application/octet-stream)
2009-10-06 19:33 UTC, Michael Holzt

Description Michael Holzt 2009-10-06 19:33:53 UTC
Created attachment 6637 [details]
pcap file showing the problem

I have noticed that if I request the URL http://de.wikinews.org/wiki/Nobelpreis_für_Physik_für_„die_Meister_des_Lichts“ using the text-mode browser links, I get an old (outdated) revision of the page.

I have tracked this issue down and found that it is caused by links sending the special characters in the URL unencoded, directly as raw 8-bit UTF-8 rather than in %xy encoding. If I change the URL to use %xy encoding (http://de.wikinews.org/wiki/Nobelpreis_f%C3%BCr_Physik_f%C3%BCr_%E2%80%9Edie_Meister_des_Lichts%E2%80%9C), I get the current revision.

However, it seems that MediaWiki can actually handle requests with raw UTF-8 in the URL, but for some strange reason it returns an old page revision when the page is requested that way.

I will attach a pcap trace which shows first a request using links and then a request using lynx (lynx does the %xy encoding). You will notice the different page revisions returned.
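The two spellings of the URL carry the same title once decoded. A quick Python sketch (using the title from this report) shows that the percent-encoded form is simply the escaped UTF-8 bytes of the raw form:

```python
from urllib.parse import quote, unquote

# Page title as the text-mode browser "links" sends it: raw UTF-8 bytes
raw = "Nobelpreis_für_Physik_für_„die_Meister_des_Lichts“"

# Percent-encoded form as "lynx" sends it
encoded = quote(raw)
print(encoded)
# Nobelpreis_f%C3%BCr_Physik_f%C3%BCr_%E2%80%9Edie_Meister_des_Lichts%E2%80%9C

# Decoding gives back the identical title, so a cache that keys on the
# literal request bytes treats one page as two distinct objects.
assert unquote(encoded) == raw
```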
Comment 1 Platonides 2009-10-06 21:24:41 UTC
Looks like a problem with the Squid cache not being purged for that encoding.
Comment 2 Brion Vibber 2009-10-08 17:09:26 UTC
Mark, do we know whether Squid normalizes percent-encoded chars vs raw chars in URLs when determining canonical URLs for caching?

MediaWiki redirects you to the canonical URL for not-quite-canonical page view URLs in order to ensure consistent caching, but I have the vaguest recollection that our detection happens post-percent-decoding, so we're not necessarily doing that right already.

If Squid would be caching them separately, then we might need to fix that up in MediaWiki to be more aggressive about the redirecting.
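The more aggressive redirect described above can be sketched as follows (a hypothetical illustration in Python, not MediaWiki's actual code): decode whatever spelling the client sent, re-encode it one canonical way, and redirect any request whose path differs from the canonical form.

```python
from urllib.parse import quote, unquote

def canonical_path(path: str) -> str:
    # Decode any %xy escapes (raw UTF-8 passes through unquote unchanged),
    # then re-encode, so every spelling maps to one canonical URL.
    return quote(unquote(path), safe="/")

def handle(path: str):
    canon = canonical_path(path)
    if path != canon:
        # Redirect non-canonical spellings so the cache only ever
        # sees (and stores) the canonical URL.
        return 301, canon
    return 200, path
```

With this, the raw-UTF-8 request would be 301-redirected to the percent-encoded URL, and only the latter would ever reach the cache.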
Comment 3 Michael Holzt 2009-10-08 17:16:07 UTC
Hmm, I've just noticed that Bugzilla seems to have a bug as well, as can be seen in my bug report: the URL does not get linked correctly; the trailing “ is missing...
Comment 4 Mark Bergsma 2009-10-08 17:22:50 UTC
I've just had a look at the code, and it seems that Squid does not canonicalize URLs w.r.t. percent-decoding. There is a function url_decode_hex() in url.c which supports this, but it's only used for Gopher (yay ;). I strongly suspect that it's caching them separately, so indeed MediaWiki may need to be adapted for that.
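The caching behaviour described in this comment can be illustrated with a toy cache (a sketch, not Squid's actual code) keyed on the literal URL string, with no decoding before the lookup:

```python
# Toy cache keyed on the literal URL string: because no percent-decoding
# happens before the lookup, the raw-UTF-8 and percent-encoded spellings
# of the same page land in two separate entries, each fetched (and later
# purged) independently.
cache = {}
origin_hits = 0

def origin(url):
    global origin_hits
    origin_hits += 1
    return f"page body for {url}"

def cached_fetch(url):
    if url not in cache:
        cache[url] = origin(url)
    return cache[url]

cached_fetch("/wiki/Nobelpreis_für_Physik")       # miss: origin hit 1
cached_fetch("/wiki/Nobelpreis_f%C3%BCr_Physik")  # miss again: origin hit 2
```

Purging one spelling leaves the other entry behind, which matches the stale revisions seen in the report.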
Comment 5 Michael Holzt 2009-10-08 17:25:53 UTC
I'm not sure how redirecting could fix this, unless you want to redirect all URLs without percent-encoding to URLs with percent-encoding, which seems ugly. Technically both URLs are exactly the same, so in my opinion this should be fixed in Squid.


