Last modified: 2009-11-25 14:04:24 UTC
http://en.wikipedia.org/wiki/Doria's_Tree-kangaroo
Wget and Perl get stale pages
perl -w -MLWP::UserAgent -e "print LWP::UserAgent->new->get('http://en.wikipedia.org/wiki/Doria\'s_Tree-kangaroo')->decoded_content"
Both wget and Perl get a version of the page from August, not the version I fixed in September.
Encoding problem? The URL should be http://en.wikipedia.org/wiki/Doria%27s_Tree-kangaroo
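For reference, the percent-encoded form suggested above can be reproduced with a minimal Python sketch. It assumes only the standard library's `urllib.parse`; the variable names are illustrative and nothing here is taken from MediaWiki itself.

```python
from urllib.parse import quote, unquote

title = "Doria's_Tree-kangaroo"

# quote() leaves letters, digits and "_.-~" alone and percent-encodes the
# rest, so the apostrophe (byte 0x27) becomes %27.
encoded = quote(title)
print(encoded)  # Doria%27s_Tree-kangaroo

# Decoding either spelling gives back the same title.
print(unquote(encoded) == unquote(title))  # True
```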
Thanks for the suggestion, but it is not an encoding problem: the unencoded UTF-8 URI is fine. Try it in your browser: http://en.wikipedia.org/wiki/Doria's_Tree-kangaroo
You can see my edit in the history. Note that the backslash before the apostrophe in the Perl example's URI is there only because the Perl code puts the URI in single quotes. Using a different quote operator would have been clearer:
perl -w -MLWP::UserAgent -e "print LWP::UserAgent->new->get(qq(http://en.wikipedia.org/wiki/Doria's_Tree-kangaroo))->decoded_content"
Can't reproduce. Did someone purge it?
It's not important. MediaWiki will only purge canonical article URLs, in this case the one with the %27 not the one with the '. It can't purge every possible variant URL, there are thousands. That is not a bug. So if you request non-canonical URLs you can expect to get stale pages. You could argue that it's a bug that it responds to the bad URL with a 200, when it should redirect. But that's a topic for another bug report, I'm marking this one invalid.
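The behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not how Squid or MediaWiki are implemented: `ToyCache` is a hypothetical cache keyed by the raw URL string, and the "edit" is simulated by changing a dictionary value.

```python
from urllib.parse import quote


class ToyCache:
    """Hypothetical stand-in for a caching proxy: keys are raw URL strings."""

    def __init__(self):
        self.store = {}

    def fetch(self, url, backend):
        # Serve the cached copy if this exact string was requested before.
        if url not in self.store:
            self.store[url] = backend()
        return self.store[url]

    def purge(self, url):
        # Only the exact key is dropped; other spellings are untouched.
        self.store.pop(url, None)


BASE = "http://en.wikipedia.org/wiki/"
canonical = BASE + quote("Doria's_Tree-kangaroo")  # ends in Doria%27s_Tree-kangaroo
variant = BASE + "Doria's_Tree-kangaroo"

page = {"text": "August revision"}
cache = ToyCache()
cache.fetch(canonical, lambda: page["text"])
cache.fetch(variant, lambda: page["text"])

# An edit is saved and only the canonical URL is purged.
page["text"] = "September revision"
cache.purge(canonical)

fresh = cache.fetch(canonical, lambda: page["text"])
stale = cache.fetch(variant, lambda: page["text"])
print(fresh)  # September revision
print(stale)  # August revision
```

In this model the variant spelling keeps returning the old copy until its own entry expires or is purged, which matches the symptom reported with wget and Perl.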
(In reply to comment #4)
> It's not important. MediaWiki will only purge canonical article URLs, in this
> case the one with the %27 not the one with the '. It can't purge every possible
> variant URL, there are thousands. That is not a bug. So if you request
> non-canonical URLs you can expect to get stale pages.

Can you demonstrate to me why the non-URI-encoded URI is not canonical? The URI seems to return 200/OK, with full content, which does not mention a redirect, nor a canonical URI. Are the two URIs not identical in effect, just differently encoded?
(In reply to comment #5)
> (In reply to comment #4)
> > It's not important. MediaWiki will only purge canonical article URLs, in this
> > case the one with the %27 not the one with the '. It can't purge every possible
> > variant URL, there are thousands. That is not a bug. So if you request
> > non-canonical URLs you can expect to get stale pages.
>
> Can you demonstrate to me why the non-URI-encoded URI is not canonical?
>

Because the URL-encoded one is the URL MediaWiki gives you when you search for or link to this article, I guess.

> The URI seems to return 200/OK, with full content, which does not mention a
> redirect, nor a canonical URI.
>

Like Tim said, that should be fixed, but it's a separate bug.

> Are the two URIs not identical in effect, just differently encoded?
>

Squid doesn't seem to think so, as it seems to cache the two separately.
It might help if you link to the "separate bug" you mention, or if you are the first to identify it, create the ticket and link to it.

As for Squid's opinion, is that relevant to MediaWiki?

Your use of "canonical" seems at odds with that in the markup of the wiki pages, where both the encoded and non-encoded URIs are canonical, as opposed to a different page which redirects to the one in question.

I'm surprised I had to say this.
(In reply to comment #7)
> It might help if you link to the "separate bug" you mention, or if you are the
> first to identify it, create the ticket and link to it.
>

That would be bug 21027.

> As for Squid's opinion, is that relevant to MediaWiki?
>

It is relevant to Wikimedia wikis, as they run a Squid caching layer on top of MediaWiki.

> Your use of "canonical" seems at odds with that in the markup of the wiki
> pages, where both the encoded and non-encoded URIs are canonical, as opposed to
> a different page which redirects to the one in question.
>
> I'm surprised I had to say this.
>

Do wiki pages contain different URLs for the same page? I would expect all URLs generated by MediaWiki to be canonicalized somehow, i.e. that all links to the same page use the same URL.
Thank you for providing the link to the ticket.

Do you not think the use of Squid is separate from MediaWiki? Should not both implement the same standards for URIs/URLs? Ideally, the URI should be treated the same whether encoded or not, by all software.

The term "canonical", in terms of URLs/URIs in general, is reasonably well defined on Wikipedia: http://en.wikipedia.org/wiki/URL_normalization

In terms of MediaWiki end-users, I think the term is used to refer to the ultimate resource for a term. If page A instantly redirects to page B, because an editorial decision has been made that term A is a synonym for term B, then page A will contain a JavaScript variable with the canonical term.

In both cases, URI-encoding is dropped prior to the creation of the canonical ID.

I am sorry I do not have an example to hand.

Cheers
Lee
(In reply to comment #9)
> Thank you for providing the link to the ticket.
>
> Do you not think the use of Squid is separate from MediaWiki? Should not both
> implement the same standards for URIs/URLs? Ideally, the URI should be treated
> the same whether encoded or not, by all software.
>

Of course. A suggestion made on the linked bug is that Squid should be fixed to purge alternates as well.

> The term "canonical", in terms of URLs/URIs in general, is reasonably well
> defined on Wikipedia: http://en.wikipedia.org/wiki/URL_normalization
>
> In terms of MediaWiki end-users, I think the term is used to refer to the
> ultimate resource for a term. If page A instantly redirects to page B, because
> an editorial decision has been made that term A is a synonym for term B, then
> page A will contain a JavaScript variable with the canonical term.
>
> In both cases, URI-encoding is dropped prior to the creation of the canonical
> ID.
>
> I am sorry I do not have an example to hand.
>

I understand what you mean by 'canonical', and I agree that each page should have exactly one canonical URL. It's my impression that MediaWiki enforces this correctly apart from not redirecting on under- or over-encoded URLs, as outlined in this bug and the one I linked to.
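The redirect on under- or over-encoded URLs discussed above could look roughly like the following sketch. It assumes the canonical spelling is whatever Python's `quote()` produces after one round of decoding; MediaWiki's actual canonicalization rules may differ, and `canonical_path` and `redirect_target` are hypothetical helper names.

```python
from urllib.parse import quote, unquote


def canonical_path(path):
    """Decode once, then re-encode, so every spelling maps to one form.

    Assumption: quote() output stands in for the canonical encoding.
    """
    return quote(unquote(path), safe="/")


def redirect_target(path):
    """Return the canonical path if a redirect is needed, else None."""
    target = canonical_path(path)
    return None if target == path else target


# Already canonical: no redirect.
print(redirect_target("/wiki/Doria%27s_Tree-kangaroo"))    # None
# Under-encoded (raw apostrophe): redirect to the %27 form.
print(redirect_target("/wiki/Doria's_Tree-kangaroo"))      # /wiki/Doria%27s_Tree-kangaroo
# Over-encoded (%54 is "T"): redirect to the same form.
print(redirect_target("/wiki/Doria%27s_%54ree-kangaroo"))  # /wiki/Doria%27s_Tree-kangaroo
```

With a check like this in front of the page handler, every variant would answer with a redirect to a single URL, so a cache keyed on the URL string would only ever hold the one entry that gets purged.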