Last modified: 2012-02-28 00:31:42 UTC
(1) Go to http://www.wikipedia.org/
(2) Type 'vi' in the search field
(3) Press 'ENTER' or click the 'Search' button

Current Result:
- A file download is prompted.

Expected Result:
- A page with information on 'vi' should be displayed.
Please provide this file for analysis.
Created attachment 2265 [details]
File downloaded for 'vi' search in Wikipedia

I have now attached the file that was prompted for downloading.
This works for me.
This file looks like gzipped HTML. It's possible we've got another problem with double-gzipping, or some inconsistency in the squids where incorrect headers get cached. Can you confirm if possible:
* Version of IE and operating system?
* Does your ISP use an HTTP proxy server?
* Does this happen only when logged out, or only when logged in, or some combination?
(Bah, first time I got an edit conflict on Bugzilla; here's my two cents anyway.) The file you provided is a gzipped version of the correct HTML page for the article "vi". Gzip compression is used as a "transfer encoding" for all communication, to save bandwidth - normally, the browser should just uncompress it and show it, without you noticing. For some reason, it apparently does not receive or understand the Content-Encoding header that is used to indicate this. This seems like a browser issue - what browser / operating system are you using? Does it happen when you search for something else on http://www.wikipedia.org/? Does it also happen when you search for vi directly in Wikipedia? Does it happen if you visit the vi page directly? Side note: when trying with Firefox, I get the page fine, and it gets transferred as gzip. With wget, I also get the page correctly; it appears to be served uncompressed. Why?...
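For anyone who wants to confirm what such a downloaded file actually contains, here is a minimal PHP sketch (the attachment filename is a hypothetical placeholder): PHP's compress.zlib:// stream wrapper gunzips the file transparently on read, so if the result starts with ordinary HTML, the body was fine and only the Content-Encoding handling went wrong on the way to the browser.

<?php
// Minimal sketch, assuming the attachment was saved as 'attachment_2265.gz'
// (hypothetical filename). The compress.zlib:// wrapper transparently
// decompresses a gzip file on read.
$html = file_get_contents('compress.zlib://attachment_2265.gz');
// If this is just the article served with a mishandled Content-Encoding,
// the first few hundred bytes will be plain HTML.
echo substr($html, 0, 200), "\n";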
The gzip-or-plain selection is based on (I believe) the User-Agent and/or the Accept-Encoding header. PHP has some magic for this in ob_gzhandler, which may or may not be documented: http://dk2.php.net/ob_gzhandler It's possible that this isn't 100% jibing with the Vary header we have, such that some IEs get served the wrong thing. Or, there might be some bad proxies which end up storing the wrong thing. Or, Squid might be breaking. Mark's running some 2.6 Squids experimentally, so there might be 'new' adventures.
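As an illustration of what that negotiation looks like in plain PHP (a minimal sketch, not MediaWiki's actual output path): ob_gzhandler inspects the request's Accept-Encoding header and compresses the buffered output only when the client advertises gzip or deflate support; the Vary header has to be added by the application itself so that caches keep the variants apart.

<?php
// Minimal sketch of PHP-side negotiation (not the MediaWiki code path).
// ob_gzhandler looks at Accept-Encoding and compresses only if the client
// advertises gzip/deflate support; caches need Vary to keep the compressed
// and uncompressed variants apart.
header('Vary: Accept-Encoding');
ob_start('ob_gzhandler');
echo '<html><body>article text here</body></html>';
ob_end_flush();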
I am using the following:
Browser: Microsoft Internet Explorer
Version: 6.0.2900.2180.xpsp_sp2_gdr.050301-1519
Cipher Strength: 128-bit
OS: Microsoft Windows XP
And, my ISP does use an HTTP proxy server. And, when trying with Mozilla Firefox, no issues.
Please try with Internet Explorer, bypassing the proxy (use a different ISP if necessary). For reference: what ISP is it?
Please see bug 7099 as well; it seems to be related to this issue.
*** Bug 7100 has been marked as a duplicate of this bug. ***
*** Bug 7105 has been marked as a duplicate of this bug. ***
*** Bug 7107 has been marked as a duplicate of this bug. ***
To everyone who's come here because their bug has been marked as a duplicate of this, please provide:
1) The browser you're using (exact version, please).
2) Whether it works using another browser.
3) What ISP you're using, and (if you know) whether you're behind a proxy server.
4) Whether this happens only when you're logged in, only when you're logged out, or some combination.
5) Whether it happens every time you try loading the page, or only sometimes.
Need to check if anything's changed with respect to OutputPage, or if any changes to it have only recently been synchronised; finding out if anything's changed in the Squids' configuration would be another sensible move (we had a recent upgrade - related, or not?). More and more of these reports from various people indicate something's buggered up big-time.
*** Bug 7111 has been marked as a duplicate of this bug. ***
Hi all, in my case the issue is NONexistent today, i.e. no more prompts for file download. Did anybody fix it? Or did it get fixed due to some environment change on my end? [However, bug 7099 is still present in my IE.] And, by the way, the remaining answers to Simetrical's questions:
(4) This used to happen when I was NOT logged in. Didn't test while logged in. [I created a wiki account just today :-)]
(5) It used to happen every time I tried loading the page (I guess I tried only about 10 times though).
I haven't looked into how the Wikimedia gzip module works in detail, but Squid now supports content negotiation using ETag and If-None-Match to find which cached entity variant (identity vs gzip encoding, Swedish vs English, etc.) to send to the client. Which means that if the server is not sending correct ETags and also responds to If-None-Match, or when the Vary header is not correctly filled out, then clients may be given an incorrect entity variant. Apache mod_gzip is an example of this problem, where both the identity-encoded and gzip-encoded variants carry the same ETag and the server responds to If-None-Match on this ETag. There is a Squid directive to try to work around such broken servers (broken_vary_encoding or something like that). Quite likely it needs additional work as the matrix of broken servers out there is figured out... Regards, Henrik, Squid-cache.org
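For reference, the workaround Henrik mentions looks roughly like this in squid.conf (a sketch based on the directive as shipped with Squid 2.6/2.7; the ACL name and the Server pattern are placeholders and would need to match the actual backend):

# Sketch of the workaround directive (Squid 2.6/2.7): treat responses from
# the matched servers as having a broken ETag on Vary/Content-Encoding, so
# Squid does not trust the ETag when picking a cached variant.
acl broken_etag_servers rep_header Server ^Apache
broken_vary_encoding allow broken_etag_servers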
Thinking. Most likely If-None-Match isn't needed to trigger this. Just sending incorrect ETags on Vary responses is most likely sufficient to get the cache confused.
Also see http://www.mail-archive.com/squid-dev@squid-cache.org/msg04514.html
Latest update from my side:
(a) I learnt just now that my LAN/ISP does use a Squid proxy server.
(b) A friend of mine, who is on the same LAN, is currently getting the issue but I am NOT. (So what could have caused the issue, if we rule out bypassing the proxy server as the explanation for this particular discrepancy?)
*** Bug 7118 has been marked as a duplicate of this bug. ***
Indeed, a quick test with and without Accept-Encoding seems to indicate that ETag is identical for both gzipped and cleartext responses.
I have disabled sending of the ETag header, as the standard is vague about whether MediaWiki or Squid is wrong w.r.t. weak (W/) ETags.
I was seeing this error yesterday on two different systems. It has cleared up today. I did run some packet captures (which I didn't save, sorry). When I was seeing the issue, I would receive the HTTP 200 OK packet, which included these two header tags:
Content-Encoding: gzip\r\n
Content-encoded entity body (gzip): 891 bytes -> 1775 bytes\r\n
... the gzip would follow, and my browser would ask if I wanted to download. Another machine didn't have the problem (in the same time frame), but it would get an "HTTP 304 Not Modified" header without the gzip tags. As of today, all three machines, with no changes made to them, are receiving the HTTP 304 headers and not prompting a download. The pages are displaying normally. So, disabling the ETag header seems to have done the trick - thanks!
The standard is very clear if you ask me. Content-Encoding is an entity header. The gzipped body is an entity body. Each unique entity (entity headers + entity body) must carry a unique ETag per URL. (Two entities at different URLs may have the same ETag, but no two different entities of the same URL can, now or in the future.)
It would be good if we could detect whether PHP's ob_gzhandler is actually in gzipping-mode so that we can send a different ETag; otherwise do we really need them at all? Could we just take this out in the mainline code?
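A minimal sketch of what that could look like (hypothetical code, not MediaWiki's OutputPage; it approximates ob_gzhandler's decision by looking at the request's Accept-Encoding header, since as far as I know PHP doesn't expose the handler's internal choice directly):

<?php
// Hypothetical sketch: give each encoding variant its own ETag.
// ob_gzhandler's choice is approximated from Accept-Encoding, because PHP
// does not expose the handler's internal decision directly.
$body = '<html><body>article text here</body></html>';
$acceptsGzip = isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false;
$variant = $acceptsGzip ? 'gzip' : 'identity';
header('ETag: "' . md5($body) . '-' . $variant . '"');
header('Vary: Accept-Encoding');
ob_start('ob_gzhandler');
echo $body;
ob_end_flush();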
Since it's now off by default I'm going to go ahead and mark this FIXED, though improved fixes are hypothetically possible.
*** Bug 7082 has been marked as a duplicate of this bug. ***
There is also bug 16230, which might be related to this.
Reopening to dupe lots of stuff to it, as the issue has persisted even after the ETag change.
*** Bug 16230 has been marked as a duplicate of this bug. ***
*** Bug 15457 has been marked as a duplicate of this bug. ***
*** Bug 15149 has been marked as a duplicate of this bug. ***
This bug may be of interest: https://bugzilla.wikimedia.org/show_bug.cgi?id=17537
There have been noticeably many of these reports on OTRS lately, so I went through the tech queue from 2008 till now and the other English queues with some keywords. Here:
OTRS#2009050710039921, got a file download dialog on article view
OTRS#2009050710002156, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2009050610056565, similar report with IE, "seen it so many times", went to find the information outside Wikipedia
OTRS#2009050410016432, on article view gets a popup to handle a "ZIP file" (gzip?), suspects phishing
OTRS#2009042810003331, got a file download dialog box on article view, suspects virus from Wikipedia
OTRS#2009041610017974, got a file download dialog box on article view
OTRS#2009041410062099, IE7 doesn't recognise the file type on article view, and upon saving the content turns out to be binary
OTRS#2009041010028241, on article view IE8 shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2009021410009418, ran Firefox in a configuration with altered Accept-Encoding, but got gzipped content anyway. Also reproduced with default wget.
OTRS#2009020210021659, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2009012910023699, IE showed a "File Download Security Warning" and didn't load the page
OTRS#2008112110022973, on article view IE7 shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008111710013115, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008100110060088, got a file download dialog on article view, thinks it might be an executable and is worried
OTRS#2008091510022622, sometimes on article view gets an archive file download dialog on Mac IE5
OTRS#2008091210024457, got a file download dialog with IE6 on Windows XP, saved file was binary
OTRS#2008082610024727, wikisource article view pops up a file download dialog
OTRS#2008081110037232, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008070910020541, got a file download box on article view with IE6
OTRS#2008070810003562, got a "zipped file" that on inspection turned out to be the article gzipped
OTRS#2008052810008247, on article view IE6 shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped), suspects something malicious
OTRS#2008052110013244, got a file download dialog for an unknown file type on article view, size matches mentioned article gzipped
OTRS#2008050610017706, got a file download dialog on article view
OTRS#2008042310020921, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008021910010382, got a file download dialog on article view

Maybe related:
OTRS#2008122510000693, "pretty sure it's a virus download page masked as antivirus software", could be a mirror site issue
OTRS#2008120810021916, "article cannot be opened"
OTRS#2008052610004898, "please stop sending popups on my computer"

How can people help sort this out?
On the toolserver there's a tool that's accessed about 70 times per minute, and every time it fetches a page from the Amsterdam Squids without an Accept-Encoding header. When the gzipped page is stuck in the cache, it gets reported quickly as "gibberish content" and someone purges the cache of the page. Here are some mentions I found:
20090503 http://commons.wikimedia.org/w/index.php?title=User_talk%3AMagnus_Manske&diff=20993744&oldid=20861308
20090416 http://lists.wikimedia.org/pipermail/toolserver-l/2009-April/002034.html
20090106 http://toolserver.org/~bryan/TsLogBot/wikimedia-toolserver_2009-01-06.txt
20090102 http://jira.toolserver.org/browse/MAGNUS-103
20081006 http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Geographical_coordinates/Archive_22#GeoHack_biffed.3F
20080606 http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_40#GeoHack
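Until the cache side is sorted out, a client like that toolserver tool can at least defend itself by checking for the gzip magic bytes and decompressing when necessary. A minimal sketch, assuming the tool is written in PHP and the raw response body is already in $body (both of which are my assumptions):

<?php
// Hypothetical defensive decoding for a client that asked for identity but
// may still receive a gzipped body from a confused cache. $body is assumed
// to hold the raw response body.
if (strlen($body) >= 2 && substr($body, 0, 2) === "\x1f\x8b") {
    // Gzip magic bytes found: strip the 10-byte gzip header and inflate.
    // (Assumes no optional gzip header fields such as FNAME are present.)
    $body = gzinflate(substr($body, 10));
}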
*** Bug 19356 has been marked as a duplicate of this bug. ***
Just adding my voice to the chorus. Here are the HTTP headers from a request and response where I can duplicate this problem:

GET /wiki/Benzatropine HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Accept-Encoding: identity;q=1.0, gzip;q=0, *;q=0
Host: en.wikipedia.org

HTTP/1.0 200 OK
Date: Tue, 30 Jun 2009 14:04:55 GMT
Server: Apache
X-Powered-By: PHP/5.2.4-2ubuntu5wm1
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwiki_session;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut
Last-Modified: Tue, 30 Jun 2009 13:54:32 GMT
Content-Encoding: gzip
Content-Length: 18123
Content-Type: text/html; charset=utf-8
Age: 86264
X-Cache: HIT from sq25.wikimedia.org
X-Cache-Lookup: HIT from sq25.wikimedia.org:3128
X-Cache: MISS from sq36.wikimedia.org
X-Cache-Lookup: MISS from sq36.wikimedia.org:80
Via: 1.1 sq25.wikimedia.org:3128 (squid/2.7.STABLE6), 1.0 sq36.wikimedia.org:80 (squid/2.7.STABLE6)
Connection: close

A quick note: while the client's UA says Firefox 2, it's actually Apache's HttpClient Java library (org.apache.commons.httpclient.HttpClient).
David Albert, that doesn't tell us much, because MediaWiki normally gzips content before sending it to the user, and the download prompt results when "Content-Type: text/html; charset=utf-8" isn't respected. Do you know what headers came before this request? (Although I don't see a Keep-Alive, so it can't be the "garbage gzip data from a previous response" problem.)
There were no headers before this request. Perhaps I'm not looking at the same problem, although on an initial read-through it looked very similar. The problem here is that the client is specifically requesting non-gzipped data:
Accept-Encoding: identity;q=1.0, gzip;q=0, *;q=0
and is receiving gzipped data anyway.
This sounds like it might be an unrelated Squid cache bug. Why don't you file a separate bug?
What kind of client is that? Of course, these headers are kind of standard, but usually clients would just omit 'gzip' entirely if they don't want gzip...
Edward, I created Bug 19463 for this issue. Domas, this is a custom client. Originally the Accept-Encoding string was simply identity. I added gzip;q=0 to see if being more specific helped solve the problem, but it did not.
Just got another one: http://en.wikipedia.org/wiki/Taximeter I wonder if it's significant that the link was just posted on Slashdot, so the page has probably just had a lot of traffic.
No idea if it's related, but I just got this error:
===================================================
ERROR
The requested URL could not be retrieved

While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/2/2a/Tin_lamp_1930s.jpg_.jpg

The following error was encountered:
* Unable to forward this request at this time.

This request could not be forwarded to the origin server or to any parent caches. The most likely cause for this error is that:
* The cache administrator does not allow this cache to make direct connections to origin servers, and
* All configured parent caches are currently unreachable.

Your cache administrator is nobody.
Generated Wed, 01 Jul 2009 17:51:12 GMT by sq13.wikimedia.org (squid/2.7.STABLE6)
=======================================================
Started on http://en.wikipedia.org/wiki/Tinsmith, clicked the pic of the lamp, clicked it again to get the high-res image, and got the above error. Rinse, repeat, same error.
*** Bug 19463 has been marked as a duplicate of this bug. ***
This does not look like normal Squid behaviour. Are there any Vary-related patches applied to the Wikimedia Squids? I have a faint memory of some patches related to optimizing Accept-Encoding to avoid having to go to the backend for each new Accept-Encoding variant... The "Unable to forward" error is completely unrelated to the Accept-Encoding issue.
I could only find discussion at:
http://thread.gmane.org/gmane.comp.web.squid.devel/6273
http://thread.gmane.org/gmane.comp.web.squid.devel/9104
The 2.6.18 patch is at http://noc.wikimedia.org/~tstarling/patches/vary_options_upstream.patch
Maybe Adrian knows something? (cc:)
Talk to Tim Starling about this stuff. Make sure the 2.7 Squid is patched with the Wikimedia patchsets or things won't work as well as you'd expect.
Another occurrence: http://lists.wikimedia.org/pipermail/mediawiki-api/2010-November/002032.html The user does not specify Accept-Encoding: gzip, but nevertheless gets a gzipped response.
I cannot see this now; I assume ops is monitoring this stuff.