Last modified: 2012-02-28 00:31:42 UTC
(1) Go to http://www.wikipedia.org/
(2) Type 'vi' in the search field
(3) Press 'ENTER' or click the 'Search' button

Current Result:
- A file download is prompted.

Expected Result:
- A page with information on 'vi' should be displayed.
Please provide this file for analysis.
Created attachment 2265 [details]
File downloaded for 'vi' search in Wikipedia

I have now attached the file that was prompted for downloading.
This works for me.
This file looks like gzipped HTML. It's possible we've got another problem with double-gzipping, or some inconsistency in the squids where incorrect headers get cached. Can you confirm if possible:
* Version of IE and operating system?
* Does your ISP use an HTTP proxy server?
* Does this happen only when logged out, or only when logged in, or some combination?
(Bah, first time I got an edit conflict on Bugzilla; here's my two cents anyway.) The file you provided is a gzipped version of the correct HTML page for the article "vi". Gzip compression is used as a "transfer encoding" for all communication, to save bandwidth - normally, the browser should just uncompress it and show it, without you noticing. For some reason, it apparently does not receive or understand the Content-Encoding header that is used to indicate this. This seems like a browser issue - what browser / operating system are you using? Does it happen when you search for something else on http://www.wikipedia.org/? Does it also happen when you search for vi directly in Wikipedia? Does it happen if you visit the vi page directly? Side note: when trying with Firefox, I get the page fine, and it gets transferred as gzip. With wget, I also get the page correctly; it appears to be served uncompressed. Why?...
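For anyone who wants to confirm what such a downloaded file actually contains, here is a minimal PHP sketch (the attachment filename is a hypothetical placeholder): PHP's compress.zlib:// stream wrapper gunzips the file transparently on read, so if the result starts with ordinary HTML, the body was fine and only the Content-Encoding handling went wrong on the way to the browser.

<?php
// Minimal sketch, assuming the attachment was saved as 'attachment_2265.gz'
// (hypothetical filename). The compress.zlib:// wrapper transparently
// decompresses a gzip file on read.
$html = file_get_contents('compress.zlib://attachment_2265.gz');
// If this is just the article served with a mishandled Content-Encoding,
// the first few hundred bytes will be plain HTML.
echo substr($html, 0, 200), "\n";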
The gzip-or-plain selection is based on (I believe) the User-Agent and/or the Accept-Encoding header. PHP has some magic for this in ob_gzhandler, which may or may not be documented: http://dk2.php.net/ob_gzhandler It's possible that this isn't 100% jibing with the Vary header we have, such that some IEs get served the wrong thing. Or, there might be some bad proxies which end up storing the wrong thing. Or, Squid might be breaking. Mark's running some 2.6 Squids experimentally, so there might be 'new' adventures.
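As an illustration of what that negotiation looks like in plain PHP (a minimal sketch, not MediaWiki's actual output path): ob_gzhandler inspects the request's Accept-Encoding header and compresses the buffered output only when the client advertises gzip or deflate support; the Vary header has to be added by the application itself so that caches keep the variants apart.

<?php
// Minimal sketch of PHP-side negotiation (not the MediaWiki code path).
// ob_gzhandler looks at Accept-Encoding and compresses only if the client
// advertises gzip/deflate support; caches need Vary to keep the compressed
// and uncompressed variants apart.
header('Vary: Accept-Encoding');
ob_start('ob_gzhandler');
echo '<html><body>article text here</body></html>';
ob_end_flush();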
I am using the following:
Browser: Microsoft Internet Explorer
Version: 6.0.2900.2180.xpsp_sp2_gdr.050301-1519
Cipher Strength: 128-bit
OS: Microsoft Windows XP
And, my ISP does use an HTTP proxy server. And, when trying with Mozilla Firefox, no issues.
Please try with Internet Explorer, bypassing the proxy (use a different ISP if necessary). For reference: what ISP is it?
Please see bug 7099 as well; it seems to be related to this issue.
*** Bug 7100 has been marked as a duplicate of this bug. ***
*** Bug 7105 has been marked as a duplicate of this bug. ***
*** Bug 7107 has been marked as a duplicate of this bug. ***
To everyone who's come here because their bug has been marked as a duplicate of this, please provide:
1) The browser you're using (exact version, please).
2) Whether it works using another browser.
3) What ISP you're using, and (if you know) whether you're behind a proxy server.
4) Whether this happens only when you're logged in, only when you're logged out, or some combination.
5) Whether it happens every time you try loading the page, or only sometimes.
Need to check if anything's changed with respect to OutputPage, or if any changes to it have only recently been synchronised; finding out if anything's changed in the Squids' configuration would be another sensible move (we had a recent upgrade - related, or not?). More and more of these reports from various people indicate something's buggered up big-time.
*** Bug 7111 has been marked as a duplicate of this bug. ***
Hi all, in my case the issue is NONexistent today, i.e. no more prompts for file download. Did anybody fix it? Or did it get fixed due to some environment change on my end? [However, bug 7099 is still present in my IE.] And, by the way, the remaining answers to Simetrical's questions:
(4) This used to happen when I was NOT logged in. Didn't test while logged in. [I created a wiki account just today :-)]
(5) It used to happen every time I tried loading the page (I guess I tried only about 10 times though).
I haven't looked into how the Wikimedia gzip module works in detail, but Squid now supports content negotiation using ETag and If-None-Match to find which cached entity variant (identity vs gzip encoding, Swedish vs English, etc.) to send to the client. Which means that if the server is not sending correct ETags and also responds to If-None-Match, or when the Vary header is not correctly filled out, then clients may be given an incorrect entity variant. Apache mod_gzip is an example of this problem, where both the identity-encoded and gzip-encoded variants carry the same ETag and the server responds to If-None-Match on this ETag. There is a Squid directive to try to work around such broken servers (broken_vary_encoding or something like that). Quite likely it needs additional work as the matrix of broken servers out there is figured out... Regards, Henrik, Squid-cache.org
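For reference, the workaround Henrik mentions looks roughly like this in squid.conf (a sketch based on the directive as shipped with Squid 2.6/2.7; the ACL name and the Server pattern are placeholders and would need to match the actual backend):

# Sketch of the workaround directive (Squid 2.6/2.7): treat responses from
# the matched servers as having a broken ETag on Vary/Content-Encoding, so
# Squid does not trust the ETag when picking a cached variant.
acl broken_etag_servers rep_header Server ^Apache
broken_vary_encoding allow broken_etag_servers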
Thinking. Most likely If-None-Match isn't needed to trigger this. Just sending incorrect ETags on Vary responses is most likely sufficient to get the cache confused.
Also see http://www.mail-archive.com/squid-dev@squid-cache.org/msg04514.html
Latest update from my side:
(a) I learnt just now that my LAN/ISP does use a Squid proxy server.
(b) A friend of mine, who is on the same LAN, is currently getting the issue but I am NOT. (So what could have caused the issue, if we rule out bypassing the proxy server as the explanation for this particular discrepancy?)
*** Bug 7118 has been marked as a duplicate of this bug. ***
Indeed, a quick test with and without Accept-Encoding seems to indicate that ETag is identical for both gzipped and cleartext responses.
I have disabled sending of the ETag header, as the standard is vague about whether MediaWiki or Squid is wrong w.r.t. weak (W/) ETags.
I was seeing this error yesterday on two different systems. It has cleared up today. I did run some packet captures (which I didn't save, sorry). When I was seeing the issue, I would receive the HTTP 200 OK packet, which included these two header tags:
Content-Encoding: gzip\r\n
Content-encoded entity body (gzip): 891 bytes -> 1775 bytes\r\n
... the gzip would follow, and my browser would ask if I wanted to download. Another machine didn't have the problem (in the same time frame), but it would get an "HTTP 304 Not Modified" header without the gzip tags. As of today, all three machines, with no changes made to them, are receiving the HTTP 304 headers and not prompting a download. The pages are displaying normally. So, disabling the ETag header seems to have done the trick - thanks!
The standard is very clear if you ask me. Content-Encoding is an entity header. The gzipped body is an entity body. Each unique entity (entity headers + entity body) must carry a unique ETag per URL. (Two entities at different URLs may have the same ETag, but no two different entities of the same URL can, now or in the future.)
It would be good if we could detect whether PHP's ob_gzhandler is actually in gzipping-mode so that we can send a different ETag; otherwise do we really need them at all? Could we just take this out in the mainline code?
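A minimal sketch of what that could look like (hypothetical code, not MediaWiki's OutputPage; it approximates ob_gzhandler's decision by looking at the request's Accept-Encoding header, since as far as I know PHP doesn't expose the handler's internal choice directly):

<?php
// Hypothetical sketch: give each encoding variant its own ETag.
// ob_gzhandler's choice is approximated from Accept-Encoding, because PHP
// does not expose the handler's internal decision directly.
$body = '<html><body>article text here</body></html>';
$acceptsGzip = isset($_SERVER['HTTP_ACCEPT_ENCODING'])
    && strpos($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip') !== false;
$variant = $acceptsGzip ? 'gzip' : 'identity';
header('ETag: "' . md5($body) . '-' . $variant . '"');
header('Vary: Accept-Encoding');
ob_start('ob_gzhandler');
echo $body;
ob_end_flush();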
Since it's now off by default I'm going to go ahead and mark this FIXED, though improved fixes are hypothetically possible.
*** Bug 7082 has been marked as a duplicate of this bug. ***
There is also bug 16230, which might be related to this.
Reopening to dupe lots of stuff to it, as the issue has persisted even after the ETag change.
*** Bug 16230 has been marked as a duplicate of this bug. ***
*** Bug 15457 has been marked as a duplicate of this bug. ***
*** Bug 15149 has been marked as a duplicate of this bug. ***
This bug may be of interest: https://bugzilla.wikimedia.org/show_bug.cgi?id=17537
There have been noticeably many of these reports on OTRS lately, so I went through the tech queue from 2008 till now and the other English queues with some keywords. Here:
OTRS#2009050710039921, got a file download dialog on article view
OTRS#2009050710002156, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2009050610056565, similar report with IE, "seen it so many times", went to find the information outside Wikipedia
OTRS#2009050410016432, on article view gets a popup to handle a "ZIP file" (gzip?), suspects phishing
OTRS#2009042810003331, got a file download dialog box on article view, suspects virus from Wikipedia
OTRS#2009041610017974, got a file download dialog box on article view
OTRS#2009041410062099, IE7 doesn't recognise the file type on article view, and upon saving the content turns out to be binary
OTRS#2009041010028241, on article view IE8 shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2009021410009418, ran Firefox in a configuration with altered Accept-Encoding, but got gzipped content anyway. Also reproduced with default wget.
OTRS#2009020210021659, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2009012910023699, IE showed a "File Download Security Warning" and didn't load the page
OTRS#2008112110022973, on article view IE7 shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008111710013115, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008100110060088, got a file download dialog on article view, thinks it might be an executable and is worried
OTRS#2008091510022622, sometimes on article view gets an archive file download dialog on Mac IE5
OTRS#2008091210024457, got a file download dialog with IE6 on Windows XP, saved file was binary
OTRS#2008082610024727, wikisource article view pops up a file download dialog
OTRS#2008081110037232, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008070910020541, got a file download box on article view with IE6
OTRS#2008070810003562, got a "zipped file" that on inspection turned out to be the article gzipped
OTRS#2008052810008247, on article view IE6 shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped), suspects something malicious
OTRS#2008052110013244, got a file download dialog for an unknown file type on article view, size matches mentioned article gzipped
OTRS#2008050610017706, got a file download dialog on article view
OTRS#2008042310020921, on article view IE shows a File Download Security Warning for a file of Unknown File Type (with size matching the mentioned article gzipped)
OTRS#2008021910010382, got a file download dialog on article view

Maybe related:
OTRS#2008122510000693, "pretty sure it's a virus download page masked as antivirus software", could be a mirror site issue
OTRS#2008120810021916, "article cannot be opened"
OTRS#2008052610004898, "please stop sending popups on my computer"

How can people help sort this out?
On the toolserver there's a tool that's accessed about 70 times per minute, and every time it fetches a page from the Amsterdam Squids without an Accept-Encoding header. When the gzipped page is stuck in the cache, it gets reported quickly as "gibberish content" and someone purges the cache of the page. Here are some mentions I found:
20090503 http://commons.wikimedia.org/w/index.php?title=User_talk%3AMagnus_Manske&diff=20993744&oldid=20861308
20090416 http://lists.wikimedia.org/pipermail/toolserver-l/2009-April/002034.html
20090106 http://toolserver.org/~bryan/TsLogBot/wikimedia-toolserver_2009-01-06.txt
20090102 http://jira.toolserver.org/browse/MAGNUS-103
20081006 http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Geographical_coordinates/Archive_22#GeoHack_biffed.3F
20080606 http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_40#GeoHack
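Until the cache side is sorted out, a client like that toolserver tool can at least defend itself by checking for the gzip magic bytes and decompressing when necessary. A minimal sketch, assuming the tool is written in PHP and the raw response body is already in $body (both of which are my assumptions):

<?php
// Hypothetical defensive decoding for a client that asked for identity but
// may still receive a gzipped body from a confused cache. $body is assumed
// to hold the raw response body.
if (strlen($body) >= 2 && substr($body, 0, 2) === "\x1f\x8b") {
    // Gzip magic bytes found: strip the 10-byte gzip header and inflate.
    // (Assumes no optional gzip header fields such as FNAME are present.)
    $body = gzinflate(substr($body, 10));
}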
*** Bug 19356 has been marked as a duplicate of this bug. ***
Just adding my voice to the chorus. Here are the HTTP headers from a request and response where I can duplicate this problem:

GET /wiki/Benzatropine HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Accept-Encoding: identity;q=1.0, gzip;q=0, *;q=0
Host: en.wikipedia.org

HTTP/1.0 200 OK
Date: Tue, 30 Jun 2009 14:04:55 GMT
Server: Apache
X-Powered-By: PHP/5.2.4-2ubuntu5wm1
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwiki_session;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut
Last-Modified: Tue, 30 Jun 2009 13:54:32 GMT
Content-Encoding: gzip
Content-Length: 18123
Content-Type: text/html; charset=utf-8
Age: 86264
X-Cache: HIT from sq25.wikimedia.org
X-Cache-Lookup: HIT from sq25.wikimedia.org:3128
X-Cache: MISS from sq36.wikimedia.org
X-Cache-Lookup: MISS from sq36.wikimedia.org:80
Via: 1.1 sq25.wikimedia.org:3128 (squid/2.7.STABLE6), 1.0 sq36.wikimedia.org:80 (squid/2.7.STABLE6)
Connection: close

A quick note: while the client's UA says Firefox 2, it's actually Apache's HttpClient Java library (org.apache.commons.httpclient.HttpClient).
David Albert, that doesn't tell us much, because MediaWiki normally gzips content before sending it to the user, and the download prompt results when "Content-Type: text/html; charset=utf-8" isn't respected. Do you know what headers came before this request? (Although I don't see a Keep-Alive, so it can't be the "garbage gzip data from a previous response" problem.)
There were no headers before this request. Perhaps I'm not looking at the same problem, although on an initial read-through it looked very similar. The problem here is that the client is specifically requesting non-gzipped data:
Accept-Encoding: identity;q=1.0, gzip;q=0, *;q=0
and is receiving gzipped data anyway.
This sounds like it might be an unrelated Squid cache bug. Why don't you file a separate bug?
What kind of client is that? Of course, these headers are kind of standard, but usually clients would just omit 'gzip' entirely if they don't want gzip...
Edward, I created Bug 19463 for this issue. Domas, this is a custom client. Originally the Accept-Encoding string was simply identity. I added gzip;q=0 to see if being more specific helped solve the problem, but it did not.
Just got another one: http://en.wikipedia.org/wiki/Taximeter I wonder if it's significant that the link was just posted on Slashdot, so the page has probably just had a lot of traffic.
No idea if it's related, but I just got this error:
===================================================
ERROR
The requested URL could not be retrieved

While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/2/2a/Tin_lamp_1930s.jpg_.jpg

The following error was encountered:
* Unable to forward this request at this time.

This request could not be forwarded to the origin server or to any parent caches. The most likely cause for this error is that:
* The cache administrator does not allow this cache to make direct connections to origin servers, and
* All configured parent caches are currently unreachable.

Your cache administrator is nobody.
Generated Wed, 01 Jul 2009 17:51:12 GMT by sq13.wikimedia.org (squid/2.7.STABLE6)
=======================================================
Started on http://en.wikipedia.org/wiki/Tinsmith, clicked the pic of the lamp, clicked it again to get the high-res image, and got the above error. Rinse, repeat, same error.
*** Bug 19463 has been marked as a duplicate of this bug. ***
This does not look like normal Squid behaviour. Are there any Vary-related patches applied to the Wikimedia Squids? I have a faint memory of some patches related to optimizing Accept-Encoding to avoid having to go to the backend for each new Accept-Encoding variant... The "Unable to forward" error is completely unrelated to the Accept-Encoding issue.
I could only find discussion at:
http://thread.gmane.org/gmane.comp.web.squid.devel/6273
http://thread.gmane.org/gmane.comp.web.squid.devel/9104
The 2.6.18 patch is at http://noc.wikimedia.org/~tstarling/patches/vary_options_upstream.patch
Maybe Adrian knows something? (cc:)
Talk to Tim Starling about this stuff. Make sure the 2.7 Squid is patched with the Wikimedia patchsets or things won't work as well as you'd expect.
Another occurrence: http://lists.wikimedia.org/pipermail/mediawiki-api/2010-November/002032.html The user does not specify Accept-Encoding: gzip, but nevertheless gets a gzipped response.
I cannot see this now; I assume ops is monitoring this stuff.