Last modified: 2009-03-09 23:25:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T17149, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 15149 - some pages are delivering raw GZIP encoding
Status: RESOLVED DUPLICATE of bug 7098
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Normal major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/The_Inhe...
Duplicates: 15830, 15993
Depends on:
Blocks:
Reported: 2008-08-13 08:38 UTC by Rez
Modified: 2009-03-09 23:25 UTC
CC: 9 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
As sent to Netscape 3 (8.75 KB, application/octet-stream)
  2008-08-16 14:40 UTC, Rez
How it decodes (done by WinRAR) (28.45 KB, text/html)
  2008-08-16 14:44 UTC, Rez
headers in sam spade (60.21 KB, image/gif)
  2008-08-24 16:30 UTC, Thoken
http://en.wikipedia.org/wiki/Jena_Six (41.23 KB, application/octet-stream)
  2008-08-24 16:31 UTC, Thoken
https://bugzilla.wikimedia.org/attachment.cgi?id=5180 with \r\n line endings converted to \n (8.72 KB, application/octet-stream)
  2008-10-04 18:58 UTC, Brad Jorsch
TCPDUMP of IE7 behind Squid, anon user, cleared browsercache before (473.52 KB, application/x-zip-compressed)
  2008-10-16 14:21 UTC, Johann H. Addicks

Description Rez 2008-08-13 08:38:40 UTC
http://en.wikipedia.org/wiki/The_Inheritors_(William_Golding) is one of the affected pages, but I've seen 3 or 4 others today that do the same thing: in an older browser, which normally renders Wikipedia as very legible plain text, I am getting naked GZIP (compressed binary). The affected pages work fine in a newer browser. It is reproducible for the affected pages, but the set of affected pages seems random (tomorrow it may be other pages!)

When I saved the file locally (exactly as sent to me by the server) and looked at it with a hex viewer, I confirmed that it is a compressed binary, not text. QuickViewPlus IDs the file as UNIX GZip. However, QVP decompresses it not to text (HTML) but rather to this interesting goulash (sample from the first line):

/*::lh1 clasl1-first="Mjnt="pagn lhrefead> faviaml:: WerSub">Frrplal:RecconChontes i_ lfeed=ds"mft/fh3yla hxml: clasl1yla hxml: jump&am-

Might this indicate that the GZip is corrupt, thus not being decoded by the unforgiving older browser? (If so, might this indicate a corrupt cache file or a hard disk going bad?)
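
One quick way to check a file saved this way is to test for the gzip magic bytes and attempt a decompression. A minimal sketch in Python 3, assuming the page body was saved byte-for-byte (the filename is just a placeholder):

import gzip
import zlib

path = "saved_page.bin"           # placeholder for the file saved from the browser
data = open(path, "rb").read()

# A gzip stream starts with the magic bytes 0x1f 0x8b.
print("gzip magic present:", data[:2] == b"\x1f\x8b")

try:
    html = gzip.decompress(data)
    print("decompressed OK:", len(html), "bytes")
except (OSError, EOFError, zlib.error) as err:
    print("decompression failed:", err)

If the magic bytes are there but decompression fails, the stream was damaged somewhere between the server and the disk.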

I saved and looked at some working pages and confirmed that NORMALLY, my old browser sees perfectly normal HTML.

I've encountered this "what do you mean, some pages arrive in GZip?" issue on another site where I was able to research the problem, and that proved due to a server bug, tho I don't recall the details as it was some years ago.

Affected browser: Netscape 3 (still wonderful for READING TEXT!)
Not affected (same system): Seamonkey 1.1.9

This is not a matter of NS3 not knowing what to do with GZip; most servers now use compression, and finding myself with naked GZip is VERY rare (this is maybe the 3rd or 4th time I've seen it in 12 years online).
Comment 1 Tim Starling 2008-08-16 12:00:42 UTC
Can you please attach the file you saved? (Click "add an attachment")
Comment 2 Rez 2008-08-16 14:40:09 UTC
Created attachment 5180 [details]
As sent to Netscape 3
Comment 3 Rez 2008-08-16 14:44:06 UTC
Created attachment 5181 [details]
How it decodes (done by WinRAR)

This is the same file (the one NS3 spit up as raw GZip) as decoded by WinRAR (which also whines that "the file is corrupt"). You can see that the decompressed HTML is a poor match for the page's actual content! Only the first disk-sector worth or so is not mangled.
Comment 4 Thoken 2008-08-24 16:27:57 UTC
probably the same bug:
server returning "Content-encoding: gzip" and gzipped content on some pages with some clients

clients
 - IE 6.0
 - Oberon V4-2.3 Web 1.0 (Andreas Krumenacker) 1997
 - Sam Spade (Beta) 1.14 (Steve Atkins) 1997-1999
no issue with Firefox 3.0
no proxy - ISP t-dialin.net (german telekom)
happens also when logged in (checked only with Oberon)
happens every time loading the page
(answering https://bugzilla.wikimedia.org/show_bug.cgi?id=7098#c13 )

examples:
http://de.wikipedia.org/wiki/Bhopal
http://en.wikipedia.org/wiki/Bhopal_(disambiguation)
http://en.wikipedia.org/wiki/Jena_Six
not with:
http://en.wikipedia.org/wiki/Jena_six
Comment 5 Thoken 2008-08-24 16:30:16 UTC
Created attachment 5213 [details]
headers in sam spade
Comment 8 Max Semenik 2008-09-13 19:09:00 UTC
This problem is still there, from OTRS: [[Yiff]], IE6 on XP.
Comment 9 Rez 2008-09-13 19:36:23 UTC
All the above links work okay today, but yesterday I ran into another page that repeatedly came across as GZip for NS3. (Forgot to record which page, but I think it's a random server error, hence the individual pages affected vary from day to day)

Is this relevant? http://www.linuxplanet.com/linuxplanet/tutorials/5461/1/
http://www.schroepl.net/projekte/mod_gzip/browser.htm.htm

From the latter page:
=====================
Netscape Navigator 3

This browser uses HTTP/1.0. It doesn't send an Accept-Encoding header, thus doesn't request compressed content
from a server.

The browser does not yet support the processing of compressed page content. If it receives gzip-compressed content, it recognizes that the content uses an encoding (gzip) unknown to it (and displays a corresponding message to the user), but after that it displays the compressed page content in the browser window. This browser is therefore unsuitable for content served compressed unconditionally (such as statically precompressed documents).

A web server correctly evaluating the Accept-Encoding header is able to serve usable, uncompressed data to the browser.
====================

So it sounds like the server sometimes fails to notice that the browser sent no Accept-Encoding header, and defaults to gzip anyway.
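
A well-behaved server (or cache) should only send a gzipped body when the request actually advertises gzip support. A minimal probe sketch in Python 3, using one of the example URLs mentioned above (note that urllib sends "Accept-Encoding: identity" when no such header is supplied, so the first request explicitly declines compression):

import urllib.request

url = "http://en.wikipedia.org/wiki/Jena_Six"   # one of the example pages above

for label, hdrs in (("gzip not accepted", {}),
                    ("gzip accepted", {"Accept-Encoding": "gzip"})):
    req = urllib.request.Request(url, headers=hdrs)
    with urllib.request.urlopen(req) as resp:
        start = resp.read(2)                      # urllib does not decompress for us
        print(label, "-> Content-Encoding:", resp.headers.get("Content-Encoding"),
              "| gzip magic:", start == b"\x1f\x8b")

A compressed body in the first case would reproduce exactly what the older browsers are seeing.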

(Hey guys, thanks for taking this seriously, and helping keep Wikipedia accessible to everyone everywhere)
Comment 10 Aaron Schulz 2008-09-13 20:29:25 UTC
The MW core code looks fine; perhaps it is a header-handling issue with the Squids?
Comment 11 Tim Starling 2008-09-14 00:54:56 UTC
Most likely a Vary-related Squid bug.
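
To illustrate why Vary matters here: if a shared cache keys its entries on the URL alone and ignores "Vary: Accept-Encoding", a gzipped copy stored for one client can later be served to a client that never asked for compression. A toy sketch (purely illustrative, not how Squid is actually implemented):

# Stand-in for the backend: compresses only when the client asks for it.
def origin(url, accept_gzip):
    return (b"\x1f\x8b...", "gzip") if accept_gzip else (b"<html>...", None)

cache = {}

def cached_fetch(url, accept_gzip):
    # BUG: the cache key ignores the request's Accept-Encoding, which is
    # exactly what honouring "Vary: Accept-Encoding" would add.
    if url not in cache:
        cache[url] = origin(url, accept_gzip)
    return cache[url]

print(cached_fetch("/wiki/Foo", accept_gzip=True))   # gzip variant gets cached
print(cached_fetch("/wiki/Foo", accept_gzip=False))  # NS3-style client still receives the gzip variant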
Comment 12 Platonides 2008-09-14 20:53:44 UTC
I just got a Wikipedia page where Firefox 3 claimed it used a compression method it couldn't understand. Simply reloading worked fine. Could be a network glitch or a symptom of something more serious.
Comment 13 Max Semenik 2008-09-15 11:47:41 UTC
More complaints via OTRS, now on IE5/Mac (but works on Netscape 7).
Comment 14 Platonides 2008-10-04 18:28:06 UTC
*** Bug 15830 has been marked as a duplicate of this bug. ***
Comment 15 Brad Jorsch 2008-10-04 18:58:44 UTC
Created attachment 5386 [details]
https://bugzilla.wikimedia.org/attachment.cgi?id=5180 with \r\n line endings converted to \n

It seems that the corruption in attachment #5180 [details] is due to something trying to convert unix-style "\n" line endings to windows-style "\r\n" line endings. If this is undone, it decompresses without errors.

I can't say whether this was done by MediaWiki, Squid, a proxy ("transparent" or otherwise) on Rez's end of the connection, or just mis-saved out of NS3.
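
For reference, the repair described above can be reproduced in a few lines of Python 3; the filename is a placeholder for attachment 5180, and blindly replacing \r\n is not a general-purpose fix, it just happens to undo the only damage in this case:

import gzip

raw = open("attachment_5180.gz", "rb").read()   # placeholder name for the saved attachment
fixed = raw.replace(b"\r\n", b"\n")             # undo the \n -> \r\n conversion
html = gzip.decompress(fixed)                   # decompresses cleanly if that was the only damage
open("decoded.html", "wb").write(html)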
Comment 16 Rez 2008-10-04 19:14:22 UTC
In my experience (across thousands of compressed files including GZips) Netscape does NOT corrupt saved files; it just saves whatever the server sends it.

QuickViewPlus' decompressor might have mangled it, tho I've never seen that happen before -- but remember, WinRAR thought the original file was corrupt, and it's usually right.

No proxies here that I'm aware of. I'm on a fixed-wireless connection via a Motorola radio-modem (it does have a built-in router), and my local provider goes direct to AT&T's backbone.

Comment 17 Brad Jorsch 2008-10-04 19:21:32 UTC
There could be a difference between Netscape downloading a compressed file and Netscape saving compressed file data it is displaying as if it were text.
Comment 18 Rez 2008-10-04 19:29:59 UTC
Possible; I don't recall which way I saved the sample. The browser view has a line length limit for what it will display; don't know if that affects saving it. However, I do know NS does NOT corrupt stuff if you do "Save Target As..." without displaying it first (I use it to save misc. binaries all the time).

Next time I see one of these pages I'll try it both ways and find out :)

BTW I got another page in naked GZip a couple days ago, something entirely random (was in a hurry and forgot to note the URL :(


Comment 19 Tim Starling 2008-10-16 04:15:18 UTC
*** Bug 15993 has been marked as a duplicate of this bug. ***
Comment 20 Johann H. Addicks 2008-10-16 14:21:43 UTC
Created attachment 5442 [details]
TCPDUMP of IE7 behind Squid, anon user, cleared browsercache before

downloaded "files" included to .pcap (wireshark dumpfile) 
Happens very randomly, today mostly on project pages (but main article namespace as well affected, but not in dump)
Comment 21 Platonides 2008-10-16 14:49:18 UTC
This time the Wikipedia:F and WPVS appear there. Use filter (ip.addr eq 10.254.130.191 and ip.addr eq 10.254.130.190) and (tcp.port eq 2796 and tcp.port eq 126) and scroll to the bottom.
Still, I don't see anything wrong in the communication. The browser states Accept-Encoding: gzip, and the response is gzipped and contains Content-Encoding: gzip, with a text/html Content-Type.

Is that really IE7, or is it Firefox with an IE7 User-Agent?
I had never seen an 'X-moz: prefetch' header from IE. Some people report that Google Web Accelerator also sends it. Are you running it?
Maybe the Google Toolbar is adding 'accelerating' features?
Comment 22 Johann H. Addicks 2008-10-16 15:09:04 UTC
It is really IE7.
The machine has the Google Toolbar (for IE) and Google Desktop installed.

How do I find out if "Google Web Accelerator" is installed? How do I turn it off?

And even if so: since MediaWiki sites (Wikipedia, OpenStreetMap and the Ubuntu wiki) are the only websites where I encounter the problem, it may be a failure of Microsoft and/or Google, but MediaWiki (with or without aggressively tuned source Squids) seems to trigger this effect.
Comment 23 Platonides 2008-10-16 16:01:23 UTC
While I'd like to blame Microsoft or Google, there's something wrong on the Wikimedia side, as there have been reports from several browsers and in different variants (there may be several bugs that look the same).
However, maybe they make it more likely.

If you don't know about Google Web Accelerator, you probably don't have it. Google Web Accelerator works like a proxy, so it would have shown up in the captures.
You can try disabling (or uninstalling) the Google Toolbar and checking whether the problem goes away, and also connecting directly instead of going through the Squid. It's happening to you at a high rate, which is good, because it lets the developers obtain more data and check whether a proposed solution works...
Comment 24 Rez 2008-10-16 17:27:25 UTC
I have absolutely NO toolbars or add-ons installed in my old Netscape, where the problem was first observed. Just buck-naked Netscape. No popup blockers or similar utils installed on the system, either.

I'm wondering if it's a variant of the "Document contains no data" bug, which I had cause to research a few years back, and learned that it's actually a server bug triggered by a deficiency in the browser -- essentially, the server's failure to notice that the browser can't accept compression. Sound familiar? :)  (Novell issued a patch to address this problem in one of the early internet-enabled versions of NetWare.)

Comment 25 Brion Vibber 2009-03-09 23:25:17 UTC

*** This bug has been marked as a duplicate of bug 7098 ***
