Last modified: 2014-03-01 21:26:02 UTC
The size listed on the Web is 4.9 GB, but the actual downloaded file is 899 MB. On WinXP/IE6 the download completes at 899 MB, and the download-progress window reports a target size of 899 MB. On Vista/IE6 the file cannot be downloaded at all; IE complains about an HTTP header error. On Vista/IE7 the download again completes at 899 MB, although the progress window reports a target size of 4.9 GB. wget likewise downloads a file of 899 MB. 899 MB is far too small for an English Wikipedia article dump; the previous dump (July 2008) was around 3.8 GB.
(In reply to comment #0) > Previous dump (July 2008) was around 3.8GB. The previous dump was actually the week before, and there have been five since dumps were restarted in May, e.g. http://download.wikimedia.org/enwiki/20090610/ Are all of these affected? I don't have the bandwidth to find out.
I obtained a dump successfully about a year ago. I tried all the dumps currently available at http://download.wikimedia.org/enwiki and they all had the same problem.
Note that you may have an old version of wget, which was known to have problems with files over 4 GB. I have never attempted large files with IE, but recent versions presumably should work if you're on an NTFS filesystem. (If you're downloading to a FAT32 filesystem, as many USB drives use by default, it will likely fail, but I think it should fail differently: reporting an error at 2 GB or 4 GB rather than cropping the file to 0.9 GB.) Another possibility is that you're accessing the internet through a proxy which fails to handle large files properly. This would explain the Content-Length header being passed through (so you get a correct report of the 4.9 GB to come) while the intermediary craps out at the 0.9 GB 32-bit-wrapped limit. Tomasz, putting this one on your bench; it'd be good to double-check that we haven't broken the server or something ;) but afaik it should be serving out fine.
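The 32-bit wrap theory is easy to sanity-check with a little arithmetic: an intermediary that stores the file size in an unsigned 32-bit integer effectively reduces it modulo 2^32. A minimal sketch, using the two Content-Length values the server reports elsewhere in this thread (the exact cut-off a given proxy produces may differ from these figures by a few KB):

```python
# Sketch: effect of a proxy that stores file sizes in an unsigned
# 32-bit integer. The sizes below are the Content-Length values the
# server reports for the 20090610 and 20090618 enwiki dumps.

def wrap32(n: int) -> int:
    """n truncated to an unsigned 32-bit integer (n mod 2**32)."""
    return n % 2**32

for size in (5227630350, 5258589574):
    w = wrap32(size)
    print(f"{size} bytes -> {w} bytes (~{w / 2**20:.0f} MiB after wrap)")
```

The wrapped value for the 20090618 dump, 963622278 bytes (~919 MiB), lines up closely with the "stopped at 918MB" and "Connection closed at byte 963624164" reports later in the thread.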
Did a quick verify on OS X 10.5 using wget 1.11.4, and everything shows up just as it should: 4.9 GB.

/opt/local/bin/wget -S http://download.wikimedia.org/enwiki/20090610/enwiki-20090610-pages-articles.xml.bz2
......
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Connection: keep-alive
  Content-Type: application/octet-stream
  Accept-Ranges: bytes
  Content-Length: 5227630350
  Date: Tue, 23 Jun 2009 02:24:22 GMT
  Server: lighttpd/1.4.19
Length: 5227630350 (4.9G) [application/octet-stream]

Looking at http://tinyurl.com/ozafl2 shows the same correct Content-Length header being returned when the user agent is IE. The file also correctly downloads past 899 MB from my personal server, which is not running in the Wikimedia cluster. I'll try this with IE after re-installing Windows in the next day or so, just to make sure it's not an issue with the browser, but otherwise I'm really suspecting a 32-bit proxy here. Gene, could you post the Content-Length you're seeing on the wget downloads by adding "-S"? It would also be nice to know whether you're going through a 32-bit proxy, as Brion says. That could easily do it.
I guess the problem is related to the proxy server. I switched to a different proxy server, and a download attempt via IE stopped at 918 MB. I obtained wget 1.11.4 and ran it with the -S option. There was a hiccup at around 900 MB: the connection got closed. Fortunately wget is robust enough this time to reconnect. So far it is still running smoothly, with ~3 GB downloaded:

SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = c:\Program Files\GnuWin32/etc/wgetrc
--2009-06-23 10:21:24--  http://download.wikimedia.org/enwiki/20090618/enwiki-20090618-pages-articles.xml.bz2
Resolving download.wikimedia.org... 208.80.152.183
Connecting to download.wikimedia.org|208.80.152.183|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Connection: Keep-Alive
  Proxy-Connection: Keep-Alive
  Content-Length: 5258589574
  Date: Tue, 23 Jun 2009 17:21:25 GMT
  Content-Type: application/octet-stream
  Server: lighttpd/1.4.19
  Accept-Ranges: bytes
Length: 5258589574 (4.9G) [application/octet-stream]
Saving to: `enwiki-20090618-pages-articles.xml.bz2'

18% [=================>                ] 963,624,164  763K/s  in 21m 9s

2009-06-23 10:42:34 (741 KB/s) - Connection closed at byte 963624164. Retrying.

--2009-06-23 10:42:35--  (try: 2)  http://download.wikimedia.org/enwiki/20090618/enwiki-20090618-pages-articles.xml.bz2
Connecting to download.wikimedia.org|208.80.152.183|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 206 Partial Content
  Connection: close
  Proxy-Connection: close
  Content-Length: 4294965410
  Date: Tue, 23 Jun 2009 17:42:36 GMT
  Content-Range: bytes 963624164-5258589573/5258589574
  Content-Type: application/octet-stream
  Server: lighttpd/1.4.19
  Accept-Ranges: bytes
Length: 5258589574 (4.9G), 4294965410 (4.0G) remaining [application/octet-stream]
Saving to: `enwiki-20090618-pages-articles.xml.bz2'

58% [+++++++++++++++++======================================>  ] 3,086,259,252 773K/s  eta 46m 20s

Thanks to all of you who have helped!
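The successful resume works because wget's retry sends an HTTP Range request, which the server answers with 206 Partial Content. A quick sketch of the byte arithmetic in that transcript (all numbers are taken from the log; the closing observation about the 32-bit limit is my own speculation, not something the log states):

```python
# Byte-range arithmetic from the 206 Partial Content response above.
total = 5258589574    # full size, from "Content-Range: .../5258589574"
offset = 963624164    # first byte requested on the retry
remaining = total - offset
print(remaining)      # 4294965410, the Content-Length of the 206

# The remainder happens to sit just under 2**32 bytes, so even a
# 32-bit-limited intermediary could represent it without wrapping,
# which may be why the resumed transfer got through the proxy.
assert remaining < 2**32
```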
No problem. Let us know if anything else pops up.
moving product dbzip2 to product Wikimedia tools