Last modified: 2007-05-30 19:20:36 UTC
Punctuation in an image name on Wikipedia causes numerous problems. It is known to affect ampersand and plus, possibly also dashes, and presumably all punctuation. The thumbnails generated from the image show up briefly, then disappear and cannot be regenerated. It also causes other problems, such as the "image" tab showing up as red (as if the page doesn't exist). Also, if the image is updated, new thumbnails are not generated to replace the existing ones, even after a purge. This appears to be a recent bug, as older images don't suffer from it.
When testing (after seeing a comment on [[WP:VPT]]), I found some bizarre cache behaviour on the broken thumbnails. First, two working ones, for comparison purposes:

    $ telnet localhost 3128
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    HEAD http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/800px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
    Host: upload.wikimedia.org

    HTTP/1.0 200 OK
    X-Powered-By: PHP/5.1.4
    Content-Type: image/jpeg
    Date: Sun, 24 Dec 2006 23:14:22 GMT
    Server: lighttpd/1.4.13
    X-Cache: MISS from sq7.wikimedia.org
    X-Cache-Lookup: MISS from sq7.wikimedia.org:80
    X-Cache: MISS from delta.home.cesarb.net
    X-Cache-Lookup: MISS from delta.home.cesarb.net:3128
    Proxy-Connection: close

    Connection closed by foreign host.

    $ telnet localhost 3128
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    HEAD http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/741px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
    Host: upload.wikimedia.org

    HTTP/1.0 200 OK
    X-Powered-By: PHP/5.1.4
    Content-Type: image/jpeg
    Date: Sun, 24 Dec 2006 23:16:29 GMT
    Server: lighttpd/1.4.13
    X-Cache: MISS from sq14.wikimedia.org
    X-Cache-Lookup: MISS from sq14.wikimedia.org:80
    X-Cache: MISS from delta.home.cesarb.net
    X-Cache-Lookup: MISS from delta.home.cesarb.net:3128
    Proxy-Connection: close

    Connection closed by foreign host.

And then, two broken ones:

    $ telnet localhost 3128
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    HEAD http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/740px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
    Host: upload.wikimedia.org

    HTTP/1.0 200 OK
    Content-Type: image/jpeg
    ETag: "6463846857823763183"
    Accept-Ranges: bytes
    Last-Modified: Sat, 21 Oct 2006 23:58:58 GMT
    Content-Length: 260
    Date: Sun, 24 Dec 2006 22:45:37 GMT
    Server: lighttpd/1.4.13
    X-Cache: HIT from sq9.wikimedia.org
    X-Cache-Lookup: HIT from sq9.wikimedia.org:80
    X-Cache: HIT from sq5.wikimedia.org
    X-Cache-Lookup: HIT from sq5.wikimedia.org:80
    X-Cache: HIT from sq10.wikimedia.org
    X-Cache-Lookup: HIT from sq10.wikimedia.org:80
    Age: 1755
    X-Cache: HIT from delta.home.cesarb.net
    X-Cache-Lookup: HIT from delta.home.cesarb.net:3128
    Proxy-Connection: close

    Connection closed by foreign host.

    $ telnet localhost 3128
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    GET http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/250px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
    Host: upload.wikimedia.org

    HTTP/1.0 200 OK
    Content-Type: image/jpeg
    ETag: "-3342178414438865210"
    Accept-Ranges: bytes
    Last-Modified: Fri, 13 Oct 2006 16:54:09 GMT
    Content-Length: 260
    Date: Mon, 20 Nov 2006 18:44:45 GMT
    Server: lighttpd/1.4.13
    X-Cache: HIT from sq2.pmtpa.wmnet
    X-Cache-Lookup: HIT from sq2.pmtpa.wmnet:80
    X-Cache: HIT from sq8.pmtpa.wmnet
    X-Cache-Lookup: HIT from sq8.pmtpa.wmnet:80
    X-Cache: HIT from sq4.pmtpa.wmnet
    X-Cache-Lookup: HIT from sq4.pmtpa.wmnet:80
    X-Cache: HIT from sq14.wikimedia.org
    X-Cache-Lookup: HIT from sq14.wikimedia.org:80
    X-Cache: HIT from sq13.wikimedia.org
    X-Cache-Lookup: HIT from sq13.wikimedia.org:80
    X-Cache: HIT from sq4.wikimedia.org
    X-Cache-Lookup: HIT from sq4.wikimedia.org:80
    X-Cache: HIT from sq7.wikimedia.org
    X-Cache-Lookup: HIT from sq7.wikimedia.org:80
    X-Cache: HIT from sq1.wikimedia.org
    X-Cache-Lookup: HIT from sq1.wikimedia.org:80
    X-Cache: HIT from sq4.wikimedia.org
    X-Cache-Lookup: HIT from sq4.wikimedia.org:80
    X-Cache: HIT from sq2.wikimedia.org
    X-Cache-Lookup: HIT from sq2.wikimedia.org:80
    X-Cache: HIT from sq5.wikimedia.org
    X-Cache-Lookup: HIT from sq5.wikimedia.org:80
    X-Cache: HIT from sq12.wikimedia.org
    X-Cache-Lookup: HIT from sq12.wikimedia.org:80
    X-Cache: HIT from sq2.wikimedia.org
    X-Cache-Lookup: HIT from sq2.wikimedia.org:80
    X-Cache: HIT from sq8.wikimedia.org
    X-Cache-Lookup: HIT from sq8.wikimedia.org:80
    X-Cache: HIT from sq10.wikimedia.org
    X-Cache-Lookup: HIT from sq10.wikimedia.org:80
    X-Cache: HIT from sq14.wikimedia.org
    X-Cache-Lookup: HIT from sq14.wikimedia.org:80
    X-Cache: HIT from sq2.wikimedia.org
    X-Cache-Lookup: HIT from sq2.wikimedia.org:80
    X-Cache: HIT from sq4.wikimedia.org
    X-Cache-Lookup: HIT from sq4.wikimedia.org:80
    X-Cache: HIT from sq10.wikimedia.org
    X-Cache-Lookup: HIT from sq10.wikimedia.org:80
    X-Cache: HIT from delta.home.cesarb.net
    X-Cache-Lookup: HIT from delta.home.cesarb.net:3128
    Proxy-Connection: close

    <html><head>
    <title>Bad title</title>
    <body>
    <h1>Bad title</h1>
    <p>The requested page title was invalid, empty, or an incorrectly linked inter-language or inter-wiki title. It may contain one more characters which cannot be used in titles.</p>
    </body></html>Connection closed by foreign host.

The X-Cache header lines seem positively strange. Also, unless I'm reading the code wrong, the Cache-Control and Content-Type headers which should be there (from thumb.php) are missing.
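For anyone who wants to check other thumbnails for the same symptom, here is a rough Python sketch of the pattern seen in the broken responses above: a 200 response labelled image/jpeg whose body is actually an HTML error page, served through a stack of cache HITs. The function names and the abbreviated SAMPLE response are made up for illustration, and the "image type but HTML body" test is only a heuristic I am assuming is good enough, not anything taken from thumb.php or the squid configuration.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status line, header pairs, body)."""
    text = raw.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    lines = head.split("\n")
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return lines[0], headers, body


def looks_like_cached_error_page(raw):
    """Heuristic: the headers claim an image, but the body starts with HTML."""
    _, headers, body = parse_response(raw)
    content_type = next((v for n, v in headers if n == "content-type"), "")
    return content_type.startswith("image/") and body.lstrip().lower().startswith("<html")


def cache_hits(raw):
    """Count the X-Cache lines (not X-Cache-Lookup) that report a HIT."""
    _, headers, _ = parse_response(raw)
    return sum(1 for n, v in headers if n == "x-cache" and v.upper().startswith("HIT"))


# Abbreviated, hand-copied version of the broken 740px response (illustration only).
SAMPLE = "\n".join([
    "HTTP/1.0 200 OK",
    "Content-Type: image/jpeg",
    "Content-Length: 260",
    "X-Cache: HIT from sq9.wikimedia.org",
    "X-Cache-Lookup: HIT from sq9.wikimedia.org:80",
    "X-Cache: HIT from sq5.wikimedia.org",
    "X-Cache-Lookup: HIT from sq5.wikimedia.org:80",
    "",
    "<html><head>",
    "<title>Bad title</title>",
])

if __name__ == "__main__":
    print(looks_like_cached_error_page(SAMPLE), cache_hits(SAMPLE))
```

A HEAD request returns no body, so this check only works on a GET like the 250px one; for HEAD the suspiciously small Content-Length is the only hint.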
Following a suggestion at [[WP:VPT#149px works but 150px doesn't]], I added a 1 at the end of one of the URLs (http://upload.wikimedia.org/wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/250px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg1) and it regenerated one of the broken thumbnails.
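In case it helps anyone reproduce this, here is a small Python sketch that builds the thumbnail URL with the same percent-encoding as the URLs in this report, plus the variant with the extra character on the end. The helper name thumb_url and its parameters are mine, the "8/81" directory is simply copied from the URLs above rather than computed, and my guess that the trailing 1 works because the altered URL is not yet in the cache is only a guess.

```python
from urllib.parse import quote

# Base path copied from the URLs quoted in this report.
UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia/commons/thumb"


def thumb_url(name, width, hash_dir, suffix=""):
    """Build a thumbnail URL; `suffix` is the junk appended for the workaround."""
    # safe="" makes quote() escape "(", ")" and "&" as well; "_", "." and "-"
    # are never escaped by quote().
    encoded = quote(name, safe="")
    return f"{UPLOAD_BASE}/{hash_dir}/{encoded}/{width}px-{encoded}{suffix}"


NAME = "USS_John_C._Stennis_(CVN-74)_&_HMS_Illustrious_(R_06).jpg"

if __name__ == "__main__":
    print(thumb_url(NAME, 250, "8/81"))              # the broken URL
    print(thumb_url(NAME, 250, "8/81", suffix="1"))  # the "add a 1" variant
```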
The image tab is red because there's no local page on en.wikipedia.org; the image is hosted on Commons. Am investigating the thumb issue; there seem to be some buggy things with the thumbnail caching system.
I've adjusted the caching script to, hopefully, work correctly now for this case. Can you confirm correct behavior now?
After purging a couple of times on commons, the 740px thumbnail is still in the broken state:

    $ telnet upload.wikimedia.org 80
    Trying 66.230.200.228...
    Connected to upload.pmtpa.wikimedia.org.
    Escape character is '^]'.
    GET /wikipedia/commons/thumb/8/81/USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg/740px-USS_John_C._Stennis_%28CVN-74%29_%26_HMS_Illustrious_%28R_06%29.jpg HTTP/1.0
    Host: upload.wikimedia.org

    HTTP/1.0 200 OK
    Content-Type: image/jpeg
    ETag: "6463846857823763183"
    Accept-Ranges: bytes
    Last-Modified: Sat, 21 Oct 2006 23:58:58 GMT
    Content-Length: 260
    Date: Sun, 24 Dec 2006 22:45:37 GMT
    Server: lighttpd/1.4.13
    X-Cache: HIT from sq9.wikimedia.org
    X-Cache-Lookup: HIT from sq9.wikimedia.org:80
    X-Cache: HIT from sq5.wikimedia.org
    X-Cache-Lookup: HIT from sq5.wikimedia.org:80
    Age: 61892
    X-Cache: HIT from sq10.wikimedia.org
    X-Cache-Lookup: HIT from sq10.wikimedia.org:80
    X-Cache: MISS from sq9.wikimedia.org
    X-Cache-Lookup: MISS from sq9.wikimedia.org:80
    Via: 1.0 sq9.wikimedia.org:80 (squid/2.6.STABLE5), 1.0 sq5.wikimedia.org:80 (squid/2.6.STABLE5), 1.0 sq10.wikimedia.org:80 (squid/2.6.STABLE5), 1.0 sq9.wikimedia.org:80 (squid/2.6.STABLE5)
    Connection: close

    <html><head>
    <title>Bad title</title>
    <body>
    <h1>Bad title</h1>
    <p>The requested page title was invalid, empty, or an incorrectly linked inter-language or inter-wiki title. It may contain one more characters which cannot be used in titles.</p>
    </body></html>Connection closed by foreign host.

The problem might be fixed, but purging doesn't seem to be enough to clear the bogus entries. Unfortunately, I don't know how to check if the problem which created these bogus entries in the first place has really been fixed. The crazy "add 1 to the end" trick would probably fix this one too, but I'll leave it broken to help debugging.
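One way to see that this is still the same stale entry is to add the Age header to the Date header, which gives a rough estimate of when the cached copy was handed out. The Python sketch below does that arithmetic for the two 740px responses in this report; the helper name served_at is made up, and I am assuming Age can be read as plain seconds since the Date stamp even though the response passed through several squids.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def served_at(date_header: str, age_seconds: int) -> datetime:
    """Estimate when a cached response was served: Date plus Age."""
    generated = parsedate_to_datetime(date_header)  # timezone-aware for "GMT"
    return generated + timedelta(seconds=age_seconds)


# Date and Age values copied from the two broken 740px responses.
DATE = "Sun, 24 Dec 2006 22:45:37 GMT"
first_seen = served_at(DATE, 1755)
after_purge = served_at(DATE, 61892)

if __name__ == "__main__":
    print(first_seen.astimezone(timezone.utc))   # 2006-12-24 23:14:52+00:00
    print(after_purge.astimezone(timezone.utc))  # 2006-12-25 15:57:09+00:00
```

Both responses carry the same Date and ETag, so the copy fetched after the purges was generated at 22:45:37 on 24 December, before any purge was attempted.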
I cleared the old bad files for this particular entry manually, and it now shows properly for me. Purge doesn't clear them, and it's not saving new, correct copies on the caching thumbnail server; presumably fetching from the main server. Sigh.
Wouldn't it just be easier to disallow new uploads which contain bad characters?
We don't want to prohibit punctuation in image names. This is a bug in the caching system that should be fixed there, as I understand it.
Are there still any current problems? Brion said in December that he fixed it, and there hasn't been any update since.
No problems currently that I'm aware of, and various changes have been made in infrastructure. If problems continue, please provide details.