Last modified: 2013-03-20 09:15:41 UTC
+++ This bug was initially created as a clone of Bug #41130 comment 87 +++ Splitting off from bug 41130 comment 87 and 88. It appears that the htcp purges are not clearing the upload squids in esams. It appeared to briefly be working, but is broken again. Furthermore, it appears that non-image related htcp purges are working fine (So articles are getting purged properly). I did the equivalent procedure described below but with an article, and everything worked fine. Examples of broken files include http://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/Wappen_Landkreis_Aurich.svg/140px-Wappen_Landkreis_Aurich.svg.png ---- Steps to reproduce: 1) Find a file that is in the caches (note, its ok if the correct version is in the cache). In this example lets take http://upload.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg (I'm using the svg source, to ensure that this bug is not possibly mixed up with bug 44428) 2) Get the file, accessing from both esams and eqiad. Note the http headers, specifically the age header. To test from esams I used the command: wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' (Occasionally a varnish responded to that, which didn't have the described problem, If that happens just make the request again). This had an age header like: Age: 753820 Which is ~8 days To test from eqiad I used the command (Actually, I originally just used upload.wikimedia.org since I'm in north america, but I'm trying to make the steps to reproduce generic): wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' It had an age header of Age: 11199 3) Purge the image description page: http://commons.wikimedia.org/wiki/File:Wappen_Landkreis_Aurich.svg?action=purge Expected behaviour: Doing wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' and wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' Should both result in responses that either have cache miss headers, or an age header that is very small. Actual behavior: North america ( wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' ) gets the expected response: [..] Age: 0 X-Cache: cp1030 miss (0), cp1029 frontend miss (0) Europe ( wget -S -U bawolff --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' ) does not have the cache cleared [..] Age: 754915 X-Cache: HIT from sq84.wikimedia.org X-Cache: MISS from amssq62.esams.wikimedia.org X-Cache: MISS from amssq61.esams.wikimedia.org ----------------- In previous tests I did, things worked yesterday, and there have been some users reporting success here and there, which suggests this issue is either intermittent or recently broken. However, I did try this procedure several times, so its not just that my packet happened to be lost (Unless a significant portion of the htcp packets are being lost). Several users have reported that broken thumbs are magically fixed one out of every so often page loads. I think that has to do with sometimes a varnish server responds from esams, and it doesn't appear to have the cache clearing issue (the headers say in that case that the cache hit is from cp1030, which also seems to respond from eqiad, so I'm guessing that server is actually in eqiad(?). This is going quite a bit beyond my knowladge of wmf's network layout).
bawolff: Thanks for the great initial analysis, and isolating it from the noise in bug 41130. I really really appreciate this. For the records, I've also copied this to RT# 4454 as told on #operations IRC.
Hi folks, Leslie is looking into the problem now. Brian, thanks for the detailed analysis (and for splitting this out)!
I should never have let people know I exist ;) multicast forwarding is working between dobson and hooft and a lot of purges are being sent -- therefore the problem is either with upload varnishes not sending or with the purging in esams itself. I'm checking these as the next two steps
cp3009 had varnishhtcpd locked up - restarted
the varnish upload frontends are properly purging cache now - the squid frontends are not. investigating this
the eqiad mediawiki boxes did not have permissions to send purges to the tampa and esams squids, and therefore those purges were not happening. this was semi obscured by the varnish issues. But now, all should be complete. Please doulbe check and let me know. Also thank you very much for the very detailed steps for reproduction. These kinds of well written bugs are very very helpful :) Leslie
The thumb and original in the example are definitely fixed. Wonderful!
I just tested Moscow_metro_map_ru_sb_future.svg doing both the svg and the png thumbnail. The caches appear to be purged all around. Hip, Hip Hurrah! Thank you for the quick response.
423px thumbnail for example http://commons.wikimedia.org/wiki/File:Wappen_Landkreis_Aurich.svg also constantly shows the new version now here in Europe. Awesome. Thanks so much to Leslie and bawolff! I'm closing this as RESOLVED FIXED.
Problem seems to persist for https://upload.wikimedia.org/wikipedia/commons/1/18/Kochendes_wasser02.jpg . At least for me
(In reply to comment #10) > Problem seems to persist for > https://upload.wikimedia.org/wikipedia/commons/1/18/Kochendes_wasser02.jpg . > At > least for me Same here, but I purged it a few times and it seems ok now, doesn't it?
(In reply to comment #10) > Problem seems to persist for > https://upload.wikimedia.org/wikipedia/commons/1/18/Kochendes_wasser02.jpg Works for me now (probably thanks to Nemo).
New instance: https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/PlacaenmonumentoaMarconi.jpg/120px-PlacaenmonumentoaMarconi.jpg "Broken" response from esams: HTTP request sent, awaiting response... HTTP/1.1 200 OK Server: nginx/1.1.19 Date: Tue, 19 Mar 2013 23:35:35 GMT Content-Type: image/jpeg Content-Length: 3637 Connection: keep-alive X-Object-Meta-Sha1base36: q311dfj910y0loxw51q2fm79rw8j32o Content-Disposition: inline;filename*=UTF-8''PlacaenmonumentoaMarconi.jpg X-Timestamp: 1363609261.89105 Accept-Ranges: bytes Access-Control-Allow-Origin: * Last-Modified: Mon, 18 Mar 2013 12:21:01 GMT ETag: b737ec5a44b9f7975b69facfde655da0 X-Cache: HIT from sq82.wikimedia.org X-Cache-Lookup: HIT from sq82.wikimedia.org:3128 Age: 41471 X-Cache: HIT from amssq59.esams.wikimedia.org X-Cache-Lookup: HIT from amssq59.esams.wikimedia.org:3128 X-Cache: MISS from knsq19.esams.wikimedia.org X-Cache-Lookup: MISS from knsq19.esams.wikimedia.org:80 Via: 1.0 sq82.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 amssq59.esams.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 knsq19.esams.wikimedia.org:80 (squid/2.7.STABLE9) Length: 3637 (3.6K) [image/jpeg] "Valid" response via Varnish: HTTP request sent, awaiting response... HTTP/1.1 200 OK Server: nginx/1.1.19 Date: Tue, 19 Mar 2013 23:35:24 GMT Content-Type: image/jpeg Content-Length: 3994 Connection: keep-alive X-Object-Meta-Sha1base36: qtiacup2waiwupgx7sma7puadwpqeig Content-Disposition: inline;filename*=UTF-8''PlacaenmonumentoaMarconi.jpg Last-Modified: Tue, 19 Mar 2013 23:24:09 GMT Etag: 92f154cb562f20cf648199f1bd88c464 X-Timestamp: 1363735449.88825 X-Varnish: 2316312654, 2904564637, 670145209 Via: 1.1 varnish, 1.1 varnish, 1.1 varnish Accept-Ranges: bytes Age: 0 X-Cache: cp1028 miss (0), cp3009 miss (0), cp3009 frontend miss (0) Access-Control-Allow-Origin: * Length: 3994 (3.9K) [image/jpeg] Please confirm. (I already purged the file page)
Marco: Thanks for reporting this and the initial analysis. As this bug report was closed about six weeks ago it might be a different technical issue this time, so I've moved the last comment to a separate bug report: bug 46350.