Last modified: 2013-03-20 09:15:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T46508, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 44508 - htcp cache purges for images do not seem to clear europe upload squid caches
htcp cache purges for images do not seem to clear europe upload squid caches
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
General/Unknown (Other open bugs)
wmf-deployment
All All
: Highest critical (vote)
: ---
Assigned To: Leslie Carr
: ops, platformeng
Depends on: 41130
Blocks: 41371
  Show dependency treegraph
 
Reported: 2013-01-30 16:22 UTC by Bawolff (Brian Wolff)
Modified: 2013-03-20 09:15 UTC (History)
26 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Bawolff (Brian Wolff) 2013-01-30 16:22:26 UTC
+++ This bug was initially created as a clone of Bug #41130 comment 87 +++

Splitting off from bug 41130 comment 87 and 88. 

It appears that the htcp purges are not clearing the upload squids in esams. It appeared to briefly be working, but is broken again.

Furthermore, it appears that non-image related htcp purges are working fine (So articles are getting purged properly). I did the equivalent procedure described below but with an article, and everything worked fine.

Examples of broken files include http://upload.wikimedia.org/wikipedia/commons/thumb/c/c2/Wappen_Landkreis_Aurich.svg/140px-Wappen_Landkreis_Aurich.svg.png

----

Steps to reproduce:

1) Find a file that is in the caches (note, its ok if the correct version is in the cache). In this example lets take http://upload.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg (I'm using the svg source, to ensure that this bug is not possibly mixed up with bug 44428)

2) Get the file, accessing from both esams and eqiad. Note the http headers, specifically the age header.

To test from esams I used the command:
 wget -S -U bawolff  --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'

(Occasionally a varnish responded to that, which didn't have the described problem, If that happens just make the request again).

This had an age header like:
 Age: 753820
Which is ~8 days

To test from eqiad I used the command (Actually, I originally just used upload.wikimedia.org since I'm in north america, but I'm trying to make the steps to reproduce generic):
 wget -S -U bawolff  --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'

It had an age header of 
 Age: 11199

3) Purge the image description page: http://commons.wikimedia.org/wiki/File:Wappen_Landkreis_Aurich.svg?action=purge

Expected behaviour:

Doing 
  wget -S -U bawolff  --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'
and
  wget -S -U bawolff  --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg'

Should both result in responses that either have cache miss headers, or an age header that is very small.

Actual behavior:

North america ( wget -S -U bawolff  --header 'host: upload.wikimedia.org' 'http://upload-lb.eqiad.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' ) gets the expected response:
  [..]
  Age: 0
  X-Cache: cp1030 miss (0), cp1029 frontend miss (0)

Europe ( wget -S -U bawolff  --header 'host: upload.wikimedia.org' 'http://upload-lb.esams.wikimedia.org/wikipedia/commons/c/c2/Wappen_Landkreis_Aurich.svg' ) does not have the cache cleared
  [..]
  Age: 754915
  X-Cache: HIT from sq84.wikimedia.org
  X-Cache: MISS from amssq62.esams.wikimedia.org
  X-Cache: MISS from amssq61.esams.wikimedia.org


-----------------
  
In previous tests I did, things worked yesterday, and there have been some users reporting success here and there, which suggests this issue is either intermittent or recently broken. However, I did try this procedure several times, so its not just that my packet happened to be lost (Unless a significant portion of the htcp packets are being lost).

Several users have reported that broken thumbs are magically fixed one out of every so often page loads. I think that has to do with sometimes a varnish server responds from esams, and it doesn't appear to have the cache clearing issue (the headers say in that case that the cache hit is from cp1030, which also seems to respond from eqiad, so I'm guessing that server is actually in eqiad(?). This is going quite a bit beyond my knowladge of wmf's network layout).
Comment 1 Andre Klapper 2013-01-30 20:24:19 UTC
bawolff: Thanks for the great initial analysis, and isolating it from the noise in bug 41130. I really really appreciate this. 

For the records, I've also copied this to RT# 4454 as told on #operations IRC.
Comment 2 Rob Lanphier 2013-01-30 21:33:47 UTC
Hi folks, Leslie is looking into the problem now.  Brian, thanks for the detailed analysis (and for splitting this out)!
Comment 3 Leslie Carr 2013-01-30 21:39:09 UTC
I should never have let people know I exist ;)

multicast forwarding is working between dobson and hooft and a lot of purges are being sent -- therefore the problem is either with upload varnishes not sending or with the purging in esams itself.  I'm checking these as the next two steps
Comment 4 Leslie Carr 2013-01-30 21:58:17 UTC
cp3009 had varnishhtcpd locked up - restarted
Comment 5 Leslie Carr 2013-01-30 22:28:26 UTC
the varnish upload frontends are properly purging cache now - the squid frontends are not. investigating this
Comment 6 Leslie Carr 2013-01-30 22:48:37 UTC
the eqiad mediawiki boxes did not have permissions to send purges to the tampa and esams squids, and therefore those purges were not happening. this was semi obscured by the varnish issues.  But now, all should be complete.  Please doulbe check and let me know. 


Also thank you very much for the very detailed steps for reproduction.  These kinds of well written bugs are very very helpful :)

Leslie
Comment 7 Nemo 2013-01-30 22:52:17 UTC
The thumb and original in the example are definitely fixed. Wonderful!
Comment 8 Bawolff (Brian Wolff) 2013-01-30 22:59:34 UTC
I just tested Moscow_metro_map_ru_sb_future.svg doing both the svg and the png thumbnail. The caches appear to be purged all around.

Hip, Hip Hurrah! Thank you for the quick response.
Comment 9 Andre Klapper 2013-01-31 10:51:00 UTC
423px thumbnail for example http://commons.wikimedia.org/wiki/File:Wappen_Landkreis_Aurich.svg also constantly shows the new version now here in Europe.

Awesome. Thanks so much to Leslie and bawolff!

I'm closing this as RESOLVED FIXED.
Comment 10 Marco 2013-02-02 18:16:02 UTC
Problem seems to persist for https://upload.wikimedia.org/wikipedia/commons/1/18/Kochendes_wasser02.jpg . At least for me
Comment 11 Nemo 2013-02-02 18:33:36 UTC
(In reply to comment #10)
> Problem seems to persist for
> https://upload.wikimedia.org/wikipedia/commons/1/18/Kochendes_wasser02.jpg .
> At
> least for me

Same here, but I purged it a few times and it seems ok now, doesn't it?
Comment 12 Andre Klapper 2013-02-04 16:21:51 UTC
(In reply to comment #10)
> Problem seems to persist for
> https://upload.wikimedia.org/wikipedia/commons/1/18/Kochendes_wasser02.jpg

Works for me now (probably thanks to Nemo).
Comment 13 Marco 2013-03-19 23:54:38 UTC
New instance: https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/PlacaenmonumentoaMarconi.jpg/120px-PlacaenmonumentoaMarconi.jpg

"Broken" response from esams:

HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: nginx/1.1.19
  Date: Tue, 19 Mar 2013 23:35:35 GMT
  Content-Type: image/jpeg
  Content-Length: 3637
  Connection: keep-alive
  X-Object-Meta-Sha1base36: q311dfj910y0loxw51q2fm79rw8j32o
  Content-Disposition: inline;filename*=UTF-8''PlacaenmonumentoaMarconi.jpg
  X-Timestamp: 1363609261.89105
  Accept-Ranges: bytes
  Access-Control-Allow-Origin: *
  Last-Modified: Mon, 18 Mar 2013 12:21:01 GMT
  ETag: b737ec5a44b9f7975b69facfde655da0
  X-Cache: HIT from sq82.wikimedia.org
  X-Cache-Lookup: HIT from sq82.wikimedia.org:3128
  Age: 41471
  X-Cache: HIT from amssq59.esams.wikimedia.org
  X-Cache-Lookup: HIT from amssq59.esams.wikimedia.org:3128
  X-Cache: MISS from knsq19.esams.wikimedia.org
  X-Cache-Lookup: MISS from knsq19.esams.wikimedia.org:80
  Via: 1.0 sq82.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 amssq59.esams.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 knsq19.esams.wikimedia.org:80 (squid/2.7.STABLE9)
Length: 3637 (3.6K) [image/jpeg]

"Valid" response via Varnish:

HTTP request sent, awaiting response... 
  HTTP/1.1 200 OK
  Server: nginx/1.1.19
  Date: Tue, 19 Mar 2013 23:35:24 GMT
  Content-Type: image/jpeg
  Content-Length: 3994
  Connection: keep-alive
  X-Object-Meta-Sha1base36: qtiacup2waiwupgx7sma7puadwpqeig
  Content-Disposition: inline;filename*=UTF-8''PlacaenmonumentoaMarconi.jpg
  Last-Modified: Tue, 19 Mar 2013 23:24:09 GMT
  Etag: 92f154cb562f20cf648199f1bd88c464
  X-Timestamp: 1363735449.88825
  X-Varnish: 2316312654, 2904564637, 670145209
  Via: 1.1 varnish, 1.1 varnish, 1.1 varnish
  Accept-Ranges: bytes
  Age: 0
  X-Cache: cp1028 miss (0), cp3009 miss (0), cp3009 frontend miss (0)
  Access-Control-Allow-Origin: *
Length: 3994 (3.9K) [image/jpeg]


Please confirm. (I already purged the file page)
Comment 14 Andre Klapper 2013-03-20 09:15:41 UTC
Marco: Thanks for reporting this and the initial analysis. As this bug report was closed about six weeks ago it might be a different technical issue this time, so I've moved the last comment to a separate bug report: bug 46350.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links