Last modified: 2013-08-22 14:54:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T43451, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41451 - ULS causes pages to be cached with random user language
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: UniversalLanguageSelector
Version: master
Hardware: All All
Importance: High normal
Target Milestone: ---
Assigned To: Nobody
Keywords: i18n, ops
Depends on:
Blocks:
Reported: 2012-10-27 08:53 UTC by Aude
Modified: 2013-08-22 14:54 UTC

Attachments
Patch for Squid to add cookie headers to output sent to storeurl_rewrite_program (978 bytes, patch)
2012-11-13 23:55 UTC, Ori Livneh
storeurl_rewrite_program in Python that adds value of ULS cookie as query param to URL. (1.02 KB, text/plain)
2012-11-13 23:56 UTC, Ori Livneh
Patch for Squid to add cookie headers to output sent to storeurl_rewrite_program (979 bytes, patch)
2012-11-14 00:02 UTC, Ori Livneh

Description Aude 2012-10-27 08:53:23 UTC
When I visit wikidata.org (with ULS) for the first time (e.g. in a new browser, no jstorage / cookie stuff), it shows me the sidebar etc. in Dutch.

When I click "Help" or something, it then switches to English for me.

Lydia (from wikidata) also had this issue.

We are in Germany so I wonder if it's picking up the caching server ESAMS IP address and using that somehow to guess our language?
Comment 1 Aude 2012-10-27 08:55:26 UTC
This happened for me with Firefox, and not with Chrome.  

I'm normally using Chrome and probably already have cookies and local storage stuff set on my browser, but it was a fresh visit to wikidata.org with Firefox.
Comment 2 Niklas Laxström 2012-10-27 09:10:33 UTC
Caching has not been configured properly to vary by language. ULS is functioning correctly.
Comment 3 Aude 2012-10-27 09:12:18 UTC
okay, that makes sense.  I'm not sure what bugzilla component to pick but if you want to change it, that's okay with me.
Comment 4 Aude 2012-10-27 09:33:44 UTC
Clicking around, I got one page in Norwegian and the community portal in Icelandic.

and it's showing "common languages"

    English
    norsk (bokmål)
    íslenska

and some missing system messages like uls-search-placeholder 

looks like some more setting up is needed of wikidata and configuring stuff.
Comment 5 Aude 2012-10-27 09:37:15 UTC
And Spanish (and Icelandic) now in Chrome.
Comment 6 jeblad 2012-10-27 11:59:30 UTC
Learn Norwegian! Problem solved.
Comment 7 Aude 2012-10-27 21:15:44 UTC
Reedy removed $wgULSGeoService = false to see if that mattered.  It seems to improve things but does not completely fix the problem.

Special:Recentchanges seems to be especially unsticky (the setlang param) in both Firefox and Chrome.

In Chrome, I switched my lang to Arabic, clicked around, then back to English.  The recent changes page is stuck in Arabic.

In Firefox, Recent changes is still stuck in Norwegian but the other pages seem to be behaving better.  

I have tried proxying my browsers via the US, bypassing ESAMS, and disabling browser cache, deleting cookies, local storage, etc.
Comment 8 jeblad 2012-10-27 23:48:05 UTC
Does ULS store information in the browser's local storage or session storage?
Comment 9 Niklas Laxström 2012-11-01 14:01:41 UTC
The language selection is done on PHP side, but Wikidata currently has normal caching which does not vary per accept language, and thus the pages are served in the language they were cached in.
Comment 10 Siebrand Mazeland 2012-11-01 14:42:00 UTC
This issue should be taken up with ops. There is no issue with component ULS. See comment 9.
Comment 11 Mark Bergsma 2012-11-01 14:56:04 UTC
Squid simply follows the instructions it's getting from MediaWiki in that respect. Send Vary / X-Vary-Options headers as appropriate, and then it should work.
Comment 12 Siebrand Mazeland 2012-11-01 17:04:42 UTC
(In reply to comment #11)
> Squid simply follows the instructions it's getting from MediaWiki in that
> respect. Send Vary / X-Vary-Options headers as appropriate, and then it should
> work.

How should this be implemented, then, so that anonymous users get served the UI in the language given by their Accept-Language header, or by the cookie that is set when they change the language away from the Accept-Language default? Code samples will probably suffice.
Comment 13 Rob Lanphier 2012-11-06 00:06:00 UTC
Setting to high priority.  I think we need to figure this one out if we're going to see ULS deployed much more widely than it is now.  This is already causing problems on wikidata.org.
Comment 14 Daniel Kinzler 2012-11-06 07:41:18 UTC
(In reply to comment #11)
> Squid simply follows the instructions it's getting from MediaWiki in that
> respect. Send Vary / X-Vary-Options headers as appropriate, and then it should
> work.

After a brief look, it seems to me that there is no mechanism in Squid that allows splitting the cache based on the value of a specific cookie. Am I missing something obvious? I just don't see how Vary and X-Vary-Options can be used to do this. Maybe instead of varying on cookies, we should vary on an ETag? Anyway, here's what I found so far: 

If we use Vary: Cookie, we will vary on *all* cookies. That would fracture the cache beyond repair. So, maybe X-Vary-Options can help?

I dug up Tim's original mail containing the patch that introduces X-Vary-Options into Squid: <http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html>. There seems to be very little documentation beyond that. From what I see, the best we can do with that is something like: 

X-Vary-Options: Cookie; string-contains=language; 

But that would just split the cache into two: requests with the "language" cookie set, and those without the language cookie set. It does not vary on the value of the cookie. Also - is it still true that X-Vary-Options is a custom patch, and never got into Squid's main line? If not, please fix <http://wiki.squid-cache.org/Features/XvaryOptions> :)

Anyway - what we would actually need is something like 

X-Vary-Cookie: language

...that would vary on the value of the language cookie. Or maybe X-Vary-Options could be extended to cover this:

X-Vary-Options: Cookie; string-extract=language=([^;]*); 
 
But this is messy, and I don't think anyone is up to patching Squid further. So... other options? Could we vary on the ETag header, and generate it depending on the user language? 
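
The difference between varying on the presence of a cookie and varying on its value can be illustrated with a small Python sketch. This is not Squid code: `key_by_presence` and `key_by_value` are hypothetical helpers, and the cookie name `language` is simply the one used in the examples above.

```python
import re

def key_by_presence(url, cookie_header, name="language"):
    """Cache key that only records whether the name appears (like string-contains)."""
    return (url, name in (cookie_header or ""))

def key_by_value(url, cookie_header, name="language"):
    """Cache key that records the cookie's value (the proposed string-extract idea)."""
    pattern = r"(?:^|;\s*)" + re.escape(name) + r"=([^;]*)"
    match = re.search(pattern, cookie_header or "")
    return (url, match.group(1) if match else None)

# Four anonymous requests for the same page with different Cookie headers.
cookie_headers = [None, "language=de", "language=nl; session=abc", "session=xyz; language=de"]
presence_keys = {key_by_presence("/wiki/Help", c) for c in cookie_headers}
value_keys = {key_by_value("/wiki/Help", c) for c in cookie_headers}
```

Under these assumptions, keying on presence yields only two buckets (cookie set or not), while keying on the value yields one bucket per language plus one for requests without the cookie.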

PS: I'm also a bit surprised that this issue comes as a surprise. Allowing anons to set the interface language is requested about once a year, and this problem is the reason it got shot down time and time again. With regards to Wikidata, I discussed this problem with several staffers at the Berlin hackathon and at Wikimania. Essentially I was told "wait for ULS, they'll take care of that stuff". Hm.
Comment 15 Daniel Kinzler 2012-11-06 07:46:14 UTC
Hm, now that I submitted the above, it occurred to me that I perhaps *was* missing the obvious... how about this:

  Vary: Content-Language

Simple enough, no? At least at wikidata.org, Content-Language *should* be the user language. Or we simply introduce X-MW-User-Language and use Vary: X-MW-User-Language...

(A quick check shows that Content-Language is returning "en" on wikidata.org. I'll fix that).
Comment 16 Rob Lanphier 2012-11-06 07:48:28 UTC
(In reply to comment #12)
> How should this be implemented then so that anonymous users get served the UI
> that their accept language or the cookie that is set when the change the
> language away from the accept language? Code samples will probably suffice.


Hi Siebrand, are you asking for help with setting the Vary header generally, or are you asking about whether the Vary header can be set conditionally based on whether the cookie is set?  If you're asking about the former, I think OutputPage->addAcceptLanguage() and the GetCacheVaryCookies hook is where it's at.  If you're asking about the latter, that seems like another wrinkle in all of this; I'm not sure what the logic should look like; I think maybe the best thing from a caching perspective may be to set the cookie based on the Accept-Language header, and then only Vary on the cookie.

For limited deployments of ULS, varying the cache on ULS cookies and/or Accept-Language headers should be fine.  The concern is if/when we deploy this widely over all wikis, varying all pages on all languages.  That could split the cache pretty badly.

I suppose we can fix this now for wikidata.org, and then worry about the larger cache issue once ULS is closer to going into wide deployment.
Comment 17 Daniel Kinzler 2012-11-06 07:58:00 UTC
Bug filed for setting Content-Language to the user language at least for item pages on wikidata: bug 41806
Comment 18 Daniel Kinzler 2012-11-06 08:03:22 UTC
(In reply to comment #16)
> For limited deployments of ULS, varying the cache on ULS cookies 

That sounds easy, but as I said above, I have not found a way to vary on the value of a specific cookie. I would have expected this to be a common use case, but apparently, it isn't.

> and/or Accept-Language headers should be fine.

That would be wrong. Which languages my browser is set to accept has nothing to do with which language I picked in ULS. Besides, Accept-Language headers exist in thousands of combinations of languages, priority values, whitespace, etc.
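
To give a sense of why raw Accept-Language values fragment a cache, the Python sketch below reduces several differently written headers to a single primary language tag. It is an illustration only: `primary_language` is a hypothetical, simplified parser (not a MediaWiki or Squid function), it ignores wildcards, and it does not address the separate point that the browser setting may differ from the language picked in ULS.

```python
def primary_language(accept_language, default="en"):
    """Return the highest-priority language tag from an Accept-Language header."""
    best_tag, best_q = default, -1.0
    for part in accept_language.split(","):
        fields = [f.strip() for f in part.split(";")]
        tag = fields[0].lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # a tag without an explicit q-value has priority 1.0
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if q > best_q:
            best_tag, best_q = tag, q
    return best_tag

headers = ["de-DE,de;q=0.8,en;q=0.5", "de-DE, de;q=0.9, en-US;q=0.3", "DE-de"]
distinct_raw = len(set(headers))
distinct_normalized = len({primary_language(h) for h in headers})
```

The three header strings differ byte for byte, so a plain Vary: Accept-Language would store three copies, even though all three express the same first preference.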

But this makes me wonder about something else... is Vary applied to the request or the response headers, or both? If it's just the request headers, things just got a lot harder, and my idea to use the Content-Language header doesn't work.
Comment 19 Daniel Kinzler 2012-11-06 08:30:51 UTC
To answer my own question: of course Squid can only vary on request headers. 

So, Content-Language and ETag are out of the question. Since Accept-Language also doesn't do what we need, we are back to square one: vary on the value of one specific cookie. Is squid really not able to do this? Looks like it would be hackish, but possible, with varnish: <https://www.varnish-cache.org/trac/wiki/VCLExampleRemovingSomeCookies>.

The only alternative I see is to actually use different URLs, injecting setlang parameters into all local links, like the StickToThatLanguage extension does. That's a nasty hack, but will work with squid caches - which is why we developed it that way.
Comment 20 Daniel Kinzler 2012-11-06 08:55:09 UTC
Oh, to add one more problem: purging. When a page changes, all variants (languages) of that page need to be purged. I don't think we currently have a mechanism for this at all.
Comment 21 Niklas Laxström 2012-11-06 11:01:50 UTC
(In reply to comment #18)
> > and/or Accept-Language headers should be fine.
> 
> That would be wrong. Which languages my browser is set to accept has nothing to
> do with which language I picked in ULS. Besides, Accept-Language headers exist
> in thousands of combinations of languages, priority values, whitespace, etc.

Before you pick anything in ULS, your initial language is based on the Accept-Language header.

Aren't we using Varnish already? Snippet from wikidata.org:
Server:Apache
Vary:Accept-Encoding
Via:1.1 varnish
Via:1.1 varnish
X-Varnish:1326947601
X-Varnish:3373410970
X-Vary-Options:Accept-Encoding;list-contains=gzip
Comment 22 Niklas Laxström 2012-11-06 11:25:58 UTC
Have to start from somewhere, so I added the Vary headers as Mark suggested in comment #11: https://gerrit.wikimedia.org/r/32030

I believe we can start with this and make it more granular as needed.
Comment 23 Daniel Kinzler 2012-11-06 12:35:18 UTC
(In reply to comment #22)
> Have to start from somewhere, so I added the Vary headers as Mark suggested in
> comment #11: https://gerrit.wikimedia.org/r/32030
> 
> I believe we can start with this and make it more granular as needed.

As I said in my comment there: this would explode the cache, varying on every possible combination of things in the Cookie and Accept-Language headers. If we can't do better than that, just send "Cache-Control: no-cache, must-revalidate".
Comment 24 Mark Bergsma 2012-11-06 14:11:21 UTC
(In reply to comment #20)
> Oh, to add one more problem: purging. When a page changes, all variants
> (languages) of that page need to be purged. I don't think we currently have a
> mechanism for this at all.

All variants of a URL are purged by Squid and Varnish, so that in itself is not a problem.
Comment 25 Mark Bergsma 2012-11-06 14:18:17 UTC
(In reply to comment #14)
> I dug up Tim's original mail containing the patch that introduces
> X-Vary-Options into Squid:
> <http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html>. There
> seems to be very li8ttle documentation beyond that. From what I see, the best
> we can do with that is something like: 
> 
> X-Vary-Options: Cookie; string-contains=language; 
> 
> But that would just split the cache into two: requests with the "language"
> cookie set, and those without the language cookie set. It does not very on the
> value of the cookie. Also - is it still true that X-Vary-Options is a custom
> patch, and never got into the Squids main line? if not, please fix
> <http://wiki.squid-cache.org/Features/XvaryOptions> :)

I believe it did enter Squid main line. But that doesn't help much; other caches (like Varnish) don't have it.

> Anyway - what we would actually need is something like 
> 
> X-Vary-Cookie: language
> 
> ...that would vary on the value of the language cookie. Or maybe X-Vary-Options
> could be extended to cover this:
> 
> X-Vary-Options: Cookie; string-extract=language=([^;]*); 
> 
> But this is messy, and I don't think anyone is up to patching Squid further.
> So... other options? Could we vary on the ETAg header, and generate it
> depending on the user language? 

No, since the client doesn't know the ETag header of course, and isn't sending it.
 
> PS: I'm also a bit surprised that this issue comes at a surprise. Allowing
> anons to set the interface language is requested about once a year, and this
> problem is the reason it got shot down time and time again. With regards to
> Wikidata, I discussed this problem with several staffers at the Berlin
> hackathon and at Wikimania. Essentially I was told "wait for ULS, they'll take
> care of that stuff". Hm.

I am too. It's not exactly a new problem.

Of course exploding the cache is not a big problem for a small wiki like wikidata, but we can't even think about deploying this on any larger wikis until we've solved this properly...
Comment 26 Mark Bergsma 2012-11-06 14:20:20 UTC
So eventually we'll move the "text cluster" to Varnish as well, which would make this problem slightly easier as we can influence request headers in VCL. ULS would need to be adapted for efficient use of that. But the migration of the Text caching cluster to Varnish is at least another 6 months out still.
Comment 27 Daniel Kinzler 2012-11-06 15:30:58 UTC
@mark: so how about ULS just send out "Cache-Control: no-cache" for now? That would cause all pages to just bypass the proxies, right? Better than a vary on cookie and accept-language, and probably acceptable for Wikidata, where we expect few anon visitors.
Comment 28 Mark Bergsma 2012-11-06 16:03:51 UTC
I think neither disabling all caching, nor fragmenting the cache are acceptable solutions for any wiki getting more than a tiny amount of traffic. But if I absolutely had to pick one, it would be fragmenting the cache, along with a lowish (5 mins?) cache ttl. Because that would still protect the backend infrastructure a bit in case of spikes/slashdotting/etc. But again, we really need a better solution than either of those two. If not, we'd probably be forced to turn ULS off entirely when hitting problems/spikes.
Comment 29 Daniel Kinzler 2012-11-06 16:09:45 UTC
For the record: I'm told that with varnish, we could vary on the value of a given cookie. That would solve the problem. But migration to varnish is at least 6 months away. 

I think ULS should have an option for working with that kind of varnish setup efficiently.
Comment 30 Ori Livneh 2012-11-07 06:19:13 UTC
(In reply to comment #29)
> For the record: I'm told that with varnish, we could vary on the value of a
> given cookie. That would solve the problem. But migration to varnish is at
> least 6 months away. 
> 
> I think ULS should have an option for working with that kind of varnish setup
> efficiently.

Copy-pasting what I said on wikitech-l:

> I don't know about Squid, but there are all manner of ways you could attack
> this problem with Varnish. Overriding vcl_hash lets you customize how a 
> cache key is constructed from a request. It's usually just hostname + URL, 
> but you can add any string to the hash:


    sub vcl_hash {
        if (req.http.Cookie ~ "language") {
            hash_data(regsub(req.http.Cookie, "^.*(language=[^;]+).*$", "\1"));
        }
    }
Comment 31 Mark Bergsma 2012-11-07 11:44:50 UTC
(In reply to comment #30)

> Copy-pasting what I said on wikitech-l:
> 
> > I don't know about Squid, but there are all manner of ways you could attack
> > this problem with Varnish. Overriding vcl_hash lets you customize how a 
> > cache key is constructed from a request. It's usually just hostname + URL, 
> > but you can add any string to the hash:
> 
> 
>     sub vcl_hash {
>         if (req.http.Cookie ~ "language") {
>             hash_data(regsub(req.http.Cookie, "^.*(language=[^;]+).*$", "\1"));
>         }
>     }

You've just broken purging.
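
One plausible reading of this objection: a purge request carries no language cookie, so it hashes to a different key than the cached language variants and leaves them in place. The toy cache below is a hypothetical Python model (not Varnish) whose key is the URL plus the extracted `language=...` pair, mirroring the VCL above.

```python
import re

class ToyCache:
    """Hypothetical model of a cache whose key includes the language cookie."""

    def __init__(self):
        self.store = {}

    @staticmethod
    def key(url, cookie_header=""):
        # Same idea as the regsub in the VCL: keep only the language=... pair.
        match = re.search(r"language=[^;]+", cookie_header or "")
        return (url, match.group(0) if match else "")

    def put(self, url, cookie_header, body):
        self.store[self.key(url, cookie_header)] = body

    def purge(self, url, cookie_header=""):
        # A purge only removes the single entry its own request hashes to.
        return self.store.pop(self.key(url, cookie_header), None) is not None

cache = ToyCache()
cache.put("/wiki/Q42", "", "default page")
cache.put("/wiki/Q42", "language=de", "German page")
cache.put("/wiki/Q42", "language=nl", "Dutch page")

cache.purge("/wiki/Q42")  # purge sent without any cookie
stale_variants = sorted(lang for (_, lang) in cache.store)
```

In this model the German and Dutch copies survive the purge, which is the kind of staleness that variants stored under a single hash (via Vary) would not suffer from.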
Comment 32 Siebrand Mazeland 2012-11-09 09:37:23 UTC
(In reply to comment #31)
> (In reply to comment #30)
> 
> > Copy-pasting what I said on wikitech-l:
> > 
> > > I don't know about Squid, but there are all manner of ways you could attack
> > > this problem with Varnish. Overriding vcl_hash lets you customize how a 
> > > cache key is constructed from a request. It's usually just hostname + URL, 
> > > but you can add any string to the hash:
> > 
> > 
> >     sub vcl_hash {
> >         if (req.http.Cookie ~ "language") {
> >             hash_data(regsub(req.http.Cookie, "^.*(language=[^;]+).*$", "\1"));
> >         }
> >     }
> 
> You've just broken purging.

Mark, you being the expert here, at least to my knowledge, can you please help and work towards a solution? From what I can see, your expertise hasn't been used yet on this issue, except for pointing out what's wrong with proposed solutions.
Comment 33 Daniel Kinzler 2012-11-09 11:04:10 UTC
Going by what I heard from Mark, Tim Starling and others, the situation seems to be like this:

* there is no way to do this with the current Squid caches. Varying on Accept-Language and Cookie got vetoed by Tim.
* Bypassing Squids would make it work, but opens a DoS vector. Mark does *not* like it.
* there are probably ways to do this with the new Varnish caches.
* Migration to Varnish is at least 6 months away.
* ULS can still be used as a convenient way to switch UI language for logged in users. At least for Wikidata, that would still be helpful.

I'd be happy if someone could tell me this assessment is wrong...
Comment 34 Mark Bergsma 2012-11-09 11:20:02 UTC
(In reply to comment #33)
> Going by what I heard from Mark, Tim Starling and others, the situation seems
> to be like this:
> 
> * there is no way to do this with the current Squid caches. Varying con
> Accept-Language and Cookie got vetoed by Tim.
> * Bypassing Squids would make it work, but opens a DoS vector. Mark does *not*
> like it.
> * there are probably ways to do this with the new Varnish caches.
> * Migration to Varnish is at least 6 months away.
> * ULS can still be used as a convenient way to switch UI language for logged in
> users. At least for Wikidata, that would still be helpful.

Correct. There isn't really any solution right now.

I think ULS should only be enabled for logged in users until we have Varnish in place.
Comment 35 Liangent 2012-11-09 11:54:57 UTC
I think I'm now confused about X-Vary-Options: http://www.squid-cache.org/mail-archive/squid-dev/200802/0085.html

Does it mean that responses are cached with those test results appended to the key, or that nothing is cached at all if any test of the form "whether the XXX header contains the string YYY" succeeds?
Comment 36 Siebrand Mazeland 2012-11-09 12:51:05 UTC
(In reply to comment #33)
> Going by what I heard from Mark, Tim Starling and others, the situation seems
> to be like this:
> 
> * Bypassing Squids would make it work, but opens a DoS vector. Mark does *not*
> like it.
> * there are probably ways to do this with the new Varnish caches.
> * Migration to Varnish is at least 6 months away.

I'm confused. See below.

(In reply to comment #21)
> Aren't we using Varnish already? Snippet from wikidata.org:
> Server:Apache
> Vary:Accept-Encoding
> Via:1.1 varnish
> Via:1.1 varnish
> X-Varnish:1326947601
> X-Varnish:3373410970
> X-Vary-Options:Accept-Encoding;list-contains=gzip

This remained unanswered. So I'm asking again: Aren't we using Varnish already?
Comment 37 Mark Bergsma 2012-11-09 13:04:33 UTC
(In reply to comment #36)

> I'm confused. See below.
> 
> (In reply to comment #21)
> > Aren't we using Varnish already? Snippet from wikidata.org:
> > Server:Apache
> > Vary:Accept-Encoding
> > Via:1.1 varnish
> > Via:1.1 varnish
> > X-Varnish:1326947601
> > X-Varnish:3373410970
> > X-Vary-Options:Accept-Encoding;list-contains=gzip
> 
> This remained unanswered. So I'm asking again: Aren't we using Varnish already?

No we're not. Elsewhere, but not on the text cluster, which is what's relevant here. I'm not sure where the snippet above comes from, but not from the text caching cluster.
Comment 38 Aude 2012-11-09 15:07:14 UTC
I assume the varnish is coming from the bits, which includes images and also resource loader stuff (e.g. javascript).

I think what matters for the use-case here is the actual html page, which still comes from squid.
Comment 39 Faidon Liambotis 2012-11-09 16:14:19 UTC
As I've noted elsewhere, do note that there are intermediate (forward) caches around in the world, and you have to take those into account too when setting HTTP cache headers.
Comment 40 Erik Moeller 2012-11-10 13:11:12 UTC
Please help me better understand our options here.

My understanding is that in the near term the ULS folks are only deploying ULS to small wikis (Wikidata is probably one of the biggest ones). So would disabling or splitting the cache if the ULS cookie is present be a) acceptable, and b) feasible using one of the available methods, e.g. X-Vary-Options? 

(Does ULS have to set a cookie if the user does not change the language? It seems to do so regardless of whether I change the language or not right now.)

If that's feasible, then I would suggest going for that path, limiting ULS deployments for logged out users to small wikis until the Varnish migration, and working with someone in ops until then to explore the best option to scale caching of UI language variants of the same page properly via Varnish. Does that make sense?
Comment 41 Niklas Laxström 2012-11-11 05:05:32 UTC
(In reply to comment #40)
> (Does ULS have to set a cookie if the user does not change the language? It
> seems to do so regardless of whether I change the language or not right now.)

We can change it to not add* a cookie if the user interface language === wiki content language.

* and remove if already existing
Comment 42 Patrick Reilly 2012-11-11 06:19:11 UTC
I'm going to talk to Siebrand Mazeland about this issue today and see if we can figure something out short-term.

— Patrick
Comment 43 Daniel Kinzler 2012-11-12 09:01:03 UTC
> My understanding is that in the near term the ULS folks are only deploying ULS
> to small wikis (Wikidata is probably one of the biggest ones). So would
> disabling or splitting the cache if the ULS cookie is present be a) acceptable,
> and b) feasible using one of the available methods, e.g. X-Vary-Options? 

I suppose Mark and Tim are the authorities on the subject, but let me reiterate my understanding:

ad a) Splitting the cache by language in the ULS cookie is actually not the issue - it would rather be a solution to the issue. As far as I understand, it would be fine at least for small wikis. 

ad b) It's not possible with Squid (but probably is with Varnish). 

We could vary on the entire cookie header - but that would be unique per client, not splitting but exploding the cache and making it useless. Or we could use XVO, but that only lets us vary on the *presence* of a cookie - all anons with the ULS cookie set (no matter to which value) would hit the same cached version, which would not improve the situation at all.

Or we could hack squid to make this possible. I don't know how complex this is, or who could do it, or how long it would take to roll this out.

There's an option c): use different URLs for different language versions of each page. There are two problems with this: 1) whenever the page changes, *all* the (potential) URLs have to be purged explicitly, increasing the number of purges by two orders of magnitude (and we'd need to hack core to do it); 2) we need to rewrite *all* links in the interface to the language specific version (needs lots of changes in core and messes with internal caching). 

The StickToThatLanguage extension uses URL parameters and rewrites the links using JavaScript. It does not work without JS. Because it uses the uselang=xx parameter, it bypasses all caches. Maybe squid can be made to vary on the uselang parameter, but then we again have the purging problem. 

Even so, STTL might actually be the best option we have right now.
Comment 44 Liangent 2012-11-12 09:30:44 UTC
(In reply to comment #43)
> Or we
> could use XVO, but that only lets us vary on the *presence* of a cookie - all
> anons with the ULS cookie set (no matter to which value) would hit the same
> cached version, which would not improve the situation at all.

IIRC the language converter varies on the presence of every supported variant code in Accept-Language. Maybe we can vary on the presence of "mw_uls_<langcode>" in cookies here (however the list will be much longer than the language converter one)?
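
As a rough illustration of this idea, the following Python sketch builds an X-Vary-Options value with one `string-contains` test per language code. The cookie naming scheme `mw_uls_<langcode>` comes from the suggestion above; the helper name and the short language list are hypothetical, and the exact header syntax accepted by the patched Squid would need to be checked against its documentation.

```python
def build_xvo_cookie_option(lang_codes, prefix="mw_uls_"):
    """Build an X-Vary-Options entry that tests for one cookie name per language."""
    tests = ["string-contains=" + prefix + code for code in lang_codes]
    return ";".join(["Cookie"] + tests)

langs = ["en", "de", "nl"]  # illustrative subset; a real list would be far longer
header_value = build_xvo_cookie_option(langs)
```

Since `string-contains` is a substring test, a code such as `zh` would also match a cookie named for `zh-hans`, so a scheme like this would probably need a terminating delimiter in the cookie name.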
Comment 45 Mark Bergsma 2012-11-12 11:21:20 UTC
(In reply to comment #40)
> Please help me better understand our options here.
> 
> My understanding is that in the near term the ULS folks are only deploying ULS
> to small wikis (Wikidata is probably one of the biggest ones). So would
> disabling or splitting the cache if the ULS cookie is present be a) acceptable,
> and b) feasible using one of the available methods, e.g. X-Vary-Options? 
> 
> (Does ULS have to set a cookie if the user does not change the language? It
> seems to do so regardless of whether I change the language or not right now.)
> 
> If that's feasible, then I would suggest going for that path, limiting ULS
> deployments for logged out users to small wikis until the Varnish migration,
> and working with someone in ops until then to explore the best option to scale
> caching of UI language variants of the same page properly via Varnish. Does
> that make sense?

That seems reasonable, yes. It takes care of the "slashdot" problem, since first time visitors don't have a cookie set and get the cached default language page. When there's a cookie set, we better not cache it at all since the cache hit rate would be extremely low anyways, and it would just inflate the cache.

As Faidon indicates, we'll have to send Vary: cookie headers anyway, for other caching proxies out there. That will destroy their cache hit rate on this as well, but there's not much we can do about that.
Comment 46 Mark Bergsma 2012-11-12 11:33:34 UTC
(In reply to comment #45)
> As Faidon indicates, we'll have to send Vary: cookie headers anyway, for other
> caching proxies out there. That will destroy their cache hit rate on this as
> well, but there's not much we can do about that.

Actually, they have to revalidate every time anyway, so that doesn't matter. :)
Comment 47 Erik Moeller 2012-11-13 07:32:07 UTC
(In reply to comment #45)

> That seems reasonable, yes. It takes care of the "slashdot" problem, since
> first time visitors don't have a cookie set and get the cached default language
> page. When there's a cookie set, we better not cache it at all since the cache
> hit rate would be extremely low anyways, and it would just inflate the cache.

So how would this work in practice?

Scenario A:

1) Client requests https://wikidata.org/ _without_ ULS cookies present and without being logged in.
2) Say we get a cache MISS. Page is returned with XVO including the ULS cookie name, and with Vary: cookie for other caches not supporting XVO.
3) Page is now cached server-side.
4) ULS is loaded client-side. It does not set any cookies because the user does not change her default language.
5) User continues to browse as normal.
6) User gets cache HITs or MISSes as normal in the default language.

Scenario B:

1) Client requests https://wikidata.org/ with ULS cookie present due to a previous language change via ULS.
2) We get a cache MISS because page hasn't been previously cached in variant with ULS cookie present.
3) MediaWiki checks for ULS cookie and sends "Cache-Control: no-cache, no-store, must-revalidate" header alongside the requested page.
4) Squid therefore does not cache the page.
5) The same is true for subsequent pageviews, including pageviews by other users with the ULS cookie present. The user now consistently gets cache MISSes, including from intermediate caches.

Would this approach more or less work for small wikis or am I fundamentally misunderstanding something?
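
The two scenarios can be summarised as a small decision function. This is a hypothetical Python sketch, not MediaWiki code: the cookie name `language` and the function name are assumptions, and the header values are the ones quoted in the scenarios. Expiry headers such as max-age are left out.

```python
def response_cache_headers(cookie_header, uls_cookie="language"):
    """Pick cache headers for an anonymous page view, following scenarios A and B."""
    cookies = {}
    for pair in (cookie_header or "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    if uls_cookie in cookies:
        # Scenario B: a language was chosen, so keep the page out of the caches.
        return {"Cache-Control": "no-cache, no-store, must-revalidate"}
    # Scenario A: no ULS cookie, so the default-language page may be cached.
    return {
        "Vary": "Cookie",
        "X-Vary-Options": "Cookie;string-contains=" + uls_cookie,
    }

scenario_a = response_cache_headers("")
scenario_b = response_cache_headers("session=abc; language=nl")
```

In this sketch only requests without the ULS cookie produce cacheable responses, so every visitor who has picked a language goes to the backend on each page view.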
Comment 48 Niklas Laxström 2012-11-13 09:26:07 UTC
It seems to be forgotten in this discussion that ULS will also set the default language for anon users based on the accept-language header even before they select any language explicitly. That feature would not work with the solution described in comment #47.
Comment 49 Erik Moeller 2012-11-13 09:50:13 UTC
Niklas, that feature could be disabled for now. If I'm understanding things correctly, implementing that feature reasonably well would require selection of cached language copies of a page based on the contents of the ULS cookie, and scalable implementation of purging across all cached copies.

Is the description in comment 47 correct/workable? Is it preferable to a URL-based approach? 

If the answers are yes and yes, I suggest we iterate, and once we've got that solution implemented and deployed on small wikis that use ULS, focus on what the desired behavior will be in the glorious Varnish future.
Comment 50 Erik Moeller 2012-11-13 10:02:41 UTC
(In reply to comment #49)
> If I'm understanding things
> correctly, implementing that feature reasonably well would require selection of
> cached language copies of a page based on the contents of the ULS cookie

To clarify, the Accept-Language feature would require selecting the correct variant from the cache based on Accept-Language, _and_ overriding that choice with the variant specified by the ULS cookie if set. Again, please limit the feature set to something that's feasible now and iterate from there.
Comment 51 Ori Livneh 2012-11-13 23:55:15 UTC
Created attachment 11354 [details]
Patch for Squid to add cookie headers to output sent to storeurl_rewrite_program
Comment 52 Ori Livneh 2012-11-13 23:56:50 UTC
Created attachment 11355 [details]
storeurl_rewrite_program in Python that adds value of ULS cookie as query param to URL.
Comment 53 Ori Livneh 2012-11-13 23:57:26 UTC
I wrote a patch for Squid that I hope would fix this issue.

Squid 2.x has a 'storeurl_rewrite_program' directive, which specifies an external program that Squid calls to rewrite / canonicalize URLs prior to performing cache operations:

http://www.squid-cache.org/Doc/config/storeurl_rewrite_program/

The rewriter is a simple program that reads a request log on standard input and writes URLs to standard output. The format of the request log that Squid sends the rewriter does not contain the cookie headers, but adding them requires only a one-line change to store_rewrite.c. I've attached a patch made against the current stable tag (SQUID_2_7).

I wrote a simple rewriter in Python that checks for the presence of a 'ULS' cookie and adds it to the URL as an additional query parameter (also attached).

To state the obvious: making even a small change to Squid is a big deal, so this would need to be reviewed very carefully by ops to make sure it is correct. The effort required may or may not be worth it, depending on the practicality of other available workarounds. But I will note that there may be an additional benefit to using a storeurl rewrite program: we could apply some ordering rule on query parameters, which could plausibly improve cache performance. (Perhaps we're doing this already -- I'm not too familiar with our setup.)
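For readers without access to the attachments, the core of such a rewriter might look roughly like the sketch below. It is not the attached script: the query parameter name `uls` is hypothetical, and the function takes the URL and Cookie header as separate arguments because the exact line format Squid sends depends on the patch. It also sorts the query parameters, as an example of the ordering rule mentioned above.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

COOKIE_NAME = "ULS"  # cookie name mentioned in this comment
PARAM_NAME = "uls"   # hypothetical query parameter name


def store_url(url, cookie_header=""):
    """Return the URL that would be used as the cache key for a request."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if COOKIE_NAME in cookies:
        # Fold the cookie value into the URL so each language is cached apart.
        params.append((PARAM_NAME, cookies[COOKIE_NAME].value))
    # Canonical parameter order: ?a=1&b=2 and ?b=2&a=1 share one cache entry.
    params.sort()
    return urlunsplit(parts._replace(query=urlencode(params)))
```

A real rewriter would wrap this in a loop that reads one request per line from standard input, writes the rewritten URL to standard output and flushes after every line.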
Comment 54 Ori Livneh 2012-11-14 00:02:20 UTC
Created attachment 11356 [details]
Patch for Squid to add cookie headers to output sent to storeurl_rewrite_program
Comment 55 Daniel Kinzler 2012-11-14 09:26:56 UTC
(In reply to comment #53)
> Squid 2.x has a 'storeurl_rewrite_program' directive, which specifies an
> external program that Squid calls to rewrite / canonicalize URLs prior to
> performing cache operations:

Squids are the front tier. Together they handle about a hundred thousand requests per second on the Wikimedia cluster (perhaps a few hundred per second on each server). Are you sure this scales?
Comment 56 Daniel Kinzler 2012-11-14 09:36:17 UTC
(In reply to comment #53)
> I wrote a simple rewriter in Python that checks for the presence of a 'ULS'
> cookie and adds it to the URL as an additional query parameter (also attached).

Does Squid consider that a variant, or a separate URL? If it's a separate URL, purging becomes a problem, because we then have to explicitly purge all 500 or so possible URL variations.
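To make the purging concern concrete, here is a rough sketch of what a purge would have to enumerate if every language ends up under its own store URL. The parameter name `uls` and the helper name are hypothetical.

```python
def purge_urls(page_url, language_codes, param="uls"):
    """List every URL that must be purged when one page changes."""
    separator = "&" if "?" in page_url else "?"
    # The plain URL (no cookie set) plus one URL per language.
    return [page_url] + [
        f"{page_url}{separator}{param}={code}" for code in language_codes
    ]
```

With around 500 language codes, a single edit turns into roughly 500 purge requests for that one page.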
Comment 57 Ori Livneh 2012-11-14 10:35:28 UTC
Ok, so this too would screw with purging. Sorry for being daft. Following a discussion about this with Daniel and Tim on IRC, it appears that the right way to fix this is:

1) Disable the extension for now.
2) Amend Tim's X-Vary-Options patch (http://paste.ubuntu.com/1357630/) to also operate on cookies.
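The difference from the store-URL approach can be shown with a toy model: variants live under a single URL and are selected by the value of one cookie, so a single purge of the URL drops them all. This is only a sketch of the idea, not Squid's behaviour or the X-Vary-Options syntax; the class and its methods are hypothetical.

```python
from http.cookies import SimpleCookie


class VariantCache:
    """Toy cache that varies on the value of one named cookie only."""

    def __init__(self, vary_cookie="ULS"):
        self.vary_cookie = vary_cookie
        self.entries = {}  # url -> {variant key -> body}

    def _variant_key(self, cookie_header):
        cookies = SimpleCookie()
        cookies.load(cookie_header)
        morsel = cookies.get(self.vary_cookie)
        # Other cookies (sessions etc.) are ignored, so they do not split the cache.
        return morsel.value if morsel is not None else ""

    def store(self, url, cookie_header, body):
        self.entries.setdefault(url, {})[self._variant_key(cookie_header)] = body

    def lookup(self, url, cookie_header):
        return self.entries.get(url, {}).get(self._variant_key(cookie_header))

    def purge(self, url):
        # One purge removes every language variant of the page.
        self.entries.pop(url, None)
```

Varying on the whole Cookie header instead would give nearly every visitor a private variant, which is why the patch would need to look at individual cookies.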
Comment 58 Siebrand Mazeland 2012-11-14 11:35:11 UTC
AFAIK Patrick Reilly has some thoughts. I hope he can share them here.
Comment 59 Daniel Kinzler 2012-11-15 19:57:38 UTC
I have filed two feature requests to ULS for two possible solutions:

* disable ULS for anons: bug 42157
* disable language detection (Erik's proposal): bug 42159

Perhaps this report should be moved to the Wikimedia/wikidata component, because it's about a solution for wikidata.org that involves ULS and Squid configuration.
Comment 60 Tim Starling 2012-11-16 00:37:29 UTC
(In reply to comment #59)
> I have filed two feature requests to ULS for two possible solutions:
> 
> * disable ULS for anons: bug 42157

Implemented and deployed.
Comment 61 Ori Livneh 2012-11-16 18:11:24 UTC
The canonical home for Tim's X-Vary-Options patch is:
https://gerrit.wikimedia.org/r/gitweb?p=operations/debs/squid.git;a=blob;f=debian/patches/26-vary_options.dpatch;hb=HEAD

When Tim posted his patch to the squid-dev mailing list in 2008, there seems to have been interest in merging it to Squid-2.HEAD. Adrian Chadd, one of Squid's maintainers, wrote:

> I'm happy to commit this to Squid-2.HEAD as-is. Can you throw it in a
> Bugzilla report and spit me the number?
http://www.squid-cache.org/mail-archive/squid-dev/200802/0282.html

The idea of extending this patch to handle cookie names and values was floated later in the thread. One way to move this current ticket forward would be to do exactly as Adrian suggests and file a Bugzilla bug for this patch on Squid's bug tracker, provide a link to (and a summary of) this discussion, and then e-mail squid-dev about it. Squid 2.7.9 is still the stable head of the 2.7 version and is widely used, so it is not implausible that someone with the requisite skill will step up.
Comment 62 Nemo 2013-04-24 13:08:23 UTC
(In reply to comment #61)
> The canonical home for Tim's X-Vary-Options patch is:
> https://gerrit.wikimedia.org/r/gitweb?p=operations/debs/squid.git;a=blob;f=debian/patches/26-vary_options.dpatch;hb=HEAD
> 
> When Tim posted his patch to the squid-dev mailing list in 2008, there seems to have been interest in merging it to Squid-2.HEAD. Adrian Chadd, one of Squid's maintainers, wrote:
> 
> > I'm happy to commit this to Squid-2.HEAD as-is. Can you throw it in a
> > Bugzilla report and spit me the number?
> http://www.squid-cache.org/mail-archive/squid-dev/200802/0282.html
> 
> The idea of extending this patch to handle cookie names and values was floated later in the thread. One way to move this current ticket forward would be to do exactly as Adrian suggests and file a Bugzilla bug for this patch on Squid's bug tracker, provide a link to (and a summary of) this discussion, and then e-mail squid-dev about it.

Was this done?

> Squid 2.7.9 is still the stable head of the 2.7
> version and is widely used, so it is not implausible that someone with the
> requisite skill will step up.
Comment 63 Helder 2013-06-06 21:12:22 UTC
(In reply to comment #62)
> (In reply to comment #61)
> Was this done?
Ping.
Comment 64 Niklas Laxström 2013-06-07 07:50:20 UTC
I don't think so. In any case, it will soon be irrelevant, as the remaining Squids are being migrated to Varnish (as far as I know).
Comment 65 denny vrandecic 2013-08-22 14:54:39 UTC
Closed older resolved bugs as verified.
