Bug 14402 - Wikimedia setup interfering with API maxage and smaxage parameters
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Platform: All
OS: All
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: Mark Bergsma
URL: http://web-sniffer.net/?url=http%3A%2...
Depends on: ---
Blocks: ---

Reported: 2008-06-04 01:08 UTC by Splarka
Modified: 2009-09-11 06:35 UTC
CC: 7 users

See Also: ---
Web browser: ---
Mobile Platform: ---

Description Splarka 2008-06-04 01:08:26 UTC
Please allow the "Cache-Control", "Expires", and "X-Cache*"-type HTTP headers to be modified in API queries via the &maxage URL parameter, similar to how action=raw uses it. This would allow noncritical queries in user scripts, which may run on every page load, to avoid straining the servers unduly.

An inspiring example is a conceptual user-status script that would query the API on every page load of participating users to check their last edit. By giving the query a short squid-level and browser-level cache lifetime, such as 15 minutes, the backend queries generated by the same user, or multiple users, visiting that user's page would be greatly reduced.

This parameter should probably be ignored for any write-function or anything sending POST data.
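To make the request concrete, here is a sketch (Python, purely illustrative; the specific usercontribs query and parameter values are assumptions of mine, not taken from this report) of the kind of cached query such a user script would issue:

```python
from urllib.parse import urlencode

# Hypothetical query a user-status script might issue: fetch a user's
# most recent contribution, asking Squid (&smaxage) and the browser
# (&maxage) to cache the response for 15 minutes (900 seconds).
params = {
    "action": "query",
    "list": "usercontribs",
    "ucuser": "Example",   # hypothetical username
    "uclimit": 1,
    "format": "json",
    "maxage": 900,         # browser-level cache lifetime, seconds
    "smaxage": 900,        # squid-level cache lifetime, seconds
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)
```

Any client hitting the same URL within the 15-minute window could then be served from the shared cache instead of the backend.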
Comment 1 Roan Kattouw 2008-06-11 20:28:29 UTC
Using Squid to cache API requests looks like a good idea, but purging those caches when something changes ranges from extremely tricky to impossible, depending on the type of query. An edit should purge all API requests involving the page, the user, recentchanges, logevents (we shouldn't even bother caching those two, anyway), and possibly other things as well. The problem is that there's a wide variety of possible requests due to the large number of modules and the generator feature, and that it's probably not possible to purge all requests that need to be purged due to the API's dynamic nature. Meta queries like siteinfo are simply impossible to cache, because there's no way to track whether and when the information displayed (based on the interwiki tables and a range of $wg variables) has changed.

A far less problematic approach would be to implement throttling on the client side, by e.g. running the query every 15 minutes and caching the result.
Comment 2 Brion Vibber 2008-06-11 20:41:03 UTC
> Meta queries like siteinfo are simply impossible to cache, because there's no
> way to track whether and when the information displayed (based on the interwiki
> tables and a range of $wg variables) has changed.

It's entirely possible to cache them; they just might return stale results.

Since caching here would be enabled only for requests that specifically ask for it in the URL, clients would have to understand the potential risk and decide for themselves whether that's appropriate for them.


> A far less problematic approach would be to implement throttling on the client
> side, by e.g. running the query every 15 minutes and caching the result.

That's not necessarily feasible for client-side JavaScript, which could benefit from shared caching at a higher level.
Comment 3 Roan Kattouw 2008-06-11 20:47:05 UTC
(In reply to comment #2)
> It's entirely possible to cache them; they just might return stale results.
> 
> Since caching here would be enabled only for requests that specifically ask for
> it in the URL, clients would have to understand the potential risk and decide
> for themselves whether that's appropriate for them.

When are these caches purged, then? And how, given that there are infinitely many possible URLs that request siteinfo? Squid caching is probably not gonna work because of that; we could implement some generic form of query caching, though. The problem is that that still has to go through the API and the database, kind of defeating the point (in fact, siteinfo queries cached that way will probably be slower than normal ones).


> > A far less problematic approach would be to implement throttling on the client
> > side, by e.g. running the query every 15 minutes and caching the result.
> 
> That's not necessarily feasible for client-side JavaScript, which could benefit
> from shared caching at a higher level.

True.
Comment 4 Brion Vibber 2008-06-11 20:59:29 UTC
> When are these caches purged, then? 

Never; they would just expire after X seconds. Hence "might return stale results."

For many purposes, it doesn't matter that, say, there's an 0.0000001% chance that the list of namespaces changed in the last couple hours.
Comment 5 Roan Kattouw 2008-06-11 21:04:42 UTC
(In reply to comment #4)
> > When are these caches purged, then? 
> 
> Never; they would just expire after X seconds. Hence "might return stale
> results."
> 
> For many purposes, it doesn't matter that, say, there's an 0.0000001% chance
> that the list of namespaces changed in the last couple hours.
> 

That's true, but the question remains exactly how we're gonna implement caching this kind of information. The only efficient way of doing so is by using a Squid-like cache that bypasses PHP altogether, since having the API itself fetch the data from some kind of cache (memcached, DB) would probably be slower. A problem with the former, however, is that there are multiple (if not lots of) possibilities for cacheable API requests, some of which might even combine multiple cacheable properties.
Comment 6 Brion Vibber 2008-06-11 21:17:13 UTC
This feature req is about HTTP caching, which is accomplished roughly like...

  Cache-Control: public, s-maxage=3600, max-age=3600

for public squid caching or...

  Cache-Control: private, max-age=3600

for private user-agent caching.

(Note that anything for *public* caching would need to avoid doing authentication, sending cookies, etc.)
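The public/private branching Brion describes can be sketched roughly as follows (a Python simplification of my own, assuming smaxage implies public caching; this is not ApiMain.php's actual code):

```python
def cache_control_header(maxage=0, smaxage=0):
    """Build a Cache-Control value for the &maxage / &smaxage
    URL parameters (illustrative sketch)."""
    if smaxage > 0:
        # Shared (Squid) caching: the response must be public,
        # so it must not depend on cookies or authentication.
        return "public, s-maxage=%d, max-age=%d" % (smaxage, maxage)
    if maxage > 0:
        # Browser-only (user-agent) caching.
        return "private, max-age=%d" % maxage
    # Neither requested: force revalidation everywhere.
    return "private, s-maxage=0, max-age=0, must-revalidate"
```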
Comment 7 Splarka 2008-06-11 21:57:11 UTC
> This feature req is about HTTP caching

Indeed. Even just private caching headers would be good, especially for user scripts that make API calls or importScriptURI() calls on page loads, where the page might be navigated away from and reloaded via forward/back.

I'd even go so far as to suggest that a 5-minute default (overridable with &maxage=0, of course) for all &callback= requests might be a good idea.
Comment 8 Roan Kattouw 2008-06-16 19:50:59 UTC
smaxage done in r36347, lemme know if you need the regular maxage as well.
Comment 9 Splarka 2008-06-16 19:55:34 UTC
(In reply to comment #8)
> smaxage done in r36347, lemme know if you need the regular maxage as well.
> 

Actually, yes, maxage would be more useful than smaxage (per all my comments). Please include both if possible, thanks!
Comment 10 Roan Kattouw 2008-06-16 20:06:58 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > smaxage done in r36347, lemme know if you need the regular maxage as well.
> > 
> 
> Actually, yes, maxage would be more useful than smaxage (per all my comments).
> Please include both if possible, thanks!
> 

Done in r36349
Comment 11 Splarka 2008-06-21 11:32:24 UTC
Reopening, has no effect. Well, it does have *an* effect, but not the desired effect, and no change to the "Cache-Control" header.

After staring at /api/ApiMain.php trying to figure out why it wasn't working, I've come to three thoughts (bear in mind my PHP is pseudo-PHP):

 $expires = $exp == 0 ? 1 : time() + $this->mSquidMaxage;

1) Shouldn't this be adding $exp to time() ? (or is it?)

 header('Cache-Control: s-maxage=' . $smaxage . ', must-revalidate, max-age=' . $maxage);

2) Shouldn't "must-revalidate" be conditional?

3) Reason for reopening: &maxage and &smaxage are definitely being set (as they do affect the "Expires" header), but, they have no effect on the "Cache-Control" header at least on Wikimedia. It can further be observed that the parameters in the header statement in ApiMain are in a different order than appears on the http headers:

 'Cache-Control: s-maxage=' . $smaxage . ', must-revalidate, max-age=' . $maxage

 Cache-Control:	private, s-maxage=0, max-age=0, must-revalidate

Possibly something is overwriting the header() that ApiMain attempts to use?
Comment 12 Roan Kattouw 2008-06-21 15:08:03 UTC
(In reply to comment #11)
> Reopening, has no effect. Well, it does have *an* effect, but not the desired
> effect, and no change to the "Cache-Control" header.
> 
> After staring at /api/ApiMain.php trying to figure out why it wasn't working,
> I've come to three thoughts (bear in mind my php is pseudohp):
> 
>  $expires = $exp == 0 ? 1 : time() + $this->mSquidMaxage;
> 
> 1) Shouldn't this be adding $exp to time() ? (or is it?)
You're right, it should add $exp. I changed $this->mSquidMaxage to $exp in some places and forgot this one. Fixed in r36525.

> 
>  header('Cache-Control: s-maxage=' . $smaxage . ', must-revalidate, max-age=' .
> $maxage);
> 
> 2) Shouldn't "must-revalidate" be conditional?
On what condition? I have no idea what must-revalidate does...

> 
> 3) Reason for reopening: &maxage and &smaxage are definitely being set (as they
> do affect the "Expires" header), but, they have no effect on the
> "Cache-Control" header at least on Wikimedia.
"at least on Wikimedia" is the key phrase here. It does set the Cache-Control headers on my local install, so maybe Squid or some other program is interfering here? Also, note that errors (including the API help) will NEVER be cached and will therefore simply ignore &maxage and &smaxage.

> It can further be observed that
> the parameters in the header statement in ApiMain are in a different order than
> appears on the http headers:
> 
>  'Cache-Control: s-maxage=' . $smaxage . ', must-revalidate, max-age=' .
> $maxage
> 
>  Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
> 
> Possibly something is overwriting the header() that ApiMain attempts to use?
> 
You probably just tested the help screen then. In case of an error (and action=help is a module that always exits with an error), another part of ApiMain overwrites the previously set header with the "no-cache" header you quoted above.

Resolving back to FIXED as it works perfectly for me.
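The expiry calculation fixed in r36525 can be rendered in Python for illustration (the original is PHP in ApiMain.php; the function name here is mine):

```python
import time

def expires_timestamp(exp):
    """Unix timestamp for the Expires header, given the requested
    max-age `exp` in seconds. exp == 0 maps to 1 (one second past
    the epoch, i.e. "already expired"); otherwise add exp to now.
    The bug was adding the unset $this->mSquidMaxage member
    instead of $exp, so the result never reflected the request."""
    return 1 if exp == 0 else int(time.time()) + exp
```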
Comment 13 Splarka 2008-06-21 22:56:10 UTC
(In reply to comment #12)
> > Possibly something is overwriting the header() that ApiMain attempts to use?
> > 
> You probably just tested the help screen then. 

No, I tested a dozen different queries and formats:

Help: http://web-sniffer.net/?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php&submit=Submit&http=1.1&gzip=yes&type=GET&uak=0

 Cache-Control:	private, s-maxage=0, max-age=0, must-revalidate

Siteinfo: http://web-sniffer.net/?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php%3Faction%3Dquery%26meta%3Dsiteinfo%26maxage%3D900%26smaxage%3D900&submit=Submit&http=1.1&gzip=yes&type=GET&uak=0

 Cache-Control:	private, s-maxage=0, max-age=0, must-revalidate	

RC: http://web-sniffer.net/?url=http%3A%2F%2Fen.wikipedia.org%2Fw%2Fapi.php%3Faction%3Dquery%26list%3Drecentchanges%26maxage%3D900%26smaxage%3D900&submit=Submit&http=1.1&gzip=yes&type=GET&uak=0

 Cache-Control:	private, s-maxage=0, max-age=0, must-revalidate

etc...
Comment 14 Roan Kattouw 2008-06-22 13:13:22 UTC
For those URLs, I get: (on my test wiki)

api.php:
   Cache-Control: s-maxage=0, must-revalidate, max-age=0

api.php?action=query&meta=siteinfo:
   Cache-Control: s-maxage=900, must-revalidate, max-age=900

api.php?action=query&list=recentchanges&maxage=900&smaxage=900:
   Cache-Control: s-maxage=900, must-revalidate, max-age=900

So either WMF needs to update ApiMain.php to r36525 (currently at r36502), or something WMF-specific such as Squid is interfering here.
Comment 15 Ilmari Karonen 2008-10-10 18:11:20 UTC
Reopening (and changing product to "Wikimedia").  ApiMain.php has long since been updated, but something on WMF servers is still interfering with this.
Comment 16 Roan Kattouw 2008-10-13 17:28:31 UTC
(In reply to comment #15)
> Reopening (and changing product to "Wikimedia").  ApiMain.php has long since
> been updated, but something on WMF servers is still interfering with this.
> 

Changing summary accordingly.
Comment 17 Roan Kattouw 2009-05-02 14:08:00 UTC
Unassigning for myself, this is a Wikimedia configuration issue.
Comment 18 Brion Vibber 2009-08-21 00:07:45 UTC
Assigning to Mark, CC'ing Fred. Squid config needs to be updated to treat /w/api.php differently?
Comment 19 Domas Mituzas 2009-09-11 06:35:34 UTC
now go fix mediawiki
