Last modified: 2011-03-13 18:04:54 UTC
I discovered this accidentally while rewriting my archiving bot to use the API. The bot gets its list of pages by examining transclusions of a particular template. Thus, we start with http://en.wikipedia.org/w/api.php?format=jsonfm&einamespace=5&list=embeddedin&eititle=User:MiszaBot/config&eilimit=100&action=query (this is for the 'Wikipedia talk:' namespace). This should work fine; we get "eicontinue": "2|MiszaBot\/config|2072126", so let's continue on to the next batch: http://en.wikipedia.org/w/api.php?format=jsonfm&einamespace=5&list=embeddedin&eititle=User:MiszaBot/config&eilimit=100&action=query&eicontinue=2|MiszaBot/config|2072126

This may or may not work for you when you try it; if it does, try following the next eicontinue. At some point the request may choke and, after a minute or so of waiting, you get a Wikimedia error page from a squid explaining that an ERR_READ_TIMEOUT occurred. The error page is HTML, not JSON, so the parser freaks out. I presume this happens especially with a "cold index cache", which is why it's not reliably reproducible.

I wouldn't be too hasty attributing this to the squids, because I've seen it happen even when asking for only 10 results per request (which really shouldn't take that long). Oddly, there are no problems when I just omit einamespace and filter in my program based on the "ns" field of the yielded pages.
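The workaround described above (omit einamespace, follow eicontinue, and filter on "ns" client-side) could be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: the helper names `fetch_json`, `next_continue` and `embedded_in` are made up for this example, it requests `format=json` rather than the pretty-printed `jsonfm`, and the place where the continuation token sits in the response is an assumption (both the "query-continue" and the "continue" shapes are checked).

```python
import json
import urllib.parse
import urllib.request

API_URL = "http://en.wikipedia.org/w/api.php"  # endpoint taken from the report


def fetch_json(params):
    """Perform one API request and return the decoded JSON (hypothetical helper)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "embeddedin-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except ValueError as exc:
        # A squid error page arrives as HTML, not JSON; fail with a clear message.
        raise RuntimeError("non-JSON response: " + body[:80]) from exc


def next_continue(data):
    """Return the eicontinue token, or None when there are no more batches.

    Assumption: the token is found under "query-continue" -> "embeddedin"
    or under a top-level "continue" object, depending on the response shape.
    """
    old_style = data.get("query-continue", {}).get("embeddedin", {})
    new_style = data.get("continue", {})
    return old_style.get("eicontinue") or new_style.get("eicontinue")


def embedded_in(title, namespace, fetch=fetch_json, limit=100):
    """Yield titles in `namespace` that transclude `title`, filtering client-side."""
    params = {
        "format": "json",
        "action": "query",
        "list": "embeddedin",
        "eititle": title,
        "eilimit": limit,
        # einamespace is deliberately omitted; the "ns" check below replaces it.
    }
    while True:
        data = fetch(dict(params))
        for page in data.get("query", {}).get("embeddedin", []):
            if page["ns"] == namespace:
                yield page["title"]
        token = next_continue(data)
        if token is None:
            return
        params["eicontinue"] = token
```

Iterating over `embedded_in("User:MiszaBot/config", 5)` would then yield only the 'Wikipedia talk:' pages. The trade-off is that every namespace is transferred and discarded locally, so this costs more requests than a working einamespace filter would.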
This works for me; I don't get any timeouts until I reach the end. I'll take a look at the query it runs, though.
It seems that blnamespace/einamespace is implemented inefficiently, which causes the timeout. I won't remove it, though, because the UI also allows such queries. Closing as WONTFIX unless someone authoritative says this needs to go.
What do you mean by "implemented inefficiently"? (Sorry, I'm a n00b with the MediaWiki query building framework.)

To me, it seems like just another condition in the WHERE clause. Maybe this hint causes MySQL to use an inefficient query plan? Just guessing...

Anyway, if the UI can do it without stalling, then so should the API (and not for a moment did I consider removing it completely). Until then, it somewhat impairs functionality, so this should be left open (or reassigned to another component, if you can confirm it's a schema/index issue).
(In reply to comment #3)
> What do you mean by "implemented inefficiently"? (Sorry, I'm a n00b with the
> MediaWiki query building framework.)
>
> To me, it seems like just another condition in the WHERE clause. Maybe this
> hint causes MySQL to use an inefficient query plan? Just guessing...
>

Yes.

> Anyway, if the UI can do it without stalling, then so should the API (and not
> for a moment did I consider removing it completely). Until then, it somewhat
> impairs functionality, so this should be left open (or reassigned to other
> component, if you can confirm it's a schema/index issue).
>

Both run the exact same query. If the API stalls because of the query, the UI will too.