Last modified: 2009-02-04 23:21:39 UTC
http://en.wikipedia.org/w/api.php?action=query&list=usercontribs&ucuser=Gurch&ucprop=patrolled gives <api> <error code="internal_api_error_DBQueryError" info="Database query error"> #0 /usr/local/apache/common-local/php-1.5/includes/db/Database.php(591): Database->reportQueryError('The SELECT woul...', 1104, 'SELECT rev_tim...', 'ApiQueryContrib...', false) #1 /usr/local/apache/common-local/php-1.5/includes/db/Database.php(1001): Database->query('SELECT rev_tim...', 'ApiQueryContrib...') #2 /usr/local/apache/common-local/php-1.5/includes/api/ApiQueryBase.php(222): Database->select(Array, Array, Array, 'ApiQueryContrib...', Array, Array) #3 /usr/local/apache/common-local/php-1.5/includes/api/ApiQueryUserContributions.php(83): ApiQueryBase->select('ApiQueryContrib...') #4 /usr/local/apache/common-local/php-1.5/includes/api/ApiQuery.php(213): ApiQueryContributions->execute() #5 /usr/local/apache/common-local/php-1.5/includes/api/ApiMain.php(427): ApiQuery->execute() #6 /usr/local/apache/common-local/php-1.5/includes/api/ApiMain.php(260): ApiMain->executeAction() #7 /usr/local/apache/common-local/php-1.5/includes/api/ApiMain.php(244): ApiMain->executeActionWithErrorHandling() #8 /usr/local/apache/common-local/php-1.5/api.php(77): ApiMain->execute() #9 /usr/local/apache/common-local/live-1.5/api.php(3): require('/usr/local/apac...') #10 {main} </error> </api>
The error 1104 is: ERROR 1104: The SELECT would examine more rows than MAX_JOIN_SIZE. Obviously a join that is going wrong somewhere.
What's the query that would be run?
(In reply to comment #2) > What's the query that would be run? > SELECT /* ApiQueryContributions::execute Catrope */ rev_timestamp,page_namespace,page_title,rev_user_text,rc_patrolled FROM `page`,`revision` FORCE INDEX (usertext_timestamp) LEFT JOIN `recentchanges` ON ((rc_this_oldid=rev_id)) WHERE (page_id=rev_page) AND rev_deleted = '0' AND rev_user_text = 'Gurch' ORDER BY rev_user_text DESC, rev_timestamp DESC LIMIT 11 I personally don't see how this produces huge joins, as we're joining on unique indices, but then I've been known to be completely wrong about DB-related stuff before :D
This is the explain I get on the toolserver: mysql> EXPLAIN SELECT rev_timestamp,page_namespace,page_title,rev_user_text,rc_patrolled FROM `page`,`revision` LEFT JOIN `recentchanges` ON ((rc_this_oldid=rev_id)) WHERE (page_id=rev_page) AND rev_deleted = '0' AND rev_user_text = 'Gurch' ORDER BY rev_user_text DESC, rev_timestamp DESC LIMIT 11; +----+-------------+---------------+--------+-------------------------------------------+--------------------+---------+--------------------------+---------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+--------+-------------------------------------------+--------------------+---------+--------------------------+---------+----------------------------------------------+ | 1 | SIMPLE | revision | ref | PRIMARY,page_timestamp,usertext_timestamp | usertext_timestamp | 257 | const | 187372 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | recentchanges | ALL | NULL | NULL | NULL | NULL | 6666524 | | | 1 | SIMPLE | page | eq_ref | PRIMARY | PRIMARY | 4 | enwiki.revision.rev_page | 1 | | +----+-------------+---------------+--------+-------------------------------------------+--------------------+---------+--------------------------+---------+----------------------------------------------+ 3 rows in set (0.01 sec) Note join type ALL, key NULL. I only have views on the toolserver, not the actual tables, so I can't try forcing rc_patrolling as the index to use for recentchanges there (nor can I use the FORCE INDEX that was already in the query). When I run it on localhost, it chooses rc_patrolling for the recentchanges join.
(In reply to comment #4) > Note join type ALL, key NULL. I only have views on the toolserver, not the > actual tables, so I can't try forcing rc_patrolling as the index to use for > recentchanges there (nor can I use the FORCE INDEX that was already in the > query). Why not? > When I run it on localhost, it chooses rc_patrolling for the > recentchanges join. > For me too, except that on one of my two computers, I don't *have* an rc_patrolling index, while on the other one I do. The index also isn't listed in tables.sql, and there's no updater patch that adds it. Where did my index go? :O :D Back on topic: maybe adding FORCE INDEX(rc_patrolling) will fix this, provided that index is present on the WMF DB servers. You also seem to indicate that the effect of this FORCE INDEX can't be tested on the toolserver for some reason.
(In reply to comment #5) > The index also isn't listed > in tables.sql, and there's no updater patch that adds it. Where did my index > go? :O :D > Deeper research (thanks to TortoiseSVN's search facilities) turned up r25527 in which Domas removed the rc_patrolling index (added by Greg in r24699) for no apparent reason and without adding a remover in updaters.inc .
(In reply to comment #5) > Why not? FORCE INDEX doesn't work on views, and non-root toolserver users only have access to views. > For me too, except that on one of my two computers, I don't *have* an > rc_patrolling index, while on the other one I do. The index also isn't listed > in tables.sql, and there's no updater patch that adds it. Where did my index > go? :O :D Um, yeah, that would explain it. See r25527. The index doesn't exist. I've dropped it from localhost so I don't get confused in the future. I don't see any good way to get pages' patrolled status, as long as it's only in RC -- why isn't it in revision as well, again? Grr. (simulpost) (In reply to comment #6) > Deeper research (thanks to TortoiseSVN's search facilities) turned up r25527 in > which Domas removed the rc_patrolling index (added by Greg in r24699) for no > apparent reason and without adding a remover in updaters.inc . The reason was almost certainly just because having unneeded indexes is slow. Is there any compelling justification for keeping this? What actually would need to use it other than some API queries?
(In reply to comment #7) > The reason was almost certainly just because having unneeded indexes is slow. > Is there any compelling justification for keeping this? What actually would > need to use it other than some API queries? > It makes it possible to find the recentchanges entries for a set of revisions/revids (either with a join on the revision table or with a WHERE clause), which is generic enough to warrant an index IMO, even if it's just an index on rc_this_oldid.
That's not an access pattern RC is supposed to support, though. It's really just supposed to summarize other tables. If you have the revision id's already, all relevant info should be in other tables like revision and page. The problem is, for some reason rc_patrolled is not.
(In reply to comment #9) > That's not an access pattern RC is supposed to support, though. It's really > just supposed to summarize other tables. If you have the revision id's > already, all relevant info should be in other tables like revision and page. > The problem is, for some reason rc_patrolled is not. > Exactly, same story with the bot flag. So we either need to move/copy the patrolled flag elsewhere, or add that index.
(In reply to comment #10) > Exactly, same story with the bot flag. So we either need to move/copy the > patrolled flag elsewhere, or add that index. In the meantime would it be wise to remove 'patrolled' as an option for ucprop and/or handle this error more gracefully, since it's unlikely to start working again by iteslf?
(In reply to comment #11) > In the meantime would it be wise to remove 'patrolled' as an option for ucprop > and/or handle this error more gracefully, since it's unlikely to start working > again by iteslf? > Temporarily disabled in r46795
Re-enabled in r46825 with a smart JOIN on the timestamp first (indexed) and the revid second (not indexed, but usually a no-op or a very small set since two revs with the same timestamp by the same user are pretty rare). Thanks to Tim for the tip.