Last modified: 2013-06-05 07:08:55 UTC
After seeing a large number of queries for all records in ipblocks ordered by ipb_timestamp desc, today I added the index ipb_timestamp (ipb_timestamp desc) to all databases on the Wikimedia servers. This changes the query from one using no index plus a filesort for the ordering to one using the new index. The queries were happening on both 1.4 and pre-1.4 projects; Block::enumBans is the query comment on the 1.4 wikis. This seems likely to help other MediaWiki users as well. In addition, ipblocks in 1.4 showed a FOR UPDATE query that wanted every record in the table. Is it really necessary to take an exclusive read and write lock on the whole table? Perhaps do it in chunks, ipb_address LIKE '1%', then LIKE '2%', and so on through LIKE '9%', so most block checks can continue while one chunk is being updated? Perhaps only after a check without FOR UPDATE shows more than 50 blocks total?
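A rough sketch of the two ideas above in SQL. The exact DDL is an assumption (the comment only names the index and column); the chunked-locking queries are illustrative, not the actual MediaWiki code:

```sql
-- Assumed form of the index added above (MySQL of this era parsed
-- but ignored DESC on index columns, so this is effectively ascending).
ALTER TABLE ipblocks ADD INDEX ipb_timestamp (ipb_timestamp);

-- Instead of locking every row at once:
--   SELECT * FROM ipblocks FOR UPDATE;
-- lock one address-prefix chunk at a time, so block checks on the
-- other chunks can proceed concurrently:
SELECT * FROM ipblocks WHERE ipb_address LIKE '1%' FOR UPDATE;
-- ... then repeat for prefixes '2%' through '9%'.
```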
This one matters because stupid crawlers read the edit page and can stress the master. Ideally we'd ask a slave when loading the edit page, with no lock needed, and check the master only when the edit is being saved. Occasionally this change will let a block last too long (if the slave check hits a lagged slave) or let someone who has just been blocked type an edit and then fail to save it, but those costs seem reasonable to avoid hurting the master.
For Ryo, the desired behaviour is:

When fetching the edit page:
1. Check memcached to see if the user is shown as blocked. If not blocked, give them the edit page.

When saving the edit:
1. Check memcached to see if they are blocked now.
2. If not blocked, check a slave to see if they are blocked.
3. If still not blocked, check the master to see if they are blocked there, and save the edit if not blocked.

This way a crawler which loads the edit page will only cause a check of memcached, but we'll still get accurate block results to prevent the edit from being saved if memcached happens to be out of date because of the race condition Tim mentioned. We check the slave before the master because it's much cheaper to check slaves and save the master work: it's easier to add more slave power than master power. Memcached can be loaded from a slave, because we won't care if it is a little out of date; we'll catch that when the user tries to edit.
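The proposed check order can be sketched as below. This is a minimal model, not MediaWiki code: the three stores are plain sets of blocked usernames standing in for memcached, a slave DB, and the master DB.

```python
# Hypothetical stand-ins for the three block stores, cheapest first.
MEMCACHED: set = set()  # models the memcached block cache
SLAVE: set = set()      # models a replica database
MASTER: set = set()     # models the master database

def is_blocked_for_edit_form(user: str) -> bool:
    # Fetching the edit page only ever costs a memcached lookup,
    # so a crawler loading edit pages never touches a database.
    return user in MEMCACHED

def is_blocked_for_save(user: str) -> bool:
    # Cheapest check first; fall through to the master only when the
    # cheaper layers say "not blocked". The final master check closes
    # the race where memcached or the slave is stale.
    return user in MEMCACHED or user in SLAVE or user in MASTER
```

Note the asymmetry this buys: a user blocked only on the master can still load the edit form, but the save is caught by the master check, which is exactly the trade-off described above.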
After discussion with Tim, here's the behaviour I'll implement:
* when editing/previewing a page, check with a slave
* when saving the page, check with the master

So no more memcached checking. I'd also suggest getting rid of the block check when simply previewing the page, though it may be less user-friendly to learn you're blocked only when saving.
Created attachment 220 [details] Change block check behaviour

Modified BlockCache, EditPage and User. The behaviour is now:
* when previewing a page, don't check the blocked status.
* when requesting the edit page (clicking the edit link), check blocked status against a slave.
* when saving the page, check blocked status against the first slave, then, if not blocked, against the master.

Memcached is not used anymore.
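The behaviour in attachment 220 differs from the earlier memcached proposal, so here is the same kind of toy model for it. Again the sets are hypothetical stand-ins for the slave and master databases, not the actual BlockCache/EditPage/User code:

```python
# Stand-ins for the two databases consulted after attachment 220.
SLAVE_DB: set = set()   # models the first slave
MASTER_DB: set = set()  # models the master

def check_on_preview(user: str) -> bool:
    # Previews no longer check block status at all.
    return False

def check_on_edit_form(user: str) -> bool:
    # Clicking the edit link checks the slave only,
    # so crawlers reading edit pages never hit the master.
    return user in SLAVE_DB

def check_on_save(user: str) -> bool:
    # Saving checks the slave first (cheap), and consults the
    # master only when the slave says "not blocked".
    return user in SLAVE_DB or user in MASTER_DB
```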
This was added to CVS in February 2005 by 'ryo_saeba', and is in 1.5.
For the record, the relevant performance optimisation was accidentally removed in r24562 (August 2007). Since then, edit page requests have checked blocked status against the master DB, with no memcached. We have indexes now, and the FOR UPDATE clause was removed, so there was no disaster. But it is the third most common master query according to WMF profiling, if you don't count BEGIN.