Last modified: 2010-05-15 15:28:23 UTC
Sequences like this for recentchanges happen several times an hour: DB Time Query or State -- ---- ---------- enwiki 239 UPDATE recentchanges SET rc_this_oldid=6638303 WHERE rc_namespace=0 AND rc_title='Legend_tripping' AND rc_timestamp='20040912055536' enwiki 205 UPDATE recentchanges SET rc_this_oldid=6638305 WHERE rc_namespace=0 AND rc_title='Gholamhossein_Mosahab' AND rc_timestamp='20040229205048' enwiki 204 UPDATE recentchanges SET rc_this_oldid=6638306 WHERE rc_namespace=6 AND rc_title='ZeroG.jpg' AND rc_timestamp='20040623185548' enwiki 190 UPDATE recentchanges SET rc_this_oldid=6638307 WHERE rc_namespace=0 AND rc_title='Australian_hornet' AND rc_timestamp='20040828135413' enwiki 189 UPDATE recentchanges SET rc_this_oldid=6638308 WHERE rc_namespace=0 AND rc_title='Nima_Yooshij' AND rc_timestamp='20040326005112' enwiki 184 UPDATE recentchanges SET rc_this_oldid=6638309 WHERE rc_namespace=6 AND rc_title='As15-86-11603.jpg' AND rc_timestamp='20040704033819' enwiki 170 UPDATE recentchanges SET rc_this_oldid=6638313 WHERE rc_namespace=6 AND rc_title='Apollo15missionpatch.png' AND rc_timestamp='20040726010236' enwiki 170 UPDATE recentchanges SET rc_this_oldid=6638314 WHERE rc_namespace=0 AND rc_title='City_of_Sydney' AND rc_timestamp='20041009112904' Notice the timestamps, for a time before any existing recentchanges records. Show innodb status indicates that they are waiting for a lock on the rc_timestamp index, presumably all waiting for the first block, since all are before any entries. Suggested change: A. Check the timestamp to see if it is earlier than the recentchanges purge time limit. If it is, don't update: if the record still exists it is going to be purged soon anyway. This would be less pleasant for low traffic wikis, where purges may be relatively infrequent and the records without the information would stay around for a while. B. Unnecessary if A is done, better overall. If the timestamp is before the purge threshold, just do the update. If not: Use the rc_timestamp index and search for any entries with the timestamp from the planned update, limit 1. If no results, there's no work to do and no write lock needed. Do the search on a DB slave. Two possible wrong result cases for a slave: 1. New record or edit. The timestamp will be within the purge threshold so this code path will never happen. 2. Record just purged from the master, still replicating. This will show a hit and a need to update the record. The master will get an unnecessary update, just as it does now. But far less of them.
Isn't this due to the infrequent, large, slow purges locking things up? Would it be better to just do smaller, more frequent purges?
Purging from a cron job or jobs in small bites off peak would be ideal for the bigger databases. No need to do them in connection with an end user operation which could happen at a high load time. delete low priority where timestamp < threshold (for mysql 4.0.0 and later: order by rc_id) limit 20 at fairly frequent intervals might be a reasonable approach. I expect that to be OK with end user things at peak times as well, just not as nice to have the load then when we don't care about prompt purging. These queries are the longest running ones on the server at the time they happen. That seems to argue against them waiting for the purge. But something is causing them to wait and I don't know what it is. Want a copy of the log with show processlist and show innodb status for some of these to see if you can see what's causing them? I look but haven't found it yet, other than that rc_timestamp index being involved. It might be the delete having happened and something in the database engine keeping a lock for a while. Maybe. In any case, there's no point in even trying most updates with timestamps this old because they are to records which are well before the purge threshold. Just wastes some master disk/CPU time without any benefit to try it. Quick check on the slave and then discard the potential update if the slave doesn't have it is the way to go for any which are less recent than the purge threshold. I'm curious about the cause of the delay but since the updates aren't necessary anyway I can happily forget that curiosity this time.
Cron jobs are a solution for wikipedia.org, but not for other cases, where web space providers might not offer a way to schedule cron jobs.
Definitely agreed. This sort of thing is one of the cases where the way Wikimedia and other big projects work best is different from how smaller and shared hosting places need to work.
A update on these: it's not only older records which are affected, it's all. The problem is a lock on the rc_timestamp index which is being used to find the rc_id value so the record can be updated. rc_timestamp is more unique than the combination of rc_namespace and rc_title (about half the number of matches). Asking a slave to look up the rc_id value and then updating via the rc_id will reduce the demand for the rc_timestamp index.
Please at least implement at least a configurable time threshold, if not the existence check, for 1.4. These rc_old_id updates are by far the most frequent cause of save delays, happening many times a day and it's completely unnecessary to try to update RC records several weeks ago when we know that RC entries are kept for little more than a week.
assumed fixed or at least different in 1.5.