Last modified: 2014-08-03 18:21:24 UTC
The updateCollation.php script is awfully slow. It took over a week on fr.wp (bug 54680), and it will probably take months if we ever decide to run it on en.wp. That kinda sucks. I'm not sure what can be done, or whether it's just a problem with the WMF configuration, so I'm filing this and asking for comments. Please resolve as INVALID if we in fact can't do anything about this.
Possible causes:
* The workaround from bug 45970, which makes the script use an index that isn't a perfect fit for the task (likely good enough, but I have no idea how much slower that makes it).
* Slave synchronisation, in which case maybe we can do something with ops involvement? Don't ask me.
* There's just too much data being sent back and forth between PHP and the database, in which case we can't do anything (unless we implement collation entirely on the database side, which I've been told is a bad idea).
I'm CC-ing competent people. Help?
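For context on the index question: the script's select orders categorylinks by (cl_to, cl_type, cl_from) and reads it in batches, resuming each batch just after the last row of the previous one. Below is a minimal Python/SQLite sketch of that keyset ("seek") batching, assuming this is roughly what updateCollation.php does; it is not the script's actual code, and `after_key`, `iter_batches`, the batch size and the index name are all made up. Whether the database can serve this ORDER BY straight from an index (instead of a filesort) likely depends on having an index whose leading columns match it, which is why the index choice from the bug 45970 workaround matters.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple

# (cl_to, cl_type, cl_from): the ORDER BY columns of the select.
Key = Tuple[str, str, int]


def after_key(last: Optional[Key]) -> Tuple[str, tuple]:
    """WHERE fragment (and params) matching rows that sort strictly after
    `last` in (cl_to, cl_type, cl_from) order. Hypothetical helper."""
    if last is None:
        return "1 = 1", ()
    to, typ, frm = last
    clause = ("cl_to > ? "
              "OR (cl_to = ? AND cl_type > ?) "
              "OR (cl_to = ? AND cl_type = ? AND cl_from > ?)")
    return clause, (to, to, typ, to, typ, frm)


def iter_batches(conn: sqlite3.Connection, batch_size: int) -> Iterator[List[Key]]:
    """Yield categorylinks keys in (cl_to, cl_type, cl_from) order, batch_size at a time."""
    last: Optional[Key] = None
    while True:
        where, params = after_key(last)
        rows = conn.execute(
            "SELECT cl_to, cl_type, cl_from FROM categorylinks "
            f"WHERE ({where}) ORDER BY cl_to, cl_type, cl_from LIMIT ?",
            params + (batch_size,),
        ).fetchall()
        if not rows:
            return
        yield rows
        last = rows[-1]  # resume after the last key seen; no OFFSET scan needed


# Tiny demo table; the index name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT)")
conn.execute("CREATE INDEX cl_to_type_from ON categorylinks (cl_to, cl_type, cl_from)")
conn.executemany(
    "INSERT INTO categorylinks VALUES (?, ?, ?)",
    [(3, "B", "page"), (1, "A", "page"), (2, "A", "page"), (1, "A", "subcat"), (5, "B", "file")],
)
batches = list(iter_batches(conn, 2))
```

The expanded OR condition is used rather than a row-value comparison such as (cl_to, cl_type, cl_from) > (?, ?, ?) because older MySQL versions could not always turn row-constructor comparisons into an index range scan.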
My guess would be that the ICU algorithm is slow and that it is the bottleneck (uca-fr is possibly even slower than plain ICU, since it has to handle accents specially for French). However, that is pure speculation. We should profile the script to find out where the bottleneck really is.
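One cheap way to see whether time goes to the database or to computing sort keys is to time each phase of the batch loop separately. Below is a rough Python sketch of that idea; `timed`, `profile_run` and `stand_in_key` are hypothetical names, and `stand_in_key` (accent stripping plus case folding) is only a placeholder for an ICU/uca-fr sort key, not an equivalent of one. In the real script the select, the collation call and the update would be wrapped the same way, or the whole thing run under a profiler.

```python
import time
import unicodedata
from collections import defaultdict
from contextlib import contextmanager
from typing import Callable, Dict, Iterable, Iterator, List


@contextmanager
def timed(totals: Dict[str, float], name: str) -> Iterator[None]:
    """Add the wall-clock time spent inside the with-block to totals[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] += time.perf_counter() - start


def profile_run(batches: Iterable[List[str]],
                make_key: Callable[[str], bytes],
                write: Callable[[List[bytes]], None]) -> Dict[str, float]:
    """Run a fetch/collate/write loop and return the seconds spent per phase."""
    totals: Dict[str, float] = defaultdict(float)
    source = iter(batches)
    while True:
        with timed(totals, "select"):   # stands in for the DB read
            batch = next(source, None)
        if batch is None:
            break
        with timed(totals, "collate"):  # stands in for the PHP-side sort-key computation
            keys = [make_key(title) for title in batch]
        with timed(totals, "update"):   # stands in for the DB write
            write(keys)
    return dict(totals)


def stand_in_key(title: str) -> bytes:
    """Placeholder for an ICU sort key (NOT uca-fr): strip accents, then case-fold."""
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold().encode("utf-8")


written: List[bytes] = []
totals = profile_run([["Élan", "école"], ["Zèbre"]], stand_in_key, written.extend)
```

If the "collate" total dominates, the ICU guess holds; if "select" dominates, the problem is on the database side.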
<springle> Reedy: could UpdateCollation do reads from a slave?
<Reedy> Couldn't see why not... We've got the wfWaitForSlave() calls in anyway, so they should be up to date for when we do the next select()
<Reedy> TimStarling might have some input too
<TimStarling> the select is too slow?
<springle> yes. does a large filesort. innodb buffer pool gets out of whack pulling on old data, other writes pile up, then swap starts and everything crawls, then max_connections
<TimStarling> I suppose it would work, as long as the slave is guaranteed to have no open snapshot
<springle> trying without adaptive hash latch now, but not convinced that's a core issue -- 5.5 already has some of the old related bugs fixed
<TimStarling> maybe it could use a $dbr->commit() before the select to be on the safe side
<springle> the select does ORDER BY cl_to, cl_type, cl_from .. is that definitely needed?
<TimStarling> springle: yes, see https://gerrit.wikimedia.org/r/#/c/53301/
<springle> ok
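The approach discussed on IRC (read batches from a slave, commit() on the slave connection before each select so no old snapshot stays open, write to the master, then wait for the slaves before the next read) could be sketched as follows. This is a Python illustration under those assumptions only; it is not the code in change 106162, and `run_update`, `select_after`, `update` and `wait_for_replicas` are hypothetical stand-ins for the real database calls and wfWaitForSlave().

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, str, int]  # (cl_to, cl_type, cl_from)


def run_update(replica, primary, wait_for_replicas: Callable[[], None],
               batch_size: int) -> int:
    """Read batches from `replica`, write them via `primary`; return the row count.

    `replica` must provide commit() and select_after(last, limit);
    `primary` must provide update(rows). All of these are hypothetical.
    """
    last: Optional[Row] = None
    total = 0
    while True:
        # End any open transaction so this SELECT does not reuse a stale snapshot.
        replica.commit()
        rows = replica.select_after(last, batch_size)
        if not rows:
            return total
        primary.update(rows)  # writes always go to the master
        total += len(rows)
        # Let the slaves catch up (cf. wfWaitForSlave()) before the next read.
        wait_for_replicas()
        last = rows[-1]


# Demo with in-memory fakes that just record the call order.
class FakeReplica:
    def __init__(self, rows: List[Row], log: List[str]):
        self.rows = sorted(rows)  # tuple order matches ORDER BY cl_to, cl_type, cl_from
        self.log = log

    def commit(self) -> None:
        self.log.append("commit")

    def select_after(self, last: Optional[Row], limit: int) -> List[Row]:
        self.log.append("select")
        return [r for r in self.rows if last is None or r > last][:limit]


class FakePrimary:
    def __init__(self, log: List[str]):
        self.log = log

    def update(self, rows: List[Row]) -> None:
        self.log.append("update")


log: List[str] = []
rows = [("A", "page", 1), ("A", "page", 2), ("B", "file", 5)]
total = run_update(FakeReplica(rows, log), FakePrimary(log), lambda: log.append("wait"), 2)
```

Every select is preceded by a commit, and every batch of writes is followed by a wait, which is the ordering TimStarling and Reedy describe.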
Change 106162 had a related patch set uploaded by Reedy: Make SELECT queries against slaves in updateCollation.php https://gerrit.wikimedia.org/r/106162