Last modified: 2011-03-13 18:05:11 UTC
If suitable, please merge the live compressOld.php and compressOld.inc from /home/wikipedia/common/php-1.4/maintenance with the 1.4 and 1.5 CVS versions.

I received a request to exclude categories and their talk pages, which are currently in considerable flux, from the concatenated compression to make it easier to delete them. I implemented that by adding support for arbitrary SQL restrictions in the query which selects the articles to compress. There are no safety checks - it's a raw SQL inclusion into the query, which seems acceptable for a maintenance script.

It's currently running live on the site, most recently started like this:

nice php compressOld.php en wikipedia -e 20050108000000 -q " cur_namespace not in (10,11,14,15) " -a Burke | tee -a /home/wikipedia/logs/compressOld/20050108enwiki

The script now prints the selection query when starting, partly because the query can take 700 seconds to run and partly to show the query in case there's a problem with it:

Starting article selection query cur_title >= 'Burke' AND cur_namespace not in (10,11,14,15) ...

This one excludes template and category pages and their talk pages.

EXPLAIN /* compressWithConcat */ SELECT cur_namespace,cur_title FROM `cur` WHERE cur_title >= 'Burke' AND cur_namespace not in (10,11,14,15) ORDER BY cur_title:

*** row 1 ***
        table: cur
         type: index
possible_keys: cur_title
          key: cur_title
      key_len: 255
          ref: NULL
         rows: 1420880
        Extra: Using where

No problems with the EXPLAIN result.

Priority set to high because someone will hit a conflict here when pushing CVS to the live site if it's not merged first.
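For reference, a minimal sketch of how the -q restriction can be folded into the article selection query; the variable names ($dbr, $startTitle, $extraCondition) are illustrative, not necessarily what compressOld.inc actually uses:

    $sql = "SELECT cur_namespace, cur_title FROM cur WHERE cur_title >= " .
        $dbr->addQuotes( $startTitle );
    if ( $extraCondition != '' ) {
        // the -q value is dropped in verbatim, with no escaping or validation
        $sql .= " AND $extraCondition";
    }
    $sql .= " ORDER BY cur_title";
    print "Starting article selection query ...\n";  // can take ~700 s on enwiki
    $res = $dbr->query( $sql, 'compressWithConcat' );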
Bug fix for the change in the live version: the extra condition was being joined with AND even when that wasn't necessary.
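Roughly, the fix amounts to only emitting AND when the extra condition actually follows another condition; a sketch, again with illustrative names, assuming the stray AND showed up when no starting-title condition was present:

    // Before (sketch): the extra condition was always appended as
    //   $sql .= " AND $extraCondition";
    // which could produce "... WHERE AND cur_namespace not in (...)".
    // After: collect the conditions and join them, so AND only appears
    // between two real conditions.
    $conds = array();
    if ( $startTitle != '' ) {
        $conds[] = "cur_title >= " . $dbr->addQuotes( $startTitle );
    }
    if ( $extraCondition != '' ) {
        $conds[] = $extraCondition;
    }
    $where = count( $conds ) ? ' WHERE ' . implode( ' AND ', $conds ) : '';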
Now includes a partial fix for the case where the concatenated version would stop with a "disconnected from database server" error after processing a large number of old record updates (15,000+ seen in one case): the slaves are now checked for lag and the connection pinged after every 500 old record examinations/updates, and the same check is done before starting on any case with a large number (currently 200) of old records to consider. It's still possible for the script to be disconnected from the master when it gets a very large set of old records and spends many minutes loading the results.
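The shape of the periodic check is roughly as follows; the helper name checkSlaveLag() and the $dbw/$oldRows handles are illustrative rather than the exact calls in compressOld.inc:

    $i = 0;
    foreach ( $oldRows as $row ) {
        // ... examine and, where appropriate, concatenate/compress the old record ...
        if ( ++$i % 500 == 0 ) {
            $dbw->ping();              // keep the master connection alive
            checkSlaveLag( $maxLag );  // wait for lagged slaves to catch up
        }
    }
    // The same check also runs up front for any title with a large number
    // (currently 200) of old records to consider.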
Seems fixed; reducing priority.
Never merged, but the patch is no longer required since the deletion bug is fixed.