Last modified: 2011-11-29 03:20:57 UTC
The part of the XML dump that records log events for all pages has been broken for a while now. It generates empty gzip files.
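A quick way to confirm this symptom is to check whether a dump file decompresses to nothing at all. The following is a minimal Python sketch. `gzip_is_empty` is a hypothetical helper and is not part of worker.py.

```python
import gzip


def gzip_is_empty(path: str) -> bool:
    """Return True if the gzip file at `path` decompresses to zero bytes.

    Hypothetical helper for spotting empty dump output. It also returns True
    for a zero-byte file that lacks even a gzip header.
    """
    with gzip.open(path, "rb") as f:
        # Reading one byte is enough: an empty stream yields b"" immediately.
        return f.read(1) == b""
```

For example, `gzip_is_empty("eswikibooks-20090526-pages-logging.xml.gz")` should be False for a healthy logging dump.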
This might be a superfluous step, since logging.xml.gz seemingly already contains everything this step is trying to provide. I'm following up with Aaron Schulz to confirm.
I mentioned this a while ago. It probably is due to some second pass. The dumping works fine in my regular test environment.
(In reply to comment #2)
> I mentioned this a while ago. It probably is due to some second pass. The
> dumping works fine in my regular test environment.

Can you clarify what the first logging pass is meant to do compared with the second? I'm just trying to understand why there are two passes.
Really, there should only be one pass; all the data is already there. Any additional passes would be the result of bundling this code with the text dumps, which genuinely do need two passes.
So the XmlDump("logging",...) call in worker.py can probably go, since the stub call already does it (though the stub step is then a bit misnamed).
I went ahead and moved it into its own class to keep things clean and less confusing. The patch is ready, and I'll check in the fix later tonight after running a couple more iterations of the dumps.
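Giving the logging dump its own single-pass job, separate from the two-pass text dumps, might look roughly like this. This is a hypothetical illustration, not the actual r51000 code: the `LoggingDump` class, its `run` method, the shape of the `log_events` records, and the simplified XML element layout are all assumptions.

```python
import gzip
from xml.sax.saxutils import escape


class LoggingDump:
    """Hypothetical standalone job that writes every log event in one pass."""

    # Assumed, simplified set of per-event fields; the real schema has more.
    FIELDS = ("id", "timestamp", "type", "action", "logtitle")

    def __init__(self, output_path: str):
        self.output_path = output_path

    def run(self, log_events) -> int:
        """Write all events to a gzip-compressed XML file; return the count."""
        count = 0
        with gzip.open(self.output_path, "wt", encoding="utf-8") as out:
            out.write("<mediawiki>\n")
            # A single pass is enough: each event record already carries all
            # of its data, so no second pass is needed to fill anything in.
            for event in log_events:
                out.write("  <logitem>\n")
                for key in self.FIELDS:
                    value = escape(str(event[key]))
                    out.write(f"    <{key}>{value}</{key}>\n")
                out.write("  </logitem>\n")
                count += 1
            out.write("</mediawiki>\n")
        return count
```

Keeping this separate from the text-dump job means the logging output no longer depends on whatever the second pass does.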
Bugfix checked into r51000, and the code is now live. The first backup to run with the new code was http://download.wikimedia.org/eswikibooks/20090526/ and eswikibooks-20090526-pages-logging.xml.gz is complete. Resolving.
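Whether a dump like this is complete, rather than empty or cut off, could be checked mechanically. A minimal sketch, assuming a complete dump's decompressed stream ends with the closing `</mediawiki>` root tag. `dump_looks_complete` is a hypothetical helper.

```python
import gzip


def dump_looks_complete(path: str, closing_tag: bytes = b"</mediawiki>",
                        chunk_size: int = 1 << 20) -> bool:
    """Hypothetical check: does the decompressed dump end with `closing_tag`?

    Streams the file so large dumps are never fully loaded into memory.
    """
    tail = b""
    try:
        with gzip.open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                # Keep only enough trailing bytes for the tag plus whitespace.
                tail = (tail + chunk)[-(len(closing_tag) + 64):]
    except EOFError:
        # The gzip stream itself was truncated mid-write.
        return False
    return tail.rstrip().endswith(closing_tag)
```

An empty gzip file, a file whose XML stops early, and a file whose compressed bytes were cut short all report False.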