Last modified: 2012-08-19 19:16:20 UTC
This is related to bug 34104. It doesn't make any sense to add the hash columns and then not populate them. I'm not sure if there's a maintenance script written yet. If so, this bug can have the "shell" keyword. Otherwise, that'll need to be done first.
Yep, populateRevisionSha1 should take care of this.
IIRC, this was going to take a long time to run. We should schedule a time slot.
The script is currently running now.
(In reply to comment #3)
> The script is currently running now.
Orly
Just for curiosity, are there any stats on how much of this is already done? Is this expected to take days or weeks to finish? There are some revisions from February with an empty sha1 on Portuguese Wikipedia: https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=timestamp%7Csha1&titles=Matem%E1tica&rvstartid=29143621
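For anyone who wants to spot-check a page the same way, here is a minimal sketch of building that API query and listing revisions whose sha1 comes back empty. The helper names (revisions_url, revs_missing_sha1) are made up for illustration, and the JSON shape (query -> pages -> revisions, with a "sha1hidden" marker on suppressed revisions) is an assumption about the API output rather than something confirmed in this bug.

```python
from urllib.parse import urlencode


def revisions_url(api, title, start_id, limit=50):
    """Build a prop=revisions query like the one linked above (JSON output)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "format": "json",
        "rvprop": "ids|timestamp|sha1",
        "rvlimit": limit,
        "titles": title,
        "rvstartid": start_id,
    }
    return api + "?" + urlencode(params)


def revs_missing_sha1(response):
    """Return rev IDs whose sha1 is absent or empty in a parsed API response.

    Assumption: revisions with hidden content carry "sha1hidden" instead of
    a hash, so they are skipped rather than reported as unpopulated.
    """
    missing = []
    for page in response.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            if "sha1hidden" in rev:
                continue
            if not rev.get("sha1"):
                missing.append(rev["revid"])
    return missing
```

Fetching the URL and passing the decoded JSON to revs_missing_sha1 would give the list of revisions still waiting on the script.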
(In reply to comment #5)
> Just for curiosity, are there any stats on how much of this is already done?
> Is this expected to take days or weeks to finish?
>
> There are some revisions from February with an empty sha1 on Portuguese
> Wikipedia:
> https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=timestamp%7Csha1&titles=Matem%E1tica&rvstartid=29143621
It will most certainly take weeks to finish... I'm not sure if Aaron is running it foreachwiki in turn, or doing 1 per cluster or whatever...
Ready? http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059044.html
(In reply to comment #7)
> Ready? http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059044.html
Nope: see the link from comment 5.
I kicked the scripts again...some of them died due to intermittent ExternalStorage problems.
The sizes don't seem to be properly generated for revisions imported from other wikis. See: http://en.wikipedia.org/w/index.php?title=Church_of_England&dir=prev&action=history The first three edits there were imported from the Nostalgia Wikipedia: http://en.wikipedia.org/w/index.php?title=Special:Log&page=Church+of+England
(In reply to comment #10)
> The sizes don't seem to be properly generated for revisions imported from other
> wikis. See:
> http://en.wikipedia.org/w/index.php?title=Church_of_England&dir=prev&action=history
>
> The first three edits there were imported from the Nostalgia Wikipedia:
> http://en.wikipedia.org/w/index.php?title=Special:Log&page=Church+of+England
That's not related to this bug. That sounds like a rev_parent_id problem.
(In reply to comment #11)
> (In reply to comment #10)
> > The sizes don't seem to be properly generated for revisions imported from other
> > wikis. See:
> > http://en.wikipedia.org/w/index.php?title=Church_of_England&dir=prev&action=history
> >
> > The first three edits there were imported from the Nostalgia Wikipedia:
> > http://en.wikipedia.org/w/index.php?title=Special:Log&page=Church+of+England
> That's not related to this bug. That sounds like a rev_parent_id problem.
That is bug 36976.
Aaron continues to run this. He often needs to restart it and babysit, but there is progress, so it'll eventually get done. It's probably not sensible to venture a guess on when it'll be done, since it'd be a wild guess, but it's probably best measured in weeks.
The script for the last remaining rev ID range just finished today.
The sha1 is still empty in some revisions of this page: https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&format=jsonfm&rvprop=ids%7Ctimestamp%7Cuser%7Csize%7Csha1%7Ccomment&rvlimit=3&titles=Nabla&rvstartid=28914057
(In reply to comment #15)
> The sha1 is still empty in some revisions of this page:
> https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&format=jsonfm&rvprop=ids%7Ctimestamp%7Cuser%7Csize%7Csha1%7Ccomment&rvlimit=3&titles=Nabla&rvstartid=28914057
13/70 for that page apparently..
mysql> select rev_id from revision where revision.rev_page = 224934 AND rev_sha1 = '';
+----------+
| rev_id   |
+----------+
| 10172058 |
| 10183927 |
| 11479284 |
| 12691641 |
| 12745322 |
| 12745331 |
| 12759878 |
| 26150163 |
| 26667605 |
| 26806035 |
| 26806040 |
| 28239870 |
| 28914057 |
+----------+
13 rows in set (0.00 sec)
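A "missing out of total" figure like the 13/70 above can also be pulled in a single query. The sketch below runs it against an in-memory SQLite table that only imitates the three relevant columns of the revision table; the rows are toy data, and sha1_coverage is a hypothetical helper, not part of MediaWiki.

```python
import sqlite3

# Toy stand-in for the revision table: only the columns used in this bug.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER NOT NULL,"
    " rev_sha1 TEXT NOT NULL DEFAULT '')"
)
conn.executemany(
    "INSERT INTO revision (rev_id, rev_page, rev_sha1) VALUES (?, ?, ?)",
    [(1, 224934, ""), (2, 224934, "abc"), (3, 224934, ""), (4, 99, "")],
)


def sha1_coverage(conn, page_id):
    """Return (rows with empty rev_sha1, total rows) for one page."""
    missing, total = conn.execute(
        "SELECT COALESCE(SUM(rev_sha1 = ''), 0), COUNT(*)"
        " FROM revision WHERE rev_page = ?",
        (page_id,),
    ).fetchone()
    return missing, total
```

Here sha1_coverage(conn, 224934) counts the empty hashes and the total for that page in one pass. The SUM over a boolean comparison works in SQLite and MySQL alike, but on a production-sized table the query would want an index on rev_page.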
I've started scripts to catch any stragglers. Possibly, some revs (in ID range x-y) were restored (undeleted) after the first script read a batch of revs to update (including some of ID range x-y) from a snapshot in time so it didn't catch them.
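To make the suspected race concrete, here is a toy sketch: the first pass works from a list of rev IDs read up front, a revision is undeleted into that range afterwards, and a second pass keyed on the empty hash picks it up. The dict-based table and the populate helper are invented for illustration and are not how populateRevisionSha1 is written. The hash format (SHA-1 in base 36, zero-padded to 31 characters) is stated from memory of how MediaWiki stores rev_sha1, so treat it as an assumption.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(text: bytes) -> str:
    """SHA-1 of text, written in base 36 and zero-padded to 31 characters."""
    n = int(hashlib.sha1(text).hexdigest(), 16)
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out.rjust(31, "0")


def populate(revs, rev_ids):
    """Fill in the hash for the given rev IDs that are still empty."""
    updated = 0
    for rev_id in rev_ids:
        row = revs.get(rev_id)
        if row is not None and row["sha1"] == "":
            row["sha1"] = base36_sha1(row["text"])
            updated += 1
    return updated


# Toy revision table keyed by rev ID.
revs = {
    1: {"text": b"first", "sha1": ""},
    3: {"text": b"third", "sha1": ""},
}

snapshot = sorted(revs)                      # first pass reads its batch here
revs[2] = {"text": b"second", "sha1": ""}    # rev 2 is undeleted afterwards

first_pass = populate(revs, snapshot)        # rev 2 is not in the snapshot

# Second pass: select on the empty hash instead of a remembered ID range.
stragglers = sorted(i for i, r in revs.items() if r["sha1"] == "")
second_pass = populate(revs, stragglers)
```

Selecting on rev_sha1 = '' rather than on an ID range makes the second pass safe to repeat, since it only touches rows that are still unpopulated.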
(In reply to comment #17)
> I've started scripts to catch any stragglers.
>
> Possibly, some revs (in ID range x-y) were restored (undeleted) after the first
> script read a batch of revs to update (including some of ID range x-y) from a
> snapshot in time so it didn't catch them.
Second run has completed on all wikis. For enwiki:
* rev_sha1 and ar_sha1 population complete [12420 revision rows, 25084506 archive rows].