Last modified: 2012-08-19 19:16:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T38081, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36081 - Populate revision.rev_sha1 and archive.ar_sha1 on Wikimedia wikis
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: shell
Depends on:
Blocks: 29782
Reported: 2012-04-18 23:14 UTC by MZMcBride
Modified: 2012-08-19 19:16 UTC (History)
CC List: 7 users

Description MZMcBride 2012-04-18 23:14:12 UTC
This is related to bug 34104. It doesn't make any sense to add the hash columns and then not populate them.

I'm not sure if there's a maintenance script written yet. If so, this bug can have the "shell" keyword. Otherwise, that'll need to be done first.
Comment 1 Chad H. 2012-04-18 23:20:03 UTC
Yep, populateRevisionSha1 should take care of this.
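
For readers unfamiliar with what "populating" these columns involves, here is a minimal Python sketch of the hash a backfill would need to compute per row. It assumes, as far as I know of MediaWiki's storage format, that rev_sha1 and ar_sha1 hold the SHA-1 of the revision text written in base 36 and zero-padded to 31 characters. The function name base36_sha1 is made up for illustration and this is not the code of populateRevisionSha1 itself.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(text: bytes) -> str:
    """Return the SHA-1 of `text` as a zero-padded, 31-character base-36 string.

    Assumption: this mirrors how rev_sha1 / ar_sha1 values are stored.
    """
    # Interpret the 20-byte digest as one big unsigned integer.
    value = int.from_bytes(hashlib.sha1(text).digest(), "big")
    digits = []
    while value:
        value, remainder = divmod(value, 36)
        digits.append(BASE36_DIGITS[remainder])
    # A 160-bit number never needs more than 31 base-36 digits.
    return "".join(reversed(digits)).rjust(31, "0")


if __name__ == "__main__":
    print(base36_sha1(b"Example revision text"))
```

An empty string in the column then simply means "not computed yet", which is what the later comments in this thread are checking for.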
Comment 2 Mark A. Hershberger 2012-04-19 00:37:19 UTC
IIRC, this was going to take a long time to run.  We should schedule a time slot.
Comment 3 Aaron Schulz 2012-04-24 23:20:20 UTC
The script is currently running now.
Comment 4 Sam Reed (reedy) 2012-04-24 23:23:59 UTC
(In reply to comment #3)
> The script is currently running now.

Orly
Comment 5 Helder 2012-04-27 16:35:13 UTC
Just out of curiosity, are there any stats on how much of this is already done?
Is this expected to take days or weeks to finish?

There are some revisions from February with an empty sha1 on Portuguese Wikipedia:
https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=timestamp%7Csha1&titles=Matem%E1tica&rvstartid=29143621
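
A small Python sketch of the same kind of check, for anyone who wants to repeat it on another page. It only builds the query URL and scans an already-fetched JSON response, so nothing here touches the network. The parameter names are the ones from the link above plus rvlimit and format=json; the sample response is hypothetical and only assumes the usual query/pages/revisions layout with an empty string for an unpopulated sha1.

```python
import json
from urllib.parse import urlencode


def build_query_url(api_base, title, start_id, limit=3):
    """Build an API URL asking for revision IDs, timestamps and sha1 values."""
    params = {
        "action": "query",
        "prop": "revisions",
        "format": "json",
        "rvprop": "ids|timestamp|sha1",
        "rvlimit": limit,
        "titles": title,
        "rvstartid": start_id,
    }
    return api_base + "?" + urlencode(params)


def revisions_missing_sha1(response_text):
    """Return the revids whose sha1 is absent or empty in a JSON response."""
    data = json.loads(response_text)
    missing = []
    for page in data.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            if not rev.get("sha1"):
                missing.append(rev["revid"])
    return missing


# Hypothetical response body, shaped like the assumed API output.
SAMPLE_RESPONSE = json.dumps({
    "query": {"pages": {"1": {"title": "Example", "revisions": [
        {"revid": 101, "timestamp": "2012-02-01T00:00:00Z", "sha1": ""},
        {"revid": 100, "timestamp": "2012-01-01T00:00:00Z", "sha1": "0" * 31},
    ]}}}
})

if __name__ == "__main__":
    print(build_query_url("https://pt.wikipedia.org/w/api.php", "Nabla", 28914057))
    print(revisions_missing_sha1(SAMPLE_RESPONSE))
```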
Comment 6 Sam Reed (reedy) 2012-05-03 18:49:20 UTC
(In reply to comment #5)
> Just for curiosity, are there any stats on how much of this is already done?
> Is this expected to take days or weeks to finish?
> 
> There are some revisions from February with an empty sha1 on Portuguese
> Wikipedia:
> https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=timestamp%7Csha1&titles=Matem%E1tica&rvstartid=29143621

It will most certainly take weeks to finish... I'm not sure if Aaron is running it foreachwiki in turn, or doing 1 per cluster or whatever...
Comment 7 db [inactive,noenotif] 2012-05-05 19:44:12 UTC
Ready? http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059044.html
Comment 8 Helder 2012-05-05 19:51:57 UTC
(In reply to comment #7)
> Ready? http://lists.wikimedia.org/pipermail/wikitech-l/2012-March/059044.html

Nope: see the link from comment 5.
Comment 9 Aaron Schulz 2012-05-22 20:27:49 UTC
I kicked the scripts again...some of them died due to intermittent ExternalStorage problems.
Comment 10 Graham87 2012-06-08 03:30:32 UTC
The sizes don't seem to be properly generated for revisions imported from other wikis. See:
http://en.wikipedia.org/w/index.php?title=Church_of_England&dir=prev&action=history

The first three edits there were imported from the Nostalgia Wikipedia:
http://en.wikipedia.org/w/index.php?title=Special:Log&page=Church+of+England
Comment 11 Aaron Schulz 2012-06-08 06:23:33 UTC
(In reply to comment #10)
> The sizes don't seem to be properly generated for revisions imported from other
> wikis. See:
> http://en.wikipedia.org/w/index.php?title=Church_of_England&dir=prev&action=history
> 
> The first three edits there were imported from the Nostalgia Wikipedia:
> http://en.wikipedia.org/w/index.php?title=Special:Log&page=Church+of+England

That's not related to this bug. That sounds like a rev_parent_id problem.
Comment 12 db [inactive,noenotif] 2012-06-08 09:23:30 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > The sizes don't seem to be properly generated for revisions imported from other
> > wikis. See:
> > http://en.wikipedia.org/w/index.php?title=Church_of_England&dir=prev&action=history
> > 
> > The first three edits there were imported from the Nostalgia Wikipedia:
> > http://en.wikipedia.org/w/index.php?title=Special:Log&page=Church+of+England
> That's not related to this bug. That sounds like a rev_parent_id problem.

That is bug 36976.
Comment 13 Rob Lanphier 2012-06-15 18:53:25 UTC
Aaron continues to run this.  He often needs to restart it and babysit, but there is progress, so it'll eventually get done.  It's probably not sensible to venture a guess on when it'll be done, since it'd be a wild guess, but it's probably best measured in weeks.
Comment 14 Aaron Schulz 2012-08-11 04:21:08 UTC
The script for the last remaining rev ID range just finished today.
Comment 16 Sam Reed (reedy) 2012-08-12 23:32:42 UTC
(In reply to comment #15)
> The sha1 is still empty in some revisions of this page:
> https://pt.wikipedia.org/w/api.php?action=query&prop=revisions&format=jsonfm&rvprop=ids%7Ctimestamp%7Cuser%7Csize%7Csha1%7Ccomment&rvlimit=3&titles=Nabla&rvstartid=28914057

13/70 for that page apparently..

mysql> select rev_id from revision where revision.rev_page = 224934 AND rev_sha1 = '';
+----------+
| rev_id   |
+----------+
| 10172058 |
| 10183927 |
| 11479284 |
| 12691641 |
| 12745322 |
| 12745331 |
| 12759878 |
| 26150163 |
| 26667605 |
| 26806035 |
| 26806040 |
| 28239870 |
| 28914057 |
+----------+
13 rows in set (0.00 sec)
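
The "13/70" style figure can be reproduced with a single aggregate query. Below is a rough Python sketch against an in-memory SQLite table, not the production MySQL schema: the table is cut down to three columns, the first two rev_ids are taken from the output above, and the third row (with a placeholder hash) is hypothetical. The helper name sha1_coverage is likewise made up.

```python
import sqlite3


def sha1_coverage(conn, page_id):
    """Return (rows with empty rev_sha1, total rows) for one page."""
    missing, total = conn.execute(
        "SELECT SUM(rev_sha1 = ''), COUNT(*) FROM revision WHERE rev_page = ?",
        (page_id,),
    ).fetchone()
    # SUM() yields NULL when the page has no rows at all.
    return missing or 0, total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    " rev_id INTEGER PRIMARY KEY,"
    " rev_page INTEGER NOT NULL,"
    " rev_sha1 TEXT NOT NULL DEFAULT '')"
)
conn.executemany(
    "INSERT INTO revision (rev_id, rev_page, rev_sha1) VALUES (?, ?, ?)",
    [
        (10172058, 224934, ""),        # unpopulated, as in the output above
        (10183927, 224934, ""),        # unpopulated, as in the output above
        (99999999, 224934, "0" * 31),  # hypothetical populated row
    ],
)

if __name__ == "__main__":
    missing, total = sha1_coverage(conn, 224934)
    print(f"{missing}/{total} revisions still have an empty rev_sha1")
```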
Comment 17 Aaron Schulz 2012-08-13 00:20:09 UTC
I've started scripts to catch any stragglers.

Possibly, some revs (in ID range x-y) were restored (undeleted) after the first script had already read a batch of revs to update (including some in ID range x-y) from a snapshot in time, so it didn't catch them.
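
To make the suspected race concrete, here is a toy Python simulation. It is only a sketch of the hypothesis above, not of how the maintenance script actually batches its work: the tables are plain dicts, the hash is stored as hex rather than in the real column format, and every name (populate, empty_ids, the three revisions) is invented for illustration.

```python
import hashlib


def empty_ids(revision, lo, hi):
    """IDs in [lo, hi] that are in the revision table with no hash yet."""
    return [r for r in sorted(revision) if lo <= r <= hi and revision[r] == ""]


def populate(revision, texts, ids):
    """Fill in hashes, but only for the IDs handed over by the caller."""
    for rev_id in ids:
        if revision.get(rev_id) == "":
            revision[rev_id] = hashlib.sha1(texts[rev_id]).hexdigest()


texts = {1: b"first", 2: b"second", 3: b"third"}
revision = {1: "", 3: ""}   # live revisions, not yet hashed
archive = {2: ""}           # deleted revision, not yet hashed either

# First run: the batch is read from a snapshot of the revision table.
snapshot = empty_ids(revision, 1, 3)

# Meanwhile revision 2 is undeleted and lands in the same ID range.
revision[2] = archive.pop(2)

# The first run only updates what was in its snapshot.
populate(revision, texts, snapshot)
stragglers = empty_ids(revision, 1, 3)

# Second run: re-scan for rows that are still empty and fill them in.
populate(revision, texts, stragglers)

if __name__ == "__main__":
    print("first batch:", snapshot)
    print("missed by first run:", stragglers)
    print("still empty after second run:", empty_ids(revision, 1, 3))
```

The point of the second pass is that it re-queries for empty rows instead of trusting the original ID list, so rows that moved between tables in the meantime get picked up.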
Comment 18 Aaron Schulz 2012-08-19 19:16:20 UTC
(In reply to comment #17)
> I've started scripts to catch any stragglers.
> 
> Possibly, some revs (in ID range x-y) were restored (undeleted) after the first
> script read a batch of revs to update (including some of ID range x-y) from a
> snapshot in time so it didn't catch them.

Second run has completed on all wikis. For enwiki:
* rev_sha1 and ar_sha1 population complete [12420 revision rows, 25084506 archive rows].
