Last modified: 2012-05-03 02:42:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T36373, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 34373 - populateRevisionSha1.php misses archive rows, whose text is stored directly in the archive table (not text table)
populateRevisionSha1.php misses archive rows, whose text is stored directly i...
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Maintenance scripts (Other open bugs)
1.20.x
All All
: High normal (vote)
: 1.19.0 release
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks: 31217
  Show dependency treegraph
 
Reported: 2012-02-13 16:04 UTC by christian
Modified: 2012-05-03 02:42 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
SQL snippet showing an archive along with its text that procudes different SHA1 sums depending on the used MediaWiki version (601 bytes, text/x-sql)
2012-02-20 08:49 UTC, christian
Details
SQL snippet showing an archive along with its text that procudes different SHA1 sums depending on the used MediaWiki version (2nd try) (1.29 KB, text/x-sql)
2012-02-20 09:16 UTC, christian
Details

Description christian 2012-02-13 16:04:27 UTC
populateRevisionSha1.php omits archive rows, whose ar_rev_id is NULL [1].
Nevertherless, those omitted archive rows may need a SHA1 hash, as they may stem from MediaWiki 1.4. See tables.sql:396–398 :

  -- Old entries from 1.4 will be NULL here, and a new rev_id will
  -- be created on undeletion for those revisions.
  ar_rev_id int unsigned,

If ar_rev_id is NULL, the archived revision text is stored in column ar_text.



[1] Selection by "$idCol IS NOT NULL" in populateRevisionSha1.php:75
Comment 1 Sam Reed (reedy) 2012-02-13 16:41:14 UTC
CC'ing Aaron
Comment 2 Mark A. Hershberger 2012-02-13 18:47:21 UTC
Tagging for 1.19wmf fix even though I think we're only doing a partial rollout for sha1.

Aaron, if you don't think this is necessary, please change.
Comment 3 Aaron Schulz 2012-02-17 21:48:11 UTC
Fixed in r111795.
Comment 4 christian 2012-02-19 10:32:01 UTC
While the proposed fix does indeed add a SHA1 sum to archive rows whose text is stored directly within the archive table, the proposed fix introduces two problems:

- It breaks populating SHA1 for the rows from the revision table and from the archive table (only for rows which have their text in the text table). 
Changing the if condition from

  $this->upgradeRow( $row, $db, $table, $idCol, $prefix )

in populateRevisionSha1.php:86 to

  $this->upgradeRow( $row, $table, $idCol, $prefix )

fixes this issue.
But lacking commit access, I wonder whether I should file a separate bug for this regression.

- The SHA1 sum set with previous versions of the script for archive rows that store their text in the text table does not match the SHA1 sum that they would receive now.
By comparing the sums and experimenting with the text, it seems, the text used for SHA1 computation now gets rtrimmed.
Having some SHA1 sums for the old approach and some for the new approach in the table is not quite optimal for automatic verification ;)
Has this change been introduced on purpose?
Comment 5 Aaron Schulz 2012-02-19 19:51:18 UTC
(In reply to comment #4)
> - It breaks populating SHA1 for the rows from the revision table and from the
> archive table (only for rows which have their text in the text table). 
> Changing the if condition from

See r111881.

> - The SHA1 sum set with previous versions of the script for archive rows that
> store their text in the text table does not match the SHA1 sum that they would
> receive now.

I don't know what you mean. getRawText() returns the text as it is used by all of MediaWiki. The SHA-1 of that is the SHA-1 of the text.
Comment 6 christian 2012-02-20 08:49:02 UTC
Created attachment 10047 [details]
SQL snippet showing an archive along with its text that procudes different SHA1 sums depending on the used MediaWiki version
Comment 7 christian 2012-02-20 09:16:44 UTC
Created attachment 10049 [details]
SQL snippet showing an archive along with its text that procudes different SHA1 sums depending on the used MediaWiki version (2nd try)
Comment 8 christian 2012-02-20 10:04:40 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > - It breaks populating SHA1 for the rows from the revision table and from the
> > archive table (only for rows which have their text in the text table). 
> > Changing the if condition from
> 
> See r111881.

Great!

> > - The SHA1 sum set with previous versions of the script for archive rows that
> > store their text in the text table does not match the SHA1 sum that they would
> > receive now.
> 
> I don't know what you mean.

Upon taking a closer look at the problem, I realized the problem is more twisted than I indicated. I attached a SQL file (attachment 10049 [details]) to show the problem.

Running populateRevisionSha1.php for <r111795, the ArchiveUncompressedPlainUtf8 gets a SHA1 of t1vsj53xq35125uu7m495n0v15l0tu3.

Running populateRevisionSha1.php for >=r111795 [1], the very same row gets a SHA1 of gyh34cr8h1pice2eo0ff2krn7zsnl72.

The problem seems to be that the two archive rows of the SQL file share the same timestamp and updateRevisionSha1.php relies on the timestamps being unique.
However, I cannot see a reason for the timestamps being unique. Is there one?
If it's just an educated guess, I guess it would be good to also add the ar_title/ar_len column to reduce the risk of false SHA1 sums.



[1] Of course, one has to additionally apply the above patch removing $db from the upgradeRow call, to get the script running.
Comment 9 Aaron Schulz 2012-02-20 11:02:53 UTC
Fixed in r111920. This was a straightforward problem with trying to handle too much irregularity in one function and having a misleadingly named $idCol var.
Comment 10 christian 2012-02-20 12:38:54 UTC
Looks good for me. -> Marking resolved/fixed
Comment 11 Mark A. Hershberger 2012-02-26 19:28:09 UTC
needs to be backported to 1.19, I think

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links