Last modified: 2014-05-28 12:48:31 UTC
When I run update.php while updating from 1.22 to 1.23, amongst other things the script maintenance/populateArchiveLength.php is executed to populate the ar_len column. However, this script does not always work correctly so that the following error occurs (code from populateArchiveLength.php): # Go through and update ar_len from these rows. foreach ( $res as $row ) { $rev = Revision::newFromArchiveRow( $row ); $content = $rev->getContent(); if ( !$content ) { # This should not happen, but sometimes does (bug 20757) $this->output( "Content of archive {$row->ar_id} unavailable!\n" ); In my case this is _not_ caused by bug 20757. Instead, this is caused by the fact that for old archived revisions, the rows in the archive table do _not_ contain a pointer to the text in the text table, but that the text is stored in the archive table directly. Obviously the above code cannot handle this situation properly. The line $rev = Revision::newFromArchiveRow( $row ); does not work correctly in this case. It is unable to return information from inside an archive row.
Ah, the script in fact is called populateRevisionLength.php, but the code does exactly what I posted above.
I'm trying to reproduce this and having trouble. It doesn't appear to happen from a straight 1.22.0 -> 1.23.0-rc.2 update. It would be good to have a test case that reproduces your problem: > this is caused by the fact that for old archived revisions, the rows in the > archive table do _not_ contain a pointer to the text in the text table, but > that the text is stored in the archive table directly. Obviously the above > code cannot handle this situation properly.
I am upgrading from 1.22.6 to 1.23.0-rc.2. The database looks pretty normal to me: It has rows in the archive table from the time before the text/revision tables and I get the error message for each of these rows. I checked the call Revision::newFromArchiveRow( $row ): I checked var_dump($row) when the error happens; in my example for the first row in the archive table. I noticed that at this point $row->ar_text is _NOT_ set. I think this is the root of the problem. After that, processing in Revision.php cannot recognize the old row correctly: if ( isset( $row->ar_text ) && !$row->ar_text_id ) { //We don't get here as $row->ar_text incorrectly is NULL. echo "pre-1.5: " . $row->ar_text;die(); // Pre-1.5 ar_text row; get revision text.
The root of the problem is in Revision::selectArchiveFields(): Before all the things I wrote above happen, populateRevisionLength.php gets information about the archive rows with $ar = $this->doLenUpdates( 'archive', 'ar_id', 'ar', Revision::selectArchiveFields() ); In this call Revision::selectArchiveFields() is used to return the relevant fields from the archive table. Revision::selectArchiveFields() in fact also contains a list of fields to return, but this list does NOT include 'ar_text'. Without this field being returned, the condition if ( isset( $row->ar_text ) && !$row->ar_text_id ) { // Pre-1.5 ar_text row; get revision text. in Revision::newFromArchiveRow() will always return FALSE, so that for old revisions the actual text is never retrieved. Add 'ar_text' to the list of fields in Revision::selectArchiveFields() and the problem is solved.
Thanks for the debugging help. I recommend that you get commit access, but I think I can solve this with what you've given me.
Since I can't test this, could you try the following one-line change to includes/Revision.php and tell me if the problem disappears? diff --git a/includes/Revision.php b/includes/Revision.php index 1ad0b4a..b88bb2e 100644 --- a/includes/Revision.php +++ b/includes/Revision.php @@ -444,6 +444,7 @@ class Revision implements IDBAccessObject { 'ar_id', 'ar_page_id', 'ar_rev_id', + 'ar_text', 'ar_text_id', 'ar_timestamp', 'ar_comment',
Change 135567 had a related patch set uploaded by MarkAHershberger: Add ar_text to the list from Revision::selectArchiveFields(). It is checked later. https://gerrit.wikimedia.org/r/135567
Please confirm that this patch fixes the problem you found so we can include this in the 1.23 release.
Hi Mark, I hereby confirm that the patch fixes the problem. Would be great, if you could still merge this into REL1_23 before release, so that 1.23.0 does not suffer from this bug.
Change 135578 had a related patch set uploaded by MarkAHershberger: Add ar_text to the list from Revision::selectArchiveFields(). It is checked later. https://gerrit.wikimedia.org/r/135578
Change 135567 merged by jenkins-bot: Add ar_text to the list from Revision::selectArchiveFields(). It is checked later. https://gerrit.wikimedia.org/r/135567
Change 135578 merged by jenkins-bot: Add ar_text to the list from Revision::selectArchiveFields(). It is checked later. https://gerrit.wikimedia.org/r/135578
committed on master and REL1_23