Last modified: 2014-05-28 12:48:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T67765, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 65765 - populateRevisionLength.php displays "Content of archive unavailable" error when updating ar_len field with text stored in archive table
populateRevisionLength.php displays "Content of archive unavailable" error wh...
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Maintenance scripts (Other open bugs)
1.23.0
All All
: High normal (vote)
: 1.23.0 release
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2014-05-26 11:02 UTC by Joerg
Modified: 2014-05-28 12:48 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---
aklapper: Backport_to_Stable+


Attachments

Description Joerg 2014-05-26 11:02:44 UTC
When I run update.php while updating from 1.22 to 1.23, amongst other things the script maintenance/populateArchiveLength.php is executed to populate the ar_len column. 

However, this script does not always work correctly so that the following error occurs (code from populateArchiveLength.php):

	# Go through and update ar_len from these rows.
	foreach ( $res as $row ) {
		$rev = Revision::newFromArchiveRow( $row );
		$content = $rev->getContent();
		if ( !$content ) {
			# This should not happen, but sometimes does (bug 20757)
			$this->output( "Content of archive {$row->ar_id} unavailable!\n" );

In my case this is _not_ caused by bug 20757. Instead, this is caused by the fact that for old archived revisions, the rows in the archive table do _not_ contain a pointer to the text in the text table, but that the text is stored in the archive table directly. Obviously the above code cannot handle this situation properly.

The line

$rev = Revision::newFromArchiveRow( $row );

does not work correctly in this case. It is unable to return information from inside an archive row.
Comment 1 Joerg 2014-05-26 11:20:11 UTC
Ah, the script in fact is called populateRevisionLength.php, but the code does exactly what I posted above.
Comment 2 Mark A. Hershberger 2014-05-27 01:12:07 UTC
I'm trying to reproduce this and having trouble.  It doesn't appear to happen from a straight 1.22.0 -> 1.23.0-rc.2 update.

It would be good to have a test case that reproduces your problem:

> this is caused by the fact that for old archived revisions, the rows in the
> archive table do _not_ contain a pointer to the text in the text table, but
> that the text is stored in the archive table directly. Obviously the above
> code cannot handle this situation properly.
Comment 3 Joerg 2014-05-27 10:03:33 UTC
I am upgrading from 1.22.6 to 1.23.0-rc.2.

The database looks pretty normal to me: It has rows in the archive table from the time before the text/revision tables and I get the error message for each of these rows.

I checked the call Revision::newFromArchiveRow( $row ):

I checked var_dump($row) when the error happens; in my example for the first row in the archive table.

I noticed that at this point $row->ar_text is _NOT_ set. I think this is the root of the problem. After that, processing in Revision.php cannot recognize the old row correctly:

	if ( isset( $row->ar_text ) && !$row->ar_text_id ) {
		//We don't get here as $row->ar_text incorrectly is NULL.
		echo "pre-1.5: " . $row->ar_text;die();
		// Pre-1.5 ar_text row; get revision text.
Comment 4 Joerg 2014-05-27 10:18:26 UTC
The root of the problem is in Revision::selectArchiveFields():

Before all the things I wrote above happen, populateRevisionLength.php gets information about the archive rows with 

$ar = $this->doLenUpdates( 'archive', 'ar_id', 'ar', Revision::selectArchiveFields() );

In this call Revision::selectArchiveFields() is used to return the relevant fields from the archive table. Revision::selectArchiveFields() in fact also contains a list of fields to return, but this list does NOT include 'ar_text'. Without this field being returned, the condition 

if ( isset( $row->ar_text ) && !$row->ar_text_id ) {
	// Pre-1.5 ar_text row; get revision text.

in Revision::newFromArchiveRow() will always return FALSE, so that for old revisions the actual text is never retrieved.

Add 'ar_text' to the list of fields in Revision::selectArchiveFields() and the problem is solved.
Comment 5 Mark A. Hershberger 2014-05-27 13:03:27 UTC
Thanks for the debugging help.  I recommend that you get commit access, but I think I can solve this with what you've given me.
Comment 6 Mark A. Hershberger 2014-05-27 13:23:56 UTC
Since I can't test this, could you try the following one-line change to includes/Revision.php and tell me if the problem disappears?

diff --git a/includes/Revision.php b/includes/Revision.php
index 1ad0b4a..b88bb2e 100644
--- a/includes/Revision.php
+++ b/includes/Revision.php
@@ -444,6 +444,7 @@ class Revision implements IDBAccessObject {
 			'ar_id',
 			'ar_page_id',
 			'ar_rev_id',
+			'ar_text',
 			'ar_text_id',
 			'ar_timestamp',
 			'ar_comment',
Comment 7 Gerrit Notification Bot 2014-05-27 13:45:35 UTC
Change 135567 had a related patch set uploaded by MarkAHershberger:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135567
Comment 8 Mark A. Hershberger 2014-05-27 13:47:05 UTC
Please confirm that this patch fixes the problem you found so we can include this in the 1.23 release.
Comment 9 Joerg 2014-05-27 14:22:48 UTC
Hi Mark,

I hereby confirm that the patch fixes the problem. 

Would be great, if you could still merge this into REL1_23 before release, so that 1.23.0 does not suffer from this bug.
Comment 10 Gerrit Notification Bot 2014-05-27 15:56:13 UTC
Change 135578 had a related patch set uploaded by MarkAHershberger:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135578
Comment 11 Gerrit Notification Bot 2014-05-27 16:00:16 UTC
Change 135567 merged by jenkins-bot:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135567
Comment 12 Gerrit Notification Bot 2014-05-27 16:06:58 UTC
Change 135578 merged by jenkins-bot:
Add ar_text to the list from Revision::selectArchiveFields(). It is checked later.

https://gerrit.wikimedia.org/r/135578
Comment 13 Mark A. Hershberger 2014-05-27 16:50:35 UTC
committed on master and REL1_23

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links