Last modified: 2011-07-09 02:55:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T22757, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 20757 - Corruption of text from early 2005 due to HistoryBlobStub pointers broken by recompressTracked.php
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: History/Diffs
Version: unspecified
Hardware: All All
Importance: Normal major with 2 votes
Target Milestone: ---
Assigned To: Tim Starling
URL: http://en.wikipedia.org/wiki/Wikipedi...
Depends on:
Blocks: 16660
Reported: 2009-09-21 14:32 UTC by Graham87
Modified: 2011-07-09 02:55 UTC (History)

Description Graham87 2009-09-21 14:32:12 UTC
In several articles, revision text from early 2005 appears blank when viewed in the English Wikipedia. IIRC the blank revision text appears around the time that Wikipedia changed its compression formats. This has been discussed several times on the English Wikipedia village pump; the URLs below contain plenty of examples of this problem:

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_64#Old_versions_of_articles_missing 

http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_62#Revision_content_disappeared
Comment 1 Brion Vibber 2009-09-21 16:49:49 UTC
Tim, can you take a peek?

ISTR we cleaned up some similar items recently, where the old revs had ended up stored with incorrect compression flags, which led to them being loaded incorrectly... we might have more of such. :(
Comment 2 Platonides 2009-09-21 17:22:29 UTC
This may be the same issue faced by enciclopedia.us.es, regarding compressed revisions.
Comment 3 Church of emacs 2009-09-21 23:05:37 UTC
Might be related:
This diff says there are 2950 intermediate revisions, but there are none in the history:
http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion%3ADickbauch&diff=25461813&oldid=18692073&uselang=en
http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Dickbauch&offset=20061230130858&limit=2&action=history&uselang=en

This diff doesn't say there are any intermediate revisions, however one revision is from 2006 and another one from 2004 - there are at least hundreds of intermediate revisions.
http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Dickbauch&diff=next&oldid=18692073&uselang=en
Comment 4 Church of emacs 2009-09-21 23:07:30 UTC
Oops, it seems this isn't related after all: http://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion:Raymond&diff=next&oldid=25889665&uselang=en
Sorry
Comment 5 Graham87 2009-10-27 06:43:16 UTC
Also see this current village pump discussion:
http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=322294276#Lost_page_histories

Do you need *all* the examples of this issue to be reported here, or can a database query or some other process be used to fix all the places where this occurred, like what happened at bug 19990?


Comment 6 Graham87 2009-11-08 05:49:01 UTC
There are also a few more instances at:
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_66

in the sections entitled "Lost page histories" (described above), "Missing revision content on Magic Knight Rayearth", and "revision history oddities". 

Also see this current discussion:

http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=324595859
Under the section "Blank revisions, tracking them".
Comment 8 Graham87 2009-11-14 15:31:10 UTC
Yet another one:
From http://en.wikipedia.org/w/index.php?title=Physical_Layer&oldid=10138976 to http://en.wikipedia.org/w/index.php?title=Physical_Layer&oldid=13781605, excluding http://en.wikipedia.org/w/index.php?title=Physical_Layer&oldid=10145054.
Also, here's an example from March 2004, which has an unusually early date but a relatively high revision ID. It's meant to be a redirect:
http://en.wikipedia.org/w/index.php?title=Two-binary,_one-quaternary&oldid=13022503
Comment 9 Alejandro Sánchez Marín 2009-11-16 15:57:52 UTC
I have a similar problem with my site, http://enciclopedia.us.es.

I ran CompressOld.php over my database, and it had a bug as of release 1.14.x.

If you view changes from before CompressOld.php was applied, nothing is shown.

It works on 1.14.x, but not on 1.15 and later. See http://encicloold.us.es.

A patch was never released.
Comment 10 Graham87 2009-11-20 04:54:04 UTC
Another one - this page history has been through a deletion and move cycle: http://en.wikipedia.org/w/index.php?title=Wiki_farm&oldid=10375881 to http://en.wikipedia.org/w/index.php?title=Wiki_farm&oldid=11244917
Comment 11 Graham87 2009-11-24 12:22:02 UTC
The problem also occurs, somewhat ironically, at this revision: http://en.wikipedia.org/w/index.php?title=Wikipedia:Usemod_article_histories&oldid=10835767
Comment 12 Graham87 2009-12-04 06:22:44 UTC
Today's featured article has this problem, from http://en.wikipedia.org/w/index.php?title=Blade_Runner&oldid=13728817 to http://en.wikipedia.org/w/index.php?title=Blade_Runner&oldid=9483290, a large number of revisions.
Comment 13 Graham87 2009-12-10 06:47:32 UTC
Another example is here, in the edits before the cut and paste move:
http://en.wikipedia.org/w/index.php?title=Eurydice_(mythology)&action=history

I won't history merge it yet, to avoid what happened in bug 19990.
Comment 14 Derk-Jan Hartman 2009-12-16 12:33:58 UTC
And one more:

All edits between http://en.wikipedia.org/w/index.php?title=The_Elephant_Man_%28film%29&oldid=9292943 (21:01, 11 January 2005) and http://en.wikipedia.org/w/index.php?title=The_Elephant_Man_%28film%29&direction=next&oldid=12776317 (02:37, 25 April 2005) got fubared. They show up as blank versions.
Comment 16 Graham87 2009-12-18 09:53:09 UTC
All edits from 2005, besides the first and the last ones from that year, are blank in this page history:
http://en.wikipedia.org/w/index.php?title=Causes_of_sexual_orientation&action=history
Comment 18 Graham87 2010-01-02 15:55:16 UTC
This bug also occurs in the CFC article, from http://en.wikipedia.org/w/index.php?title=CFC&oldid=13246201 to http://en.wikipedia.org/w/index.php?title=CFC&oldid=9452981
Comment 20 Moonriddengirl 2010-01-22 20:23:20 UTC
The bug occurs in Buckingham Palace, from
http://en.wikipedia.org/w/index.php?title=Buckingham_Palace&direction=next&oldid=9407007 to
http://en.wikipedia.org/w/index.php?diff=13139558

(The text briefly reappears in the middle of the range, [http://en.wikipedia.org/w/index.php?title=Buckingham_Palace&diff=prev&oldid=12201160], but evidently just for one edit.)

It also occurs in Norway.

http://en.wikipedia.org/w/index.php?title=Norway&diff=next&oldid=9440560 to
http://en.wikipedia.org/w/index.php?diff=13177120

We may find more instances, as we're reviewing a number of edits at http://en.wikipedia.org/wiki/Wikipedia:Contributor_copyright_investigations/Craigy144

I've asked for these to be noted.
Comment 21 Tim Starling 2010-02-08 06:49:57 UTC
This is not what I thought it was. It is a bug in recompressTracked.php. I am looking at it now. It should be recoverable.
Comment 22 Tim Starling 2010-02-08 07:35:53 UTC
OK I've checked a lot of these test cases, and they all seem to be the same, so I'm changing the summary. All of the relevant revisions should now be serving errors instead of pretending to be blank.

The original version of compressOld.php concatenated several revisions into one "blob" and stored it in a random row in the old table. Then the other old rows which needed data from the concatenated blob would get a pointer object, called a HistoryBlobStub. This pointer object gave an old_id and content hash which located the text for that revision.
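
To make the layout described above concrete, here is a minimal Python sketch. It is not the MediaWiki PHP code: the names `BlobRow`, `StubRow`, `compress_rows` and `load_text` are hypothetical, the use of MD5 as the content hash and of zlib-compressed JSON as the blob format are assumptions, and the blob is simply placed in the first row of the group rather than in whichever row the real script chose.

```python
import hashlib
import json
import zlib
from dataclasses import dataclass
from typing import Dict, List, Union


def content_hash(text: str) -> str:
    # MD5 is an assumption made for this sketch.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


@dataclass
class BlobRow:
    """Row that holds the whole concatenated, compressed blob."""
    data: bytes      # zlib-compressed JSON map: content hash -> revision text
    own_hash: str    # which item in the blob is this row's own revision


@dataclass
class StubRow:
    """Pointer row: where the blob lives and which item to take from it."""
    old_id: int
    hash: str


Row = Union[str, BlobRow, StubRow]


def compress_rows(table: Dict[int, Row], old_ids: List[int]) -> None:
    """Concatenate the plain-text rows in old_ids into one blob kept in the first row."""
    items = {content_hash(table[i]): table[i] for i in old_ids}
    host = old_ids[0]
    blob = zlib.compress(json.dumps(items).encode("utf-8"))
    for i in old_ids[1:]:
        # Every other row keeps only a pointer: host row id plus content hash.
        table[i] = StubRow(old_id=host, hash=content_hash(table[i]))
    table[host] = BlobRow(data=blob, own_hash=content_hash(table[host]))


def load_text(table: Dict[int, Row], old_id: int) -> str:
    """Return the revision text for old_id, following a stub if there is one."""
    row = table[old_id]
    if isinstance(row, str):
        return row
    if isinstance(row, StubRow):
        wanted, row = row.hash, table[row.old_id]
    else:
        wanted = row.own_hash
    items = json.loads(zlib.decompress(row.data).decode("utf-8"))
    return items[wanted]


# Three uncompressed revisions, then compressed into one blob plus two stubs.
table = {1: "first draft", 2: "second draft", 3: "third draft"}
compress_rows(table, [1, 2, 3])
```

After `compress_rows`, row 1 holds the blob and rows 2 and 3 hold only stubs, yet `load_text` still returns each original text. A stub is only as good as the row it points to, which is the weakness the rest of this comment is about.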

After we started using external storage (ES), all the bulk data was moved out of the core database. Now, to load a HistoryBlobStub, MW would first load the old_id where the concatenated text used to be, where it would find a second pointer (with old_flags=external), then it would follow the second pointer to load the blob from ES. This was an inefficient situation, so I introduced a new pointer type (the "two-part CGZ URL") which pointed directly from the rows where the stub objects used to be, into ES. 

I then wrote a script called resolveStubs.php, and ran it, removing all HistoryBlobStub objects from the database. Or at least, that's what I thought I did. It transpires that these missing revisions above are all HistoryBlobStub objects that somehow escaped resolveStubs.php. 
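
The two lookup paths, and the kind of rewrite a stub-resolving pass performs, can be sketched roughly as follows. This is an illustrative Python model only, not resolveStubs.php: the row classes, the `es://` address string and the function names are all hypothetical, and external storage is reduced to a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class StubRow:
    """Old-style pointer: another core row plus a content hash."""
    old_id: int
    hash: str


@dataclass
class ExternalRow:
    """Core row whose bulk data was moved out to external storage."""
    url: str


@dataclass
class DirectRow:
    """New-style pointer: straight to the blob in external storage, plus the item."""
    url: str
    hash: str


Row = Union[StubRow, ExternalRow, DirectRow]
ExternalStorage = Dict[str, Dict[str, str]]   # address -> {content hash -> text}


def load_text(core: Dict[int, Row], es: ExternalStorage, old_id: int) -> Tuple[str, int]:
    """Return (text, number of core-table rows read)."""
    row = core[old_id]
    reads = 1
    if isinstance(row, StubRow):
        target = core[row.old_id]          # second core-table read
        reads += 1
        if not isinstance(target, ExternalRow):
            raise LookupError(f"stub in row {old_id} points at an unexpected row")
        return es[target.url][row.hash], reads
    if isinstance(row, DirectRow):
        return es[row.url][row.hash], reads
    raise LookupError(f"row {old_id} does not identify a single revision")


def resolve_stubs(core: Dict[int, Row]) -> int:
    """Rewrite each stub as a direct pointer; return how many were converted."""
    converted = 0
    for old_id, row in list(core.items()):
        if isinstance(row, StubRow):
            target = core.get(row.old_id)
            if isinstance(target, ExternalRow):
                core[old_id] = DirectRow(url=target.url, hash=row.hash)
                converted += 1
    return converted


core = {10: ExternalRow("es://cluster/1"), 11: StubRow(old_id=10, hash="h1")}
es = {"es://cluster/1": {"h1": "revision text"}}

before = load_text(core, es, 11)    # follows stub -> external row -> storage
converted = resolve_stubs(core)
after = load_text(core, es, 11)     # goes straight to storage
```

In this model the same text comes back both times, but the second load reads one core row instead of two. Any stub that such a pass skips keeps depending on the intermediate row staying valid.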

The current generation of recompression script, trackBlobs/recompressTracked, has no appropriate handling for HistoryBlobStub. It leaves the HistoryBlobStub objects in place, but removes the CGZ objects they point to, creating a broken pointer. 

Due to a bug in Revision.php, the broken pointer was displayed as a blank page instead of an error message. This is fixed in r62119.
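
As a rough illustration of why the broken pointers looked like blank revisions, the Python sketch below contrasts a loader that swallows a failed lookup with one that reports it. It does not reproduce Revision.php or the change in r62119; the function names, the `MissingTextError` class and the dictionary-based storage are assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

Pointer = Tuple[int, str]                 # (old_id of the blob row, content hash)
BlobTable = Dict[int, Dict[str, str]]     # old_id -> {content hash -> text}


class MissingTextError(LookupError):
    """Raised when a pointer no longer leads to any stored text."""


def follow(blobs: BlobTable, pointer: Pointer) -> Optional[str]:
    """Return the text the pointer refers to, or None if the target is gone."""
    old_id, item_hash = pointer
    return blobs.get(old_id, {}).get(item_hash)


def text_lenient(blobs: BlobTable, pointer: Pointer) -> str:
    # A failed lookup is turned into an empty string, so the reader sees
    # a blank revision and nothing signals that data is missing.
    text = follow(blobs, pointer)
    return "" if text is None else text


def text_strict(blobs: BlobTable, pointer: Pointer) -> str:
    # A failed lookup is reported, so the breakage is visible and traceable.
    text = follow(blobs, pointer)
    if text is None:
        raise MissingTextError(f"no text for pointer {pointer!r}")
    return text


blobs: BlobTable = {5: {"abc": "Hello, world"}}
stub: Pointer = (5, "abc")

# Simulate a recompression pass that removes the blob but leaves the stub alone.
del blobs[5]
```

With the blob gone, `text_lenient(blobs, stub)` returns an empty string while `text_strict(blobs, stub)` raises `MissingTextError`. The strict behaviour is what makes affected revisions discoverable instead of silently blank.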

Luckily I was fairly paranoid when I wrote trackBlobs/recompressTracked, and all the data required for recovery appears to have been retained. It's just a matter of writing a bug fix script.
Comment 23 Graham87 2010-02-08 08:26:45 UTC
Thanks Tim for looking into this. I've added some text about this bug to:
http://en.wikipedia.org/wiki/MediaWiki:Missing-article



It'd be confusing to have this error message pop up when someone is checking the history of a page. Since I had to read through your explanation twice to understand, I hope that "database glitch" is OK for now as a layman's explanation.
Comment 24 Platonides 2010-02-08 22:35:54 UTC
That doesn't explain the existence of wrong ConcatenatedGzipHistoryBlob objects (the serialized mItems length doesn't match the real one).
Perhaps they were indeed different issues :S
Comment 25 Tim Starling 2010-02-08 22:37:26 UTC
Report different issues on a separate bug report please.
Comment 26 Tim Starling 2010-02-11 02:56:12 UTC
All the test cases on the English Wikipedia should be fixed now:

* 1.3 million revisions were broken by this bug and are now fixed
* 177 revisions were unrecoverable due to being damaged by a previous compression script some years ago, while cluster4 and cluster5 were current.
* 333 revisions were unrecoverable due to the text row being missing, probably due to a bug in the original 2005 compression script. 

The fix script still needs to be run on the other wikis, so this bug has to stay open for now.
Comment 27 Phillip Patriakeas 2010-02-11 16:54:00 UTC
Are you going to provide a list of the unrecoverable revisions?
Comment 28 Tim Starling 2010-02-12 00:13:05 UTC
They're not really relevant to this bug. Maybe they are listed on some other bug report already.
Comment 29 Graham87 2010-03-31 14:42:47 UTC
Does this error message at the plasma page have anything to do with bug 20757, or the fix for it?
http://en.wikipedia.org/w/index.php?title=Plasma&oldid=9752546

I undid a braindead history merge from "plasma" to "plasma physics", before the script was run in the English Wikipedia. Since the history merge tangled many edits together from January 2005, I wonder if my machinations at the plasma and plasma physics pages in January 2010 caused something to break.

I'm fairly sure that the above revision was visible before I untangled the history at plasma physics.
Comment 30 Ariel T. Glenn 2010-05-28 02:40:33 UTC
So Tim ran the fixup script on all other wikis on Feb 27th and none of them were affected.  I don't know if there is anything else that needs to be checked before this bug is closed, though.
Comment 31 Graham87 2010-05-28 02:58:51 UTC
The bug is almost resolved, then. I'm still curious about the problem with the plasma article that I described in comment 29; it turns out to affect all edits from 12:22, 28 January 2005 (UTC) to 00:00, 16 April 2005 (UTC). I'd like to know (a) whether it is a result of this bug and (b) whether the affected revisions are recoverable.
Comment 32 Ariel T. Glenn 2010-05-28 06:09:52 UTC
(In reply to comment #31)

According to the fixup script, those revisions are unrecoverable. 

I had a look at a few random revisions (9752546, 11243046, 11397897) from the time period you mentioned. The text pointer for these revisions goes to a single location in cluster5, with the same id and itemid. I seem to be able to retrieve something from there manually by plugging the pointer into ExternalStore::fetchFromUrl(), but it's one text item, not a concatenated set of texts. I can't say if your history unmerge had anything to do with it.
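
One way to look for other revisions in the same state would be to group text pointers by their target and list the targets shared by more than one revision. The Python sketch below shows the idea only; the revision IDs, cluster name, id and itemid values are made up, and `shared_targets` is a hypothetical helper, not part of the fixup script.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pointer = Tuple[str, int, str]   # (cluster, id, itemid); all values below are made up


def shared_targets(pointers: Dict[int, Pointer]) -> Dict[Pointer, List[int]]:
    """Map each pointer target used by more than one revision to those revision IDs."""
    groups: Dict[Pointer, List[int]] = defaultdict(list)
    for rev_id, ptr in pointers.items():
        groups[ptr].append(rev_id)
    return {ptr: sorted(ids) for ptr, ids in groups.items() if len(ids) > 1}


pointers = {
    101: ("cluster5", 7, "aaa"),
    102: ("cluster5", 7, "aaa"),
    103: ("cluster5", 7, "aaa"),
    104: ("cluster5", 8, "bbb"),
}
suspicious = shared_targets(pointers)
```

Here `suspicious` contains the one target that revisions 101 to 103 have in common. A shared target is only a hint: revisions with identical text could legitimately resolve to the same item, so each group would still need checking by hand.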
Comment 33 Derk-Jan Hartman 2010-05-28 12:34:58 UTC
Ariel, can you check 44320111 from bug 8689 against that list?

Perhaps the list of unrecoverable revisions could be added to the ticket or something? That would help match any other cases we find against this problem, and help identify issues that have some other cause.
Comment 34 Dan Collins 2011-07-09 02:55:42 UTC
It seems that between Tim and Ariel the repair scripts have been run and all test cases are fixed except the most recent one, which refers to bug 8689. However, that bug has been resolved, and the referenced revision text appears to be available. Marking this as fixed?
