Last modified: 2012-12-19 14:17:55 UTC
I was checking through deleted revisions in the main namespace by Conversion script on the English Wikipedia, to find old deleted edits to history merge: http://en.wikipedia.org/w/index.php?limit=500&title=Special%3ADeletedContributions&target=Conversion+script&namespace=0 I found that in all pages deleted before Wikipedia was upgraded to MediaWiki 1.5 (late June 2005), all edits besides the latest one are corrupt. An undeleted example of these edits can be found above; the edits were previously at the title "Clearwater River, Idaho", and I history merged them to the existing article "Clearwater River (Idaho)". Another example involves the page about Michael Collins: http://en.wikipedia.org/w/index.php?title=Michael_Collins&dir=prev&limit=6&action=history The edits were previously at the title "Michael Collins (disambiguation)". Even though 99.9% of the text in these old deleted archives is garbage, the other 0.1% is very important page history and it should not be corrupted.
Possible external storage issue? Looks like something not getting un-gzipped or losing its flags.
I'm not sure if this is related, but some revisions before June 2005 are completely blank when they shouldn't be, as reported at this discussion on the technical village pump: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_62#Revision content disappeared I didn't think much of it at the time, but both problems seem to involve Wikipedia text added before the upgrade to MediaWiki 1.5.
Apologies, I meant: http://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_62#Revision_content_disappeared
These deleted revisions from before June 2005 are fine: http://en.wikipedia.org/wiki/Special:Undelete/Braille_music They should stay deleted, since they were obviously nuked to make way for a page move.
This should be fixed in r55626.
It's fixed in the archive table where the MW 1.4 deleted revisions are. However the undeleted edits to "Clearwater River (Idaho)" and "Michael Collins" that I mentioned above are still corrupt. I tried deleting and undeleting them, just in case, and that didn't fix the issue. I highly doubt there are many other revisions with this problem. I'm not sure of proper protocol here : whether to re-open this bug, or start a new one ...
(In reply to comment #6) > It's fixed in the archive table where the MW 1.4 deleted revisions are. > > However the undeleted edits to "Clearwater River (Idaho)" and "Michael Collins" > that I mentioned above are still corrupt. I tried deleting and undeleting them, > just in case, and that didn't fix the issue. I highly doubt there are many > other revisions with this problem. > > I'm not sure of proper protocol here : whether to re-open this bug, or start a > new one ... Anything that was undeleted while the bug was active will now be permanently corrupted and will need to be fixed manually.
Yikes, I thought as much. So ... what happens with this bug? The underlying issue is resolved but it's still caused damage that's seemingly hard to fix.
The only way to fix it is to update each corrupted row in the database, e.g. by manually adding "gzip" to the old_flags field. The problem is that it would be very difficult to find the affected revisions automatically.
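To illustrate why a lost flag makes a revision look corrupt, and why restoring the flag would repair it, here is a minimal Python sketch. It is a simplified model and not MediaWiki's actual code: it assumes that "gzip" in old_flags means the stored text is a raw DEFLATE stream (as produced by PHP's gzdeflate()), it ignores every other flag, and all function names are hypothetical.

```python
import zlib


def compress_raw_deflate(text: str) -> bytes:
    """Compress text as a raw DEFLATE stream (no zlib or gzip header).

    Assumption: this mirrors how the stored text was compressed.
    """
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
    )
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def decode_text(blob: bytes, flags: str) -> str:
    """Decode a stored blob according to a comma-separated flags string.

    Only the "gzip" flag is modelled here. Without it, the compressed
    bytes are returned as if they were plain text, which is garbage.
    """
    flag_set = {flag for flag in flags.split(",") if flag}
    if "gzip" in flag_set:
        blob = zlib.decompress(blob, -zlib.MAX_WBITS)
    return blob.decode("utf-8", errors="replace")


def add_gzip_flag(flags: str) -> str:
    """Return the flags string with "gzip" appended if it is missing."""
    parts = [flag for flag in flags.split(",") if flag]
    if "gzip" not in parts:
        parts.append("gzip")
    return ",".join(parts)


if __name__ == "__main__":
    stored = compress_raw_deflate("The Clearwater River is a river in Idaho.")
    print(repr(decode_text(stored, "")))                 # flag lost: unreadable
    print(repr(decode_text(stored, add_gzip_flag(""))))  # flag restored
```

In this model the stored bytes are never changed; only the flag decides how they are read, which is why a per-row flag update would be enough once the affected rows are known.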
Then I'd like someone to fix the revisions I mentioned above: http://en.wikipedia.org/w/index.php?title=Clearwater_River_(Idaho)&dir=prev&limit=16&action=history and: http://en.wikipedia.org/w/index.php?title=Michael_Collins&dir=prev&limit=6&action=history As for finding other cases where it happened, for the English Wikipedia, check whether the revision ID is greater than 296,365,718 and the revision date is before July 2005, i.e. while MW 1.4 was still in use. I use a revision ID of 296365718 because it's the last uncorrupted revision that I know of which was deleted and could have had this problem; see this diff: http://en.wikipedia.org/w/index.php?title=User:Xaonon&diff=2406956&oldid=296365718 As far as I know, this would work because before MW 1.5 was used, a revision got a new rev_id when it was undeleted.
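The heuristic above could be sketched roughly as follows. This is only an illustration under stated assumptions: the row type and function names are hypothetical, timestamps are assumed to be 14-digit YYYYMMDDHHMMSS strings (so plain string comparison orders them correctly), and "before July 2005" is taken to mean before 2005-07-01 00:00:00 UTC. Anything it flags would still need to be checked by hand.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Last uncorrupted revision ID known to the reporter (from the comment above).
LAST_KNOWN_GOOD_REV_ID = 296365718
# Assumed cutoff for "before July 2005", in YYYYMMDDHHMMSS form.
MW15_CUTOFF = "20050701000000"


@dataclass
class RevisionRow:
    """Hypothetical, minimal view of a revision: its ID and timestamp."""
    rev_id: int
    rev_timestamp: str  # YYYYMMDDHHMMSS


def is_candidate(row: RevisionRow) -> bool:
    """True if the row has a high (recent) ID but a pre-MW 1.5 timestamp."""
    return (
        row.rev_id > LAST_KNOWN_GOOD_REV_ID
        and row.rev_timestamp < MW15_CUTOFF
    )


def find_candidates(rows: Iterable[RevisionRow]) -> List[RevisionRow]:
    """Return the rows worth inspecting manually for corruption."""
    return [row for row in rows if is_candidate(row)]


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [
        RevisionRow(rev_id=2406956, rev_timestamp="20040215120000"),
        RevisionRow(rev_id=300000001, rev_timestamp="20030810093000"),
        RevisionRow(rev_id=300000002, rev_timestamp="20090601000000"),
    ]
    for row in find_candidates(sample):
        print(row)
```

The check relies on the point made above that an old revision only ends up with a high ID if it was deleted and later undeleted, so a high ID paired with an old timestamp suggests an undeletion that may have happened while the bug was active.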
Tim, do you think this is something that still can and should be recovered or just close as WONTFIX?
Realistically closing this as WONTFIX nowadays.