Last modified: 2012-05-03 02:42:40 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T33792, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 31792 - Mediawiki losing old file versions upon undeletion in MW 1.18
Status: RESOLVED WORKSFORME
Product: MediaWiki
Classification: Unclassified
Component: File management
Version: 1.18.x
Hardware: All All
Importance: Normal critical
Target Milestone: 1.19.0 release
Assigned To: Aaron Schulz
Keywords: platformeng
Depends on:
Blocks: 31217
Reported: 2011-10-18 03:15 UTC by magog.the.ogre
Modified: 2012-05-03 02:42 UTC (History)


Description magog.the.ogre 2011-10-18 03:15:40 UTC
Since the most recent MW version upgrade, I've twice had MediaWiki lose an old version of a file in the process of deletion and undeletion. Let me explain:

* User uploads version A to Myfile.jpg.
* The same user or a different user uploads version B to Myfile.jpg, overwriting the old version.
* I delete Myfile.jpg.
* I go to undelete version A, but when I undelete the file, version B pops up. 

Looking in the file history, version B is now version A, even though the resolution information (and IIRC the sha1 information) are still different. Thus version A is forever gone.

You can see this occur at two files:
* http://en.wikipedia.org/wiki/Special:Undelete/File:University.JPG: in this case, only one deleted version appears in the history because I was trying to perform a history split. See http://en.wikipedia.org/w/index.php?title=Special:Log&page=File%3AUniversity.JPG. However, you'll notice the deleted revision is inexplicably identical to http://en.wikipedia.org/wiki/File:Sindh_Agriculture_University.JPG.
* http://en.wikipedia.org/wiki/Special:Undelete/File:Kolkata_Tipu_Sultan%27s_Mosque3.jpg: you'll notice that the software lists two different files at two different resolutions. However, if you actually click both of the old versions, you'll get the same file (445x387: the newer one uploaded). 

Do I need to file a bug for this? Or is there already a bug filed?
Comment 1 Mark A. Hershberger 2011-10-19 12:06:57 UTC
(In reply to comment #0)
> Do I need to file a bug for this? Or is there already a bug filed?

Well, you filed one. ;)

I don't think this has been reported yet.
Comment 2 magog.the.ogre 2011-10-20 06:41:26 UTC
Oops! I copy/pasted my post from the English Wikipedia village pump, and forgot to take out that part. :)
Comment 3 Derk-Jan Hartman 2011-10-22 17:28:00 UTC
Marked as critical due to potential 'loss' of content.
Comment 4 Brion Vibber 2011-11-04 18:37:09 UTC
This is marked as critical, highest priority, and sounds like a data loss problem and a 1.18 regression.

Should this be assigned to someone by the bugmeister perhaps?
Comment 5 Platonides 2011-11-04 18:43:27 UTC
I have done file history splits on Commons under 1.18 without this bug showing up.
I suppose it wasn't your browser cache playing tricks on you?
Comment 6 Bryan Tong Minh 2011-11-04 19:05:56 UTC
Just tried this, can't reproduce.
Comment 7 Mark A. Hershberger 2011-11-04 19:12:31 UTC
Lowering priority since we haven't been able to reproduce this yet. May remove the 1.18 milestone.
Comment 8 Mark A. Hershberger 2011-11-04 19:39:50 UTC
Old examples from IRC, but may not be something that is happening in
the current code:

<Saibo> hexmode: have three example files (may have different reasons):
<Saibo> https://de.wikipedia.org/wiki/Wikipedia:Redaktion_Bilder/Archiv/2011/2#Alte_Bildversionen_weg.3F http://commons.wikimedia.org/wiki/File:AlleeR%C3%BCgen1.jpg the two old file versions are not available
<Saibo> https://commons.wikimedia.org/wiki/Commons:Forum/Archiv/2011/March#Dateiversionsschwund
<Saibo> → https://commons.wikimedia.org/wiki/File:Wappen_des_Landkreises_Donau-Ries.png same. Someone from wikimedia-tech channel looked to find the old versions - but wasn't successful (in April 2011).
<Saibo> → (a deleted file) https://de.wikipedia.org/wiki/Spezial:Wiederherstellen/Datei:Fliegenk%C3%A4fer01.JPG old, original file version is gone - server delivers the newer (smaller) file version instead
Comment 9 Mark A. Hershberger 2011-11-04 19:40:50 UTC
Also note that Saibo's examples are all pre-1.18
Comment 10 Platonides 2011-11-06 23:35:52 UTC
http://commons.wikimedia.org/wiki/File:AlleeR%C3%BCgen1.jpg and
https://commons.wikimedia.org/wiki/File:Wappen_des_Landkreises_Donau-Ries.png have lost revisions, but they were never deleted.

For Fliegenkäfer01.JPG, you mean that the 20070806 version is not available and you get the new one instead? They have different storage keys, so that shouldn't happen.
Similarly for deleted University.JPG and Sindh_Agriculture_University.JPG, the keys are different.

On the other hand, Kolkata_Tipu_Sultan's_Mosque3.jpg indeed has two versions with the same storage key, so one of them is lost.
I think this can be a consequence of the "image getting wrong hash" bug I reported in bug 17057#c3 (the wrong data would only cause data loss once the file is deleted).
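
The failure mode described here can be illustrated with a small sketch. It assumes, purely for illustration, that a deleted file is stored under a key derived from the sha1 recorded in the database rather than from the file's actual bytes; `DeletedStore` and `delete_version` are hypothetical names, not MediaWiki code.

```python
import hashlib


class DeletedStore:
    """Toy content-addressed store: exactly one blob per storage key."""

    def __init__(self):
        self.blobs = {}

    def put(self, key, data):
        # A later write to an existing key silently replaces the earlier blob.
        self.blobs[key] = data


def sha1_hex(data):
    return hashlib.sha1(data).hexdigest()


def delete_version(store, data, recorded_sha1):
    """Move one file version into the store under its *recorded* hash."""
    key = recorded_sha1 + ".jpg"
    store.put(key, data)
    return key


version_a = b"older upload"
version_b = b"newer upload"

store = DeletedStore()
# Assumption: version A's database row wrongly carries version B's hash.
key_a = delete_version(store, version_a, sha1_hex(version_b))
key_b = delete_version(store, version_b, sha1_hex(version_b))

# Both versions landed on one key, and only the newer bytes survive.
lost_version_a = key_a == key_b and version_a not in store.blobs.values()
```

Under these assumptions nothing goes wrong while the file is live; the older version is only lost at the moment both versions are moved into the deleted store.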
Comment 11 magog.the.ogre 2011-11-07 06:29:43 UTC
It doesn't happen every time; most times it doesn't happen in fact. But you will note from my example above: the SHA1 information is off, as is the pixelage argument, so you can tell I'm not just being loopy. This is in fact a bug, unless I'm badly mistaken.
Comment 12 Mark A. Hershberger 2011-11-07 14:35:55 UTC
(In reply to comment #11)
> This is in fact a bug, unless I'm badly mistaken.

It is a bug.  If it is currently happening, I need examples and, preferably, a way to reproduce this.  Please contact me on IRC (I'm hexmode in #wikimedia-dev, http://webchat.freenode.net/?channels=wikimedia-dev) so that we can track this down.
Comment 13 Rob Lanphier 2011-11-14 22:34:39 UTC
Aaron is going to take a shot at reproing this one.
Comment 14 Platonides 2011-11-14 23:32:24 UTC
I recommend adding a check at deletion time, when it is storing the files under the hash. If a file already exists there, verify that the file sizes match. If they don't, refuse the deletion.
That should block most data loss, and is easy to test (manually corrupt the sha1 entry in the db).
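
A minimal sketch of the check proposed in this comment, assuming deleted files are plain files in a directory named by their storage key. `store_deleted`, `StorageKeyCollision` and the directory layout are hypothetical and do not reflect MediaWiki's FileRepo code.

```python
import os
import shutil
import tempfile


class StorageKeyCollision(Exception):
    """The storage key is already taken by a file of a different size."""


def store_deleted(src_path, deleted_dir, storage_key):
    """Copy src_path into deleted_dir under storage_key.

    Raises instead of overwriting when the key already exists with a
    different file size, so the earlier file is never replaced.
    """
    dest_path = os.path.join(deleted_dir, storage_key)
    if os.path.exists(dest_path):
        if os.path.getsize(dest_path) != os.path.getsize(src_path):
            raise StorageKeyCollision(storage_key)
        return dest_path  # same key, same size: assume it is already stored
    shutil.copyfile(src_path, dest_path)
    return dest_path


def demo():
    """Simulate a corrupted sha1 entry: two different files, one key."""
    with tempfile.TemporaryDirectory() as tmp:
        deleted_dir = os.path.join(tmp, "deleted")
        os.mkdir(deleted_dir)
        path_a = os.path.join(tmp, "a.jpg")
        path_b = os.path.join(tmp, "b.jpg")
        with open(path_a, "wb") as f:
            f.write(b"A" * 7396)
        with open(path_b, "wb") as f:
            f.write(b"B" * 74417)
        store_deleted(path_a, deleted_dir, "samekey.jpg")
        try:
            store_deleted(path_b, deleted_dir, "samekey.jpg")
            refused = False
        except StorageKeyCollision:
            refused = True
        kept_size = os.path.getsize(os.path.join(deleted_dir, "samekey.jpg"))
        return refused, kept_size
```

A size comparison is cheap but cannot tell apart two different files that happen to have the same size; comparing the actual bytes would close that gap at higher cost.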
Comment 15 magog.the.ogre 2011-11-15 18:26:47 UTC
@Mark H.: I can't reproduce it. It's only happened a few times. Sorry.

@Platonides: It should be noted that some hash or EXIF data are *wrong* or *corrupt* on Wikipedia. I believe it happened somewhere in the 2008/9 range; I'm unsure whether the wrong data is linked with the large number of files lost in that time period. Anyway, it's rare, but it does happen, so if a check is done, it should be done beforehand (to make sure the data is clean) and afterward (to make sure it's still clean).
Comment 16 Platonides 2011-11-15 21:59:02 UTC
Yes, we have wrong hashes (see bug 17057). If there are two different images with the same hash, one of them or both is broken, so we should at least abort and force manual intervention.
Comment 17 Rob Lanphier (RobLa) 2011-11-16 06:36:41 UTC
Aaron spent the better part of the day trying to repro this and look through the code, but isn't very close to solving this one.  He's going to keep at it, but we're not going to let this block 1.18.
Comment 18 Rob Lanphier 2011-12-03 00:06:04 UTC
Possibly related to bug 17057
Comment 19 magog.the.ogre 2011-12-09 09:02:21 UTC
Gah. It happened again! http://en.wikipedia.org/wiki/File:Me1.jpg.
Comment 20 Rob Lanphier 2011-12-15 19:32:53 UTC
magog.the.ogre, can you describe what happened with Me1.jpg?  There's nothing obviously wrong just looking at the information we see.
Comment 21 Rob Lanphier 2011-12-19 22:24:11 UTC
Dropping priority while waiting for response
Comment 22 magog.the.ogre 2011-12-20 21:08:12 UTC
Sorry about that (long response). Again, this was a page with multiple versions in history, so I performed deletion and undeletion in order to bring about a split.

You will notice that the page currently has the image which is now at http://commons.wikimedia.org/wiki/File:Jasrasr_userphoto.jpg. If you look through the deleted history, you will see that version uploaded three times by User:Jasrasr, all at 80x115 7396 bytes.

Now you will see an upload by MrBillTheThrill at exactly the same resolution and size as the most recent in the history. I am 85% sure that MrBill did not mean to upload an old version of that image, and that he didn't. I take my evidence because a) it would make no sense to do so in light of this edit: http://en.wikipedia.org/w/index.php?title=Carrickfergus_Grammar_School&diff=prev&oldid=120379042, b) I would probably remember it like this, and c) the page on English Wikipedia is mysteriously not reporting the duplicate on Commons under "File usage", which it usually does when they have the same hash. I will feel like an idiot if I'm wrong, but I don't think I am.

Hash value reported: 3710e894f0a9a2f0d9dcbfd990aea07656100461 (per http://en.wikipedia.org/w/api.php?action=query&titles=File:Me1.jpg&prop=imageinfo&iiprop=sha1|user|size&iilimit=max)
Correct hash value: 3710e894f0a9a2f0d9dcbfd990aea07656100461 (per http://en.wikipedia.org/w/api.php?action=query&titles=File:Jasrasr_userphoto.jpg&prop=imageinfo&iiprop=sha1|user|size&iilimit=max)
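
A rough sketch of how the API queries above could be used to spot suspicious history entries: build the same `imageinfo` query, then flag any sha1 that is reported with more than one file size. It assumes the `format=json` response layout (`query` → `pages` → `imageinfo`); the helper names are hypothetical, and the sample response is hand-built to resemble the situation in this thread, not captured from the live API.

```python
from collections import defaultdict
from urllib.parse import urlencode


def imageinfo_url(api_base, title):
    """Build the imageinfo query used in this comment, asking for JSON."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "sha1|user|size",
        "iilimit": "max",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def suspicious_hashes(response):
    """Return {sha1: sorted sizes} for hashes reported with several sizes."""
    sizes_by_sha1 = defaultdict(set)
    for page in response["query"]["pages"].values():
        for info in page.get("imageinfo", []):
            sizes_by_sha1[info["sha1"]].add(info["size"])
    return {h: sorted(s) for h, s in sizes_by_sha1.items() if len(s) > 1}


url = imageinfo_url("http://en.wikipedia.org/w/api.php", "File:Me1.jpg")

# Hand-built sample (assumption): two versions share one hash but not one size.
SHA1 = "3710e894f0a9a2f0d9dcbfd990aea07656100461"
sample_response = {
    "query": {
        "pages": {
            "1": {
                "title": "File:Me1.jpg",
                "imageinfo": [
                    {"user": "Jasrasr", "size": 7396, "sha1": SHA1},
                    {"user": "MrBillTheThrill", "size": 74417, "sha1": SHA1},
                ],
            }
        }
    }
}
flagged = suspicious_hashes(sample_response)
```

Identical bytes always have identical sizes, so one hash paired with two sizes means at least one recorded hash is wrong.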

I imagine this problem would disappear if someone were to purge the page at English Wikipedia. I am not going to do that though because I don't want to bug up the results for everyone else to see.
Comment 23 Platonides 2011-12-20 23:44:47 UTC
I agree. Jasrasr originally uploaded it as Me1.jpg. Then the later upload wrongly got the same hash as the previous version (I don't know why, but have seen it on many files).

The mismatch shows in the reported file size (73 KB) versus the size of the served image (7396 bytes = 7.2 KB).

Google cache provides slightly more data http://webcache.googleusercontent.com/search?q=cache:WEBIjZs8xiQJ:en.wikipedia.org/wiki/File:Me1.jpg

Date/Time	Dimensions	User	Comment
20:51, 1 March 2008	80 × 115 (7 KB)	Jasrasr	(Reverted to version as of 05:28, 5 July 2006)
20:50, 1 March 2008	80 × 115 (7 KB)	Jasrasr	(Reverted to version as of 05:28, 5 July 2006)
01:09, 5 April 2007	600 × 450 (73 KB)	MrBillTheThrill	(Gareth Buchanan of Year 13 Thornfield performs at Pop Act 2005)
11:02, 21 September 2006	640 × 427 (265 KB)	Sajidn	
05:28, 5 July 2006	80 × 115 (7 KB)	Jasrasr	(Me)
21:21, 20 April 2006	114 × 152 (3 KB)	Jdib84	(I took this picture myself for my own personal page.)

We can see that the upload of MrBillTheThrill was 600 × 450 (73 KB)
Comment 24 Platonides 2011-12-21 00:05:34 UTC
Some more data:
hex sha1: 3710e894f0a9a2f0d9dcbfd990aea07656100461
base36 sha1: 6fkaqblfccxi5egkgxcypzthf9d89r5

Old image entry, recovered from enwiki-20111201-image.sql.gz 

('Me1.jpg',7396,80,115,'a:20:{s:4:\"Make\";s:9:\"Panasonic\";s:5:\"Model\";s:13:\"PV-GS50      \";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:4:\"72/1\";s:11:\"YResolution\";s:4:\"72/1\";s:14:\"ResolutionUnit\";i:2;s:8:\"DateTime\";s:19:\"2004:09:03 20:01:51\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:12:\"ExposureTime\";s:4:\"1/60\";s:7:\"FNumber\";s:5:\"18/10\";s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2004:09:03 20:01:51\";s:17:\"DateTimeDigitized\";s:19:\"2004:09:03 20:01:51\";s:22:\"CompressedBitsPerPixel\";s:5:\"34/10\";s:5:\"Flash\";i:0;s:10:\"ColorSpace\";i:1;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}',8,'BITMAP','image','jpeg',
'Reverted to version as of 05:28, 5 July 2006',1702380,'Jasrasr','20080301205128','1f94p5ba6ewoybkhsr81t5otovi7ni7')

The sha1 of 1f94p5ba6ewoybkhsr81t5otovi7ni7 is very interesting; it corresponds to c302a907571f352105b726f4c314a5e937f60bf in hex.
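
The hex and base36 forms quoted in this comment are two spellings of the same number, so either can be converted to the other. The sketch below assumes the database form is base 36, lowercase, zero-padded to 31 characters, and the hex form is padded to 40 digits; the function names are hypothetical.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 0-9 then a-z: 36 symbols


def hex_to_base36(hex_sha1, width=31):
    """Convert a hex digest to zero-padded base 36."""
    n = int(hex_sha1, 16)
    out = ""
    while n:
        n, rem = divmod(n, 36)
        out = DIGITS[rem] + out
    return out.rjust(width, "0")


def base36_to_hex(b36, width=40):
    """Convert a base 36 value back to zero-padded hex."""
    return format(int(b36, 36), "x").rjust(width, "0")


hex_sha1 = "3710e894f0a9a2f0d9dcbfd990aea07656100461"
as_base36 = hex_to_base36(hex_sha1)
round_trip = base36_to_hex(as_base36)
```

Because the hex output is padded to 40 digits, a value whose unpadded hex form has only 39 digits, like the one quoted above, comes back with a leading 0.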

There's an entry for that file in the deleted history of Me1.jpg, so it should be possible to restore it. What does it contain?


Looking at enwiki-20111201-image.sql.gz:
 
('Me1.jpg','20060705052850!Me1.jpg',3009,114,152,8,'I took this picture myself for my own personal page.',1290829,'Jdib84','20060420212137','0','BITMAP','image','jpeg',0,'ffifecytvu4rct5an5rzj56q0bo641e')

('Me1.jpg','20060921110230!Me1.jpg',7396,80,115,8,'Me',1702380,'Jasrasr','20060705052850','a:20:{s:4:\"Make\";s:9:\"Panasonic\";s:5:\"Model\";s:13:\"PV-GS50      \";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:4:\"72/1\";s:11:\"YResolution\";s:4:\"72/1\";s:14:\"ResolutionUnit\";i:2;s:8:\"DateTime\";s:19:\"2004:09:03 20:01:51\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:12:\"ExposureTime\";s:4:\"1/60\";s:7:\"FNumber\";s:5:\"18/10\";s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2004:09:03 20:01:51\";s:17:\"DateTimeDigitized\";s:19:\"2004:09:03 20:01:51\";s:22:\"CompressedBitsPerPixel\";s:5:\"34/10\";s:5:\"Flash\";i:0;s:10:\"ColorSpace\";i:1;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}','BITMAP','image','jpeg',0,'6fkaqblfccxi5egkgxcypzthf9d89r5')

('Me1.jpg','20070405010944!Me1.jpg',271384,640,427,8,'',2160909,'Sajidn','20060921110230','0','BITMAP','image','jpeg',0,'0b04y9ng82yxw5tiszewt3q8aj5r48v')

('Me1.jpg','20080301205046!Me1.jpg',74417,600,450,8,'Gareth Buchanan of Year 13 Thornfield performs at Pop Act 2005',2921689,'MrBillTheThrill','20070405010944','a:29:{s:4:\"Make\";s:4:\"SONY\";s:5:\"Model\";s:9:\"MVC-CD500\";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:12:\"720000/10000\";s:11:\"YResolution\";s:12:\"720000/10000\";s:14:\"ResolutionUnit\";i:2;s:8:\"Software\";s:27:\"Adobe Photoshop CS2 Windows\";s:8:\"DateTime\";s:19:\"2006:02:10 21:34:45\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureTime\";s:6:\"10/500\";s:7:\"FNumber\";s:5:\"25/10\";s:15:\"ExposureProgram\";i:2;s:15:\"ISOSpeedRatings\";i:100;s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2005:12:20 10:34:52\";s:17:\"DateTimeDigitized\";s:19:\"2005:12:20 10:34:52\";s:22:\"CompressedBitsPerPixel\";s:3:\"4/1\";s:17:\"ExposureBiasValue\";s:4:\"0/10\";s:16:\"MaxApertureValue\";s:5:\"33/16\";s:12:\"MeteringMode\";i:5;s:11:\"LightSource\";i:0;s:5:\"Flash\";i:13;s:11:\"FocalLength\";s:6:\"158/10\";s:10:\"ColorSpace\";i:1;s:14:\"CustomRendered\";i:0;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}','BITMAP','image','jpeg',0,''),

('Me1.jpg','20080301205128!Me1.jpg',7396,80,115,8,'Reverted to version as of 05:28, 5 July 2006',1702380,'Jasrasr','20080301205046','a:20:{s:4:\"Make\";s:9:\"Panasonic\";s:5:\"Model\";s:13:\"PV-GS50      \";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:4:\"72/1\";s:11:\"YResolution\";s:4:\"72/1\";s:14:\"ResolutionUnit\";i:2;s:8:\"DateTime\";s:19:\"2004:09:03 20:01:51\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:12:\"ExposureTime\";s:4:\"1/60\";s:7:\"FNumber\";s:5:\"18/10\";s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2004:09:03 20:01:51\";s:17:\"DateTimeDigitized\";s:19:\"2004:09:03 20:01:51\";s:22:\"CompressedBitsPerPixel\";s:5:\"34/10\";s:5:\"Flash\";i:0;s:10:\"ColorSpace\";i:1;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}','BITMAP','image','jpeg',0,'6fkaqblfccxi5egkgxcypzthf9d89r5'),

The value in the db for the sha1 of MrBillTheThrill image was ''. I wonder if a purge would load it with the sha1 of the *current* image.
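
The rows quoted above can be scanned for the condition noted here, an empty sha1. The sketch copies only the fields needed from the quoted dump rows; the `OldImageRow` field names are illustrative labels of my own, not the table's column names.

```python
from collections import namedtuple

# Field names are hypothetical labels for the values quoted in the dump rows.
OldImageRow = namedtuple("OldImageRow", "archive_name size width height user sha1")

ROWS = [
    OldImageRow("20060705052850!Me1.jpg", 3009, 114, 152, "Jdib84",
                "ffifecytvu4rct5an5rzj56q0bo641e"),
    OldImageRow("20060921110230!Me1.jpg", 7396, 80, 115, "Jasrasr",
                "6fkaqblfccxi5egkgxcypzthf9d89r5"),
    OldImageRow("20070405010944!Me1.jpg", 271384, 640, 427, "Sajidn",
                "0b04y9ng82yxw5tiszewt3q8aj5r48v"),
    OldImageRow("20080301205046!Me1.jpg", 74417, 600, 450, "MrBillTheThrill",
                ""),
    OldImageRow("20080301205128!Me1.jpg", 7396, 80, 115, "Jasrasr",
                "6fkaqblfccxi5egkgxcypzthf9d89r5"),
]


def rows_missing_sha1(rows):
    """Return the rows whose recorded sha1 is empty."""
    return [row for row in rows if not row.sha1]


missing = rows_missing_sha1(ROWS)
```

Rows found this way are candidates for having their hash recomputed from the stored file itself rather than copied from another version.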
Comment 25 Aaron Schulz 2012-01-15 23:59:50 UTC
The data loss was inadvertently fixed in r108886. The deletion will simply fail when two different files wrongly have the same SHA-1 in the DB. This basically does what comment #16 mentioned.

On a related note, I also noticed that LocalFile::lock() doesn't actually lock anything (no FOR UPDATE)...
Comment 26 magog.the.ogre 2012-04-24 23:30:08 UTC
Seems to have happened again with http://en.wikipedia.org/wiki/Special:Undelete/File:MaastrichtStreet.JPG
Comment 27 magog.the.ogre 2012-04-24 23:33:27 UTC
Interestingly, this time it isn't letting me undelete the old version; it appears the error check you guys put in did stop it from doing that BUT it didn't stop the software from actually losing the file itself. :(
Comment 28 Aaron Schulz 2012-04-25 00:58:59 UTC
(In reply to comment #26)
> Seems to have happened again with
> http://en.wikipedia.org/wiki/Special:Undelete/File:MaastrichtStreet.JPG

23:42, 15 September 2006 . . GK tramrunner (talk | contribs | block) 1,024 × 768 (265,004 bytes) (One of the streets in Maastricht)

That is the only file that should be different. Are you sure that it wasn't broken before it was deleted?
Comment 29 magog.the.ogre 2012-04-25 01:16:15 UTC
No, I'm not sure; my fault for reopening.
Comment 30 Aaron Schulz 2012-04-25 06:59:32 UTC
(In reply to comment #29)
> No, I'm not sure; my fault for reopening.

I noticed that they had different storage keys, meaning that FileRepo mapped the old file versions to different deleted file names.

This bug is about cases where two different files get mapped to the same deleted file name, which previously caused data loss, since only one of them "won" and the other was just erased.
