Last modified: 2012-05-03 02:42:40 UTC
Since the most recent MW version upgrade, I've had MediaWiki lose two different old versions of a file during deletion. Let me explain:
* User uploads version A to Myfile.jpg.
* The same user or a different user uploads version B to Myfile.jpg, overwriting the old version.
* I delete Myfile.jpg.
* I go to undelete version A, but when I undelete the file, version B pops up.

Looking in the file history, version B is now version A, even though the resolution information (and IIRC the sha1 information) are still different. Thus version A is forever gone.

You can see this occur at two files:
* http://en.wikipedia.org/wiki/Special:Undelete/File:University.JPG: in this case, only one deleted version appears in the history because I was trying to perform a history split. See http://en.wikipedia.org/w/index.php?title=Special:Log&page=File%3AUniversity.JPG. However, you'll notice the deleted revision is inexplicably identical to http://en.wikipedia.org/wiki/File:Sindh_Agriculture_University.JPG.
* http://en.wikipedia.org/wiki/Special:Undelete/File:Kolkata_Tipu_Sultan%27s_Mosque3.jpg: you'll notice that the software lists two different files at two different resolutions. However, if you actually click both of the old versions, you'll get the same file (445x387: the newer one uploaded).

Do I need to file a bug for this? Or is there already a bug filed?
(In reply to comment #0)
> Do I need to file a bug for this? Or is there already a bug filed?

Well, you filed one. ;) I don't think this has been reported yet.
Oops! I copy/pasted my post from the English Wikipedia village pump, and forgot to take out that part. :)
Marked as critical due to potential 'loss' of content.
This is marked as critical, highest priority, and sounds like a data loss problem and a 1.18 regression. Should this be assigned to someone by the bugmeister perhaps?
I have done file history splits on Commons under 1.18 without any such bug popping up. I suppose it wasn't your browser cache playing tricks on you?
Just tried this, can't reproduce.
Lowering priority since we haven't been able to reproduce this yet. May remove the 1.18 milestone.
Old examples from IRC, but may not be something that is happening in the current code:
<Saibo> hexmode: have three example files (may have different reasons):
<Saibo> https://de.wikipedia.org/wiki/Wikipedia:Redaktion_Bilder/Archiv/2011/2#Alte_Bildversionen_weg.3F → http://commons.wikimedia.org/wiki/File:AlleeR%C3%BCgen1.jpg the two old file versions are not available
<Saibo> https://commons.wikimedia.org/wiki/Commons:Forum/Archiv/2011/March#Dateiversionsschwund
<Saibo> → https://commons.wikimedia.org/wiki/File:Wappen_des_Landkreises_Donau-Ries.png same. Someone from the wikimedia-tech channel tried to find the old versions, but wasn't successful (in April 2011).
<Saibo> → (a deleted file) https://de.wikipedia.org/wiki/Spezial:Wiederherstellen/Datei:Fliegenk%C3%A4fer01.JPG old, original file version is no longer available - server delivers the newer (smaller) file version instead
Also note that Saibo's examples are all pre-1.18
http://commons.wikimedia.org/wiki/File:AlleeR%C3%BCgen1.jpg and https://commons.wikimedia.org/wiki/File:Wappen_des_Landkreises_Donau-Ries.png have lost revisions, but they were never deleted. For Fliegenkäfer01.JPG, you mean that the 20070806 version is not available and you get the new one instead? They have different storage keys, so that shouldn't happen. Similarly for deleted University.JPG and Sindh_Agriculture_University.JPG, the keys are different. On the other hand, Kolkata_Tipu_Sultan's_Mosque3.jpg indeed has two versions with the same storage key, so one of them is lost. I think this can be a consequence of the "image getting wrong hash" bug I reported in bug 17057#c3 (this wrong data would only produce data loss once the file is deleted).
It doesn't happen every time; in fact, most times it doesn't happen. But you will note from my example above: the SHA1 information is off, as are the pixel dimensions, so you can tell I'm not just being loopy. This is in fact a bug, unless I'm badly mistaken.
(In reply to comment #11)
> This is in fact a bug, unless I'm badly mistaken.

It is a bug. If it is currently happening, I need examples and, preferably, a way to reproduce this. Please contact me on IRC (I'm hexmode in #wikimedia-dev, http://webchat.freenode.net/?channels=wikimedia-dev) so that we can track this down.
Aaron is going to take a shot at reproing this one.
I recommend adding a check at deletion time, when it is storing the files under their hash. If a file with that hash already exists, verify that the file sizes match. If they don't, refuse the deletion. That should block most data loss, and is easy to test (manually corrupt the sha1 entry in the db).
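The check suggested above could look roughly like the following. This is only a sketch in Python, not MediaWiki code: `archive_file`, `HashCollisionError`, and the idea of a flat archive directory keyed by the recorded hash are all assumptions made for illustration.

```python
import os
import shutil


class HashCollisionError(Exception):
    """The archive slot for this hash already holds a file of another size."""


def archive_file(src_path, archive_dir, recorded_sha1):
    """Copy src_path into archive_dir under the hash recorded in the database.

    Hypothetical helper: if a file is already stored under that hash and its
    size differs, the recorded hash must be wrong for at least one of the two
    files, so refuse instead of overwriting.
    """
    dest = os.path.join(archive_dir, recorded_sha1)
    if os.path.exists(dest):
        if os.path.getsize(dest) != os.path.getsize(src_path):
            raise HashCollisionError(
                f"{dest} already exists with a different size; refusing to overwrite"
            )
        return dest  # same key and same size: treat as already archived
    shutil.copyfile(src_path, dest)
    return dest
```

A size comparison is cheap but not conclusive: two different files of equal size would still pass, so a stricter variant would recompute the hash of both files before deciding.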
@Mark H.: I can't reproduce it. It's only happened a few times. Sorry. @Platonides: It should be noted that some hashes or EXIF data are *wrong* or *corrupt* on Wikipedia. I believe it happened somewhere in the 2008/9 range; I'm unsure whether the wrong data is linked with the large number of files lost in that time period. Anyway, it's rare, but it does happen, so if a check is done, it should be done beforehand (to make sure the data is clean), and afterward (to make sure it's still clean).
Yes, we have wrong hashes (see bug 17057). If there are two different images with the same hash, one of them or both is broken, so we should at least abort and force manual intervention.
Aaron spent the better part of the day trying to repro this and look through the code, but isn't very close to solving this one. He's going to keep at it, but we're not going to let this block 1.18.
Possibly related to bug 17057
Gah. It happened again! http://en.wikipedia.org/wiki/File:Me1.jpg.
magog.the.ogre, can you describe what happened with Me1.jpg? There's nothing obviously wrong just looking at the information we see.
Dropping priority while waiting for response
Sorry about that (long response). Again, this was a page with multiple versions in its history, so I performed a deletion and undeletion in order to bring about a split. You will notice that the page currently has the image which is now at http://commons.wikimedia.org/wiki/File:Jasrasr_userphoto.jpg. If you look through the deleted history, you will see that version uploaded three times by User:Jasrasr, all at 80x115, 7396 bytes. You will also see an upload by MrBillTheThrill at exactly the same resolution and size as the most recent in the history.

I am 85% sure that MrBill did not mean to upload an old version of that image, and that he didn't. My evidence: a) it would make no sense to do so in light of this edit: http://en.wikipedia.org/w/index.php?title=Carrickfergus_Grammar_School&diff=prev&oldid=120379042, b) I would probably remember it like this, and c) the page on English Wikipedia is mysteriously not reporting the duplicate on Commons under "File usage", which it usually does when they have the same hash. I will feel like an idiot if I'm wrong, but I don't think I am.

Hash value reported: 3710e894f0a9a2f0d9dcbfd990aea07656100461 (per http://en.wikipedia.org/w/api.php?action=query&titles=File:Me1.jpg&prop=imageinfo&iiprop=sha1|user|size&iilimit=max)
Correct hash value: 3710e894f0a9a2f0d9dcbfd990aea07656100461 (per http://en.wikipedia.org/w/api.php?action=query&titles=File:Jasrasr_userphoto.jpg&prop=imageinfo&iiprop=sha1|user|size&iilimit=max)

I imagine this problem would disappear if someone were to purge the page at English Wikipedia. I am not going to do that, though, because I don't want to bug up the results for everyone else to see.
I agree. Jasrasr originally uploaded it as Me1.jpg. Then the later upload wrongly got the same hash as the previous version (I don't know why, but I have seen it on many files). You can see that it's wrong by comparing the reported file size (73 KB) with the size of the served image (7396 bytes = 7.2 KB).

Google cache provides slightly more data: http://webcache.googleusercontent.com/search?q=cache:WEBIjZs8xiQJ:en.wikipedia.org/wiki/File:Me1.jpg

Date/Time   Dimensions   User   Comment
20:51, 1 March 2008   80 × 115 (7 KB)   Jasrasr (talk | contribs)   (Reverted to version as of 05:28, 5 July 2006)
20:50, 1 March 2008   80 × 115 (7 KB)   Jasrasr (talk | contribs)   (Reverted to version as of 05:28, 5 July 2006)
01:09, 5 April 2007   600 × 450 (73 KB)   MrBillTheThrill (talk | contribs)   (Gareth Buchanan of Year 13 Thornfield performs at Pop Act 2005)
11:02, 21 September 2006   640 × 427 (265 KB)   Sajidn (talk | contribs)
05:28, 5 July 2006   80 × 115 (7 KB)   Jasrasr (talk | contribs)   (Me)
21:21, 20 April 2006   114 × 152 (3 KB)   Jdib84 (talk | contribs)   (I took this picture myself for my own personal page.)

We can see that the upload of MrBillTheThrill was 600 × 450 (73 KB).
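The inconsistency described here (one hash recorded against two different file sizes) can be searched for mechanically. The following Python sketch assumes the revisions have already been extracted into a list of dicts with `sha1` and `size` keys, as the `iiprop=sha1|size` query above would provide; `conflicting_hashes` is a hypothetical helper and the sample rows are illustrative, only loosely modelled on the Me1.jpg history.

```python
from collections import defaultdict


def conflicting_hashes(revisions):
    """Return {sha1: sorted sizes} for hashes recorded with more than one size.

    Identical content always has identical size, so a hash that appears with
    two different sizes means at least one recorded hash is wrong.
    """
    sizes_by_sha1 = defaultdict(set)
    for rev in revisions:
        if rev["sha1"]:  # skip revisions whose hash was never recorded
            sizes_by_sha1[rev["sha1"]].add(rev["size"])
    return {
        sha1: sorted(sizes)
        for sha1, sizes in sizes_by_sha1.items()
        if len(sizes) > 1
    }


# Illustrative data only, not a real API response.
sample = [
    {"user": "Jasrasr", "size": 7396,
     "sha1": "3710e894f0a9a2f0d9dcbfd990aea07656100461"},
    {"user": "MrBillTheThrill", "size": 74417,
     "sha1": "3710e894f0a9a2f0d9dcbfd990aea07656100461"},
    {"user": "Sajidn", "size": 271384, "sha1": ""},
]
print(conflicting_hashes(sample))
```

A file flagged this way has not necessarily lost data yet; per the discussion above, the wrong hash only becomes destructive once the file is deleted.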
Some more data:
hex sha1: 3710e894f0a9a2f0d9dcbfd990aea07656100461
base36 sha1: 6fkaqblfccxi5egkgxcypzthf9d89r5

Old image entry, recovered from enwiki-20111201-image.sql.gz:
('Me1.jpg',7396,80,115,'a:20:{s:4:\"Make\";s:9:\"Panasonic\";s:5:\"Model\";s:13:\"PV-GS50 \";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:4:\"72/1\";s:11:\"YResolution\";s:4:\"72/1\";s:14:\"ResolutionUnit\";i:2;s:8:\"DateTime\";s:19:\"2004:09:03 20:01:51\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:12:\"ExposureTime\";s:4:\"1/60\";s:7:\"FNumber\";s:5:\"18/10\";s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2004:09:03 20:01:51\";s:17:\"DateTimeDigitized\";s:19:\"2004:09:03 20:01:51\";s:22:\"CompressedBitsPerPixel\";s:5:\"34/10\";s:5:\"Flash\";i:0;s:10:\"ColorSpace\";i:1;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}',8,'BITMAP','image','jpeg', 'Reverted to version as of 05:28, 5 July 2006',1702380,'Jasrasr','20080301205128','1f94p5ba6ewoybkhsr81t5otovi7ni7')

The sha1 of 1f94p5ba6ewoybkhsr81t5otovi7ni7, which corresponds to c302a907571f352105b726f4c314a5e937f60bf in hex, is very interesting. There's an entry for that file in the deleted history of Me1.jpg, so it should be possible to restore it. What does it contain?
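For anyone wanting to cross-check hex and base-36 hashes like the ones quoted above, the conversion is a plain change of base. This is a minimal Python sketch; the function names are made up, and the widths (31 base-36 digits, 40 hex digits, both zero-padded) are assumptions based on the lengths of the values in this thread.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # "0-9a-z", base-36 alphabet


def hex_to_base36(hex_sha1, width=31):
    """Convert a hex SHA-1 to base 36, left-padded with zeros to `width`."""
    n = int(hex_sha1, 16)
    out = ""
    while n:
        n, remainder = divmod(n, 36)
        out = DIGITS[remainder] + out
    return out.rjust(width, "0")


def base36_to_hex(b36_sha1, width=40):
    """Convert a base-36 SHA-1 to hex, left-padded with zeros to `width`."""
    return format(int(b36_sha1, 36), "x").rjust(width, "0")


print(hex_to_base36("3710e894f0a9a2f0d9dcbfd990aea07656100461"))
print(base36_to_hex("1f94p5ba6ewoybkhsr81t5otovi7ni7"))
```

Without the padding step a converted value can come out shorter than 40 hex digits. The 39-digit hex string quoted above is consistent with a dropped leading zero, though that is an inference rather than something verified here.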
Looking at enwiki-20111201-image.sql.gz:

('Me1.jpg','20060705052850!Me1.jpg',3009,114,152,8,'I took this picture myself for my own personal page.',1290829,'Jdib84','20060420212137','0','BITMAP','image','jpeg',0,'ffifecytvu4rct5an5rzj56q0bo641e')

('Me1.jpg','20060921110230!Me1.jpg',7396,80,115,8,'Me',1702380,'Jasrasr','20060705052850','a:20:{s:4:\"Make\";s:9:\"Panasonic\";s:5:\"Model\";s:13:\"PV-GS50 \";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:4:\"72/1\";s:11:\"YResolution\";s:4:\"72/1\";s:14:\"ResolutionUnit\";i:2;s:8:\"DateTime\";s:19:\"2004:09:03 20:01:51\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:12:\"ExposureTime\";s:4:\"1/60\";s:7:\"FNumber\";s:5:\"18/10\";s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2004:09:03 20:01:51\";s:17:\"DateTimeDigitized\";s:19:\"2004:09:03 20:01:51\";s:22:\"CompressedBitsPerPixel\";s:5:\"34/10\";s:5:\"Flash\";i:0;s:10:\"ColorSpace\";i:1;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}','BITMAP','image','jpeg',0,'6fkaqblfccxi5egkgxcypzthf9d89r5')

('Me1.jpg','20070405010944!Me1.jpg',271384,640,427,8,'',2160909,'Sajidn','20060921110230','0','BITMAP','image','jpeg',0,'0b04y9ng82yxw5tiszewt3q8aj5r48v')

('Me1.jpg','20080301205046!Me1.jpg',74417,600,450,8,'Gareth Buchanan of Year 13 Thornfield performs at Pop Act 2005',2921689,'MrBillTheThrill','20070405010944','a:29:{s:4:\"Make\";s:4:\"SONY\";s:5:\"Model\";s:9:\"MVC-CD500\";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:12:\"720000/10000\";s:11:\"YResolution\";s:12:\"720000/10000\";s:14:\"ResolutionUnit\";i:2;s:8:\"Software\";s:27:\"Adobe Photoshop CS2 Windows\";s:8:\"DateTime\";s:19:\"2006:02:10 21:34:45\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureTime\";s:6:\"10/500\";s:7:\"FNumber\";s:5:\"25/10\";s:15:\"ExposureProgram\";i:2;s:15:\"ISOSpeedRatings\";i:100;s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2005:12:20 10:34:52\";s:17:\"DateTimeDigitized\";s:19:\"2005:12:20 10:34:52\";s:22:\"CompressedBitsPerPixel\";s:3:\"4/1\";s:17:\"ExposureBiasValue\";s:4:\"0/10\";s:16:\"MaxApertureValue\";s:5:\"33/16\";s:12:\"MeteringMode\";i:5;s:11:\"LightSource\";i:0;s:5:\"Flash\";i:13;s:11:\"FocalLength\";s:6:\"158/10\";s:10:\"ColorSpace\";i:1;s:14:\"CustomRendered\";i:0;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}','BITMAP','image','jpeg',0,''),

('Me1.jpg','20080301205128!Me1.jpg',7396,80,115,8,'Reverted to version as of 05:28, 5 July 2006',1702380,'Jasrasr','20080301205046','a:20:{s:4:\"Make\";s:9:\"Panasonic\";s:5:\"Model\";s:13:\"PV-GS50 \";s:11:\"Orientation\";i:1;s:11:\"XResolution\";s:4:\"72/1\";s:11:\"YResolution\";s:4:\"72/1\";s:14:\"ResolutionUnit\";i:2;s:8:\"DateTime\";s:19:\"2004:09:03 20:01:51\";s:16:\"YCbCrPositioning\";i:2;s:12:\"ExposureMode\";i:0;s:12:\"WhiteBalance\";i:0;s:16:\"SceneCaptureType\";i:0;s:12:\"ExposureTime\";s:4:\"1/60\";s:7:\"FNumber\";s:5:\"18/10\";s:11:\"ExifVersion\";s:4:\"0220\";s:16:\"DateTimeOriginal\";s:19:\"2004:09:03 20:01:51\";s:17:\"DateTimeDigitized\";s:19:\"2004:09:03 20:01:51\";s:22:\"CompressedBitsPerPixel\";s:5:\"34/10\";s:5:\"Flash\";i:0;s:10:\"ColorSpace\";i:1;s:22:\"MEDIAWIKI_EXIF_VERSION\";i:1;}','BITMAP','image','jpeg',0,'6fkaqblfccxi5egkgxcypzthf9d89r5'),

The value in the db for the sha1 of MrBillTheThrill image was ''. I wonder if a purge would load it with the sha1 of the *current* image.
The data loss was inadvertently fixed in r108886. The deletion will now simply fail if two different files wrongly have the same SHA-1 in the DB. This basically does what comment #16 suggested. On a related note, I also noticed that LocalFile::lock() doesn't actually lock anything (no FOR UPDATE)...
Seems to have happened again with http://en.wikipedia.org/wiki/Special:Undelete/File:MaastrichtStreet.JPG
Interestingly, this time it isn't letting me undelete the old version; it appears the error check you guys put in did stop it from doing that, BUT it didn't stop the software from actually losing the file itself. :(
(In reply to comment #26)
> Seems to have happened again with
> http://en.wikipedia.org/wiki/Special:Undelete/File:MaastrichtStreet.JPG

23:42, 15 September 2006 . . GK tramrunner (talk | contribs | block) 1,024 × 768 (265,004 bytes) (One of the streets in Maastricht)

That is the only file that should be different. Are you sure that it wasn't broken before it was deleted?
No, I'm not sure; my fault for reopening.
(In reply to comment #29)
> No, I'm not sure; my fault for reopening.

I noticed that they had different storage keys, meaning that FileRepo mapped the old file versions to different deleted file names. This bug is about the case where two different files get mapped to the same deleted file name, which previously caused data loss, since only one of them "won" and the other was just erased.
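The failure mode described here can be illustrated with a toy model. This is a Python sketch, not FileRepo code: the key format (recorded hash plus extension), the dict standing in for the deleted-file store, and the field names are all assumptions chosen for illustration.

```python
def storage_key(recorded_sha1, extension):
    """Hypothetical deleted-file name derived from the hash recorded in the DB."""
    return f"{recorded_sha1}.{extension}"


def archive_all(versions):
    """Archive every version without any collision check (the old behaviour).

    Returns the resulting store and the list of contents that were silently
    overwritten because a later version mapped to the same key.
    """
    store = {}
    lost = []
    for version in versions:
        key = storage_key(version["recorded_sha1"], version["ext"])
        if key in store and store[key] != version["content"]:
            lost.append(store[key])  # the earlier file is about to be erased
        store[key] = version["content"]  # the last writer "wins"
    return store, lost


# Two different files whose database rows wrongly carry the same hash.
versions = [
    {"recorded_sha1": "6fkaqblfccxi5egkgxcypzthf9d89r5", "ext": "jpg",
     "content": b"bytes of the older upload"},
    {"recorded_sha1": "6fkaqblfccxi5egkgxcypzthf9d89r5", "ext": "jpg",
     "content": b"bytes of the newer upload"},
]
store, lost = archive_all(versions)
print(len(store), len(lost))
```

When the recorded hashes differ, as in the MaastrichtStreet.JPG case, each version gets its own key and nothing is overwritten at this step, so any loss there would have to come from somewhere else.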