Last modified: 2014-05-14 21:25:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T19057, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 17057 - Images with wrong SHA1
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Media storage
Version: unspecified
Hardware: All, OS: All
Importance: Low major
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: shell
Duplicates: 17070 49841
Depends on:
Blocks: 16660 23529 34755

Reported: 2009-01-17 20:56 UTC by MZMcBride
Modified: 2014-05-14 21:25 UTC
CC List: 11 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description MZMcBride 2009-01-17 20:56:53 UTC
Using enwiki_p, I'm getting data like this:

mysql> SELECT DISTINCT enwiki_p.page.page_title, commonswiki_p.image.img_name
    -> FROM enwiki_p.image, commonswiki_p.image, enwiki_p.categorylinks, enwiki_p.page
    -> WHERE enwiki_p.image.img_sha1 = commonswiki_p.image.img_sha1
    -> AND enwiki_p.page.page_title = enwiki_p.image.img_name
    -> AND enwiki_p.categorylinks.cl_from = enwiki_p.page.page_id
    -> AND enwiki_p.categorylinks.cl_to = 'All_non-free_media'
    -> LIMIT 50;
+----------------+-----------------------------------------------+
| page_title     | img_name                                      |
+----------------+-----------------------------------------------+
| Imas360_10.jpg | +-_of_Led.svg                                 | 
| Imas360_10.jpg | 5von10.png                                    | 
| Imas360_10.jpg | Alfred_de_Musset.jpg                          | 
| Imas360_10.jpg | Amphipodredkils.jpg                           | 
| Imas360_10.jpg | Amphoe_6502.png                               | 
| Imas360_10.jpg | Aschenbecher_mit_Mechanik1.jpg                | 
| Imas360_10.jpg | Austria_1945-55.png                           | 
| Imas360_10.jpg | Bakaiku.JPG                                   | 
| Imas360_10.jpg | Bakweri_cocoyam_farmer_from_Cameroon.jpg      | 
| Imas360_10.jpg | Bartolomeu_Dias_Voyage.PNG                    | 
| Imas360_10.jpg | Benjamin_West.jpg                             | 
| Imas360_10.jpg | Blason-fr-en-Saint-Moreil.svg                 | 
| Imas360_10.jpg | Brno-Nový_Lískovec_from_Petrov_(Brno).JPG     | 
| Imas360_10.jpg | Brännkyrka_kyrka_2005-09-04nr1.jpg            | 
| Imas360_10.jpg | Bundesautobahn_113_number.svg                 | 
| Imas360_10.jpg | Clock_UT+7.png                                | 
| Imas360_10.jpg | Coat_of_Arms_of_Antigua_and_Barbuda.gif       | 
| Imas360_10.jpg | Codex_egberti_-_egbert.jpg                    | 
| Imas360_10.jpg | Cold_fingers.png                              | 
| Imas360_10.jpg | Cross.png                                     | 
| Imas360_10.jpg | Cutty_sark_October_2003.jpg                   | 
| Imas360_10.jpg | DNAn+1_C.svg                                  | 
| Imas360_10.jpg | DNAn+1_T.svg                                  | 
| Imas360_10.jpg | Dabrowskirynek.jpg                            | 
| Imas360_10.jpg | Dalmenyhouse_lighter.jpg                      | 
| Imas360_10.jpg | EtaCarinae.jpg                                | 
| Imas360_10.jpg | Europe_location_ARM.png                       | 
| Imas360_10.jpg | Five-pointed_star.svg                         | 
| Imas360_10.jpg | Flag_of_Kentucky.svg                          | 
| Imas360_10.jpg | Font_Wallace_Pt_Pasteur.jpg                   | 
| Imas360_10.jpg | GeorgeWBush.jpg                               | 
| Imas360_10.jpg | Gorillas_2609.jpg                             | 
| Imas360_10.jpg | Hallingkast.jpg                               | 
| Imas360_10.jpg | Harlekin_Columbine_Tivoli_Denmark.jpg         | 
| Imas360_10.jpg | Helicopter_rescue_sancy_takeoff.jpg           | 
| Imas360_10.jpg | Herb_Korybut.jpg                              | 
| Imas360_10.jpg | Hymenoptera_diagonal.jpg                      | 
| Imas360_10.jpg | IsleofWightmap_1945.jpg                       | 
| Imas360_10.jpg | Jarzabczy_Wierch_a2.jpg                       | 
| Imas360_10.jpg | Karte_Lage_Kanton_Uri.png                     | 
| Imas360_10.jpg | Kit_body_scga06.png                           | 
| Imas360_10.jpg | Kościół_Wniebowstąpienia_Poznań003.jpg        | 
| Imas360_10.jpg | Lilium_bulbiferum_mg-k.jpg                    | 
| Imas360_10.jpg | Macaronesia.jpg                               | 
| Imas360_10.jpg | Maisonmaton.jpg                               | 
| Imas360_10.jpg | Map_of_Scotland_within_the_United_Kingdom.png | 
| Imas360_10.jpg | Market_Square_Shopping_Centre_Geelong.jpg     | 
| Imas360_10.jpg | Mg-TableImage.svg                             | 
| Imas360_10.jpg | Michael_Boogerd.jpg                           | 
| Imas360_10.jpg | Monarch_caterpillar_and_egg.jpg               | 
+----------------+-----------------------------------------------+
50 rows in set (0.07 sec)

According to the query, all of these Commons files have the same hash as Imas360_10.jpg, even though they are obviously different images, so something is very broken.

I've been told that null-editing the pages can fix the hash, though this is difficult to test because of replication lag (replag).
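The symptom above can be reproduced on a toy scale. This is a minimal sketch using Python's built-in sqlite3: the tables `en_image` and `commons_image` are hypothetical stand-ins for `enwiki_p.image` and `commonswiki_p.image`, and all rows are made-up fixtures (hex digests are used for readability). It assumes, as comment 3 below suggests by listing many of these Commons files under "Hash of empty file", that the shared value is the hash of an empty file. Joining on `img_sha1` then pairs one local file with every Commons file carrying the same bad hash; excluding that known-bad value removes the false matches.

```python
import hashlib
import sqlite3

# SHA-1 of zero bytes; assumed here to be the value shared by the bad rows.
EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def sha1_of(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def build_demo_db() -> sqlite3.Connection:
    """Create two toy image tables filled with hypothetical fixture rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE en_image (img_name TEXT, img_sha1 TEXT)")
    conn.execute("CREATE TABLE commons_image (img_name TEXT, img_sha1 TEXT)")
    # A local file whose stored hash is wrong (the empty-file hash).
    conn.execute("INSERT INTO en_image VALUES (?, ?)", ("Imas360_10.jpg", EMPTY_SHA1))
    # A local file that genuinely duplicates a Commons file (hypothetical).
    conn.execute("INSERT INTO en_image VALUES (?, ?)", ("Real_dupe.png", sha1_of(b"same bytes")))
    conn.executemany(
        "INSERT INTO commons_image VALUES (?, ?)",
        [
            ("Cross.png", EMPTY_SHA1),       # wrong stored hash
            ("EtaCarinae.jpg", EMPTY_SHA1),  # wrong stored hash
            ("Real_dupe_on_commons.png", sha1_of(b"same bytes")),
            ("Unrelated.jpg", sha1_of(b"other bytes")),
        ],
    )
    return conn


def matches(conn: sqlite3.Connection, exclude_empty: bool) -> list:
    """Return (local, commons) name pairs whose stored hashes are equal."""
    sql = (
        "SELECT e.img_name, c.img_name FROM en_image e "
        "JOIN commons_image c ON e.img_sha1 = c.img_sha1"
    )
    params = ()
    if exclude_empty:
        # Drop pairs that only match because both sides carry the empty-file hash.
        sql += " WHERE e.img_sha1 <> ?"
        params = (EMPTY_SHA1,)
    sql += " ORDER BY e.img_name, c.img_name"
    return conn.execute(sql, params).fetchall()


conn = build_demo_db()
print(matches(conn, exclude_empty=False))
print(matches(conn, exclude_empty=True))
```

Filtering out one known-bad value only hides this particular symptom; rows stuck on the hash of an older, non-empty version would still produce false matches.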
Comment 1 Roan Kattouw 2009-01-17 21:10:39 UTC
Re-uploading one of these files ([[File:5von10.png]]) over itself fixes the hash; a null edit or a purge does not. So the hashing was broken at some point but works now, and the bad values can be fixed by 'just' recalculating them (is there a maintenance script for that?).
Comment 2 Brion Vibber 2009-01-31 01:34:43 UTC
Adding to database cleanup tracking bug 16660.
Comment 3 Platonides 2009-02-04 23:53:56 UTC
I have been studying the hashes of Commons images.
The errors fall very clearly into a few categories.
There are images where the hash of an older version got 'stuck'. Was the image SHA-1 not updated on re-upload?
This is much more common with the hash of the empty string. It's expected that a broken (empty) version got the empty-string hash, but that hash then stays on the current version, even through several later uploads. That happens less often with normal images.
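The hashes quoted in this comment are hexadecimal SHA-1 digests. Assuming, as I understand MediaWiki's schema, that the `img_sha1` column stores the same 160-bit value written in base 36 and zero-padded to 31 characters, the sketch below converts between the two forms so the empty-string hash can be recognised in either. `hex_to_base36` and `base36_to_hex` are illustrative helpers, not MediaWiki functions.

```python
import hashlib

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def hex_to_base36(hex_digest: str, pad: int = 31) -> str:
    """Re-encode a hex SHA-1 digest in base 36, left-padded with zeros."""
    n = int(hex_digest, 16)
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    # 36**31 > 2**160, so 31 base-36 digits always suffice for SHA-1.
    return "".join(reversed(out)).rjust(pad, "0")


def base36_to_hex(b36: str) -> str:
    """Inverse conversion; SHA-1 is 160 bits, i.e. 40 hex digits."""
    return format(int(b36, 36), "040x")


empty_hex = hashlib.sha1(b"").hexdigest()
empty_b36 = hex_to_base36(empty_hex)
print(empty_hex)
print(empty_b36)
```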

There is, however, a worse case, where the metadata is right but the listed old version is not there: a file exists in the history, but its contents are the same as the current version (or another newer version). *The old version was silently lost*.
So not only should these versions be searched for in backups, but we must also make sure that whatever bug produced this is fixed.
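The categories described above can be expressed as a small decision procedure. This is a hypothetical sketch: `classify_version` and its inputs (the stored hash from the database, the bytes actually on disk, the stored hashes of older versions, and the real hashes of the file's other versions) are illustrative names, not MediaWiki code, and the order of the checks is a judgment call.

```python
import hashlib
from typing import Iterable

EMPTY_SHA1 = hashlib.sha1(b"").hexdigest()


def classify_version(
    stored_sha1: str,
    data: bytes,
    older_stored_sha1s: Iterable[str],
    other_actual_sha1s: Iterable[str],
) -> str:
    """Sort one file version into the categories listed in this comment."""
    actual = hashlib.sha1(data).hexdigest()
    if actual == stored_sha1:
        return "ok"
    if stored_sha1 == EMPTY_SHA1:
        # Stored hash is that of an empty file, but the bytes are not empty.
        return "empty-file hash"
    if stored_sha1 in set(older_stored_sha1s):
        # Hash of a previous upload was never replaced.
        return "stuck hash of an older version"
    if actual in set(other_actual_sha1s):
        # Metadata may be right, but the bytes duplicate another version:
        # the original contents of this version were lost.
        return "bytes copied from another version"
    return "uncategorized"
```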

Images using the hash of an older version:
http://commons.wikimedia.org/wiki/File:Agrigento-Domestic-Quarter-flickr.jpg	Uses the hash of an older version (77d4c7822a2f1e971d8cc7cf9b4b56a97cec9649), not its own (ea8dff9bd8c3daeb33f8c40a4e8dfe2acf6db177)
http://commons.wikimedia.org/wiki/File:Agrigento-Temple-of-Concord-flickr-1.jpg	ec985ed8bdf283f11e6b861d7fe0720d29142798	35d96c5d2b05f2ce81cd752ab862bf63dcd28964
http://commons.wikimedia.org/wiki/File:CD_F%C3%83%C2%A1tima.svg
http://commons.wikimedia.org/wiki/File:Cabo_Vil%C3%83%C2%A1n._Camari%C3%83%C2%B1as._Galiza.jpg
http://commons.wikimedia.org/wiki/File:Horchata_de_chufa.jpg
http://commons.wikimedia.org/wiki/File:Karina_Bacchi2.jpg
http://commons.wikimedia.org/wiki/File:Lid_Susa_Louvre_MAOS499.jpg
http://commons.wikimedia.org/wiki/File:NZ_Red_Admiral_%28Vanessa_gonerilla%29-4.jpg
http://commons.wikimedia.org/wiki/File:Vinbergs_kyrka.jpg

Hash of empty file:
Seems related to the "hash of older version" case: all of these have an older version missing.
http://upload.wikimedia.org/wikipedia/commons/archive/5/5d/20090117210820%215von10.png 
http://commons.wikimedia.org/wiki/File:Alfred_de_Musset.jpg 	9e3753864bef9c18f8d75194136bfa71c440a2cd
http://commons.wikimedia.org/wiki/File:Blason-fr-en-Saint-Moreil.svg
http://commons.wikimedia.org/wiki/File:Clock_UT%2b7.png	
http://commons.wikimedia.org/wiki/File:Codex_egberti_-_egbert.jpg
http://commons.wikimedia.org/wiki/File:Cross.png
http://commons.wikimedia.org/wiki/File:Cutty_sark_October_2003.jpg
http://commons.wikimedia.org/wiki/File:DNAn+1_C.svg
http://commons.wikimedia.org/wiki/File:DNAn+1_T.svg
http://commons.wikimedia.org/wiki/File:Dalmenyhouse_lighter.jpg
http://commons.wikimedia.org/wiki/File:EtaCarinae.jpg	[but there's an intermediate version with right hash!]
http://commons.wikimedia.org/wiki/File:GeorgeWBush.jpg	[intermediate existing versions]
http://commons.wikimedia.org/wiki/File:Hallingkast.jpg
http://commons.wikimedia.org/wiki/File:Harlekin_Columbine_Tivoli_Denmark.jpg
http://commons.wikimedia.org/wiki/File:Herb_Korybut.jpg
http://commons.wikimedia.org/wiki/File:IsleofWightmap_1945.jpg
http://commons.wikimedia.org/wiki/File:Jarzabczy_Wierch_a2.jpg
http://commons.wikimedia.org/wiki/File:Kit_body_scga06.png
http://commons.wikimedia.org/wiki/File:Lilium_bulbiferum_mg-k.jpg
http://commons.wikimedia.org/wiki/File:Maisonmaton.jpg
http://commons.wikimedia.org/wiki/File:Macaronesia.jpg
http://commons.wikimedia.org/wiki/File:Monarch_caterpillar_and_egg.jpg
http://commons.wikimedia.org/wiki/File:Montelbaanstoren_01.jpg
http://commons.wikimedia.org/wiki/File:Multi-colored_Wild_Lantana_Camara_3.JPG
http://commons.wikimedia.org/wiki/File:Ponte_de_Amizade_of_Macau.JPG
http://commons.wikimedia.org/wiki/File:Rhoen_montaner_Laubwald_mg-k.jpg
http://commons.wikimedia.org/wiki/File:Rosa_omeiensis_f._pteracantha_-_Bagatelle05.jpg
http://commons.wikimedia.org/wiki/File:STS_114_day_before_launch.jpg
http://commons.wikimedia.org/wiki/File:Space_Shuttle_Enterprise_747_takeoff.ogg
http://commons.wikimedia.org/wiki/File:Starr_Miconia_calvescens0.jpg
http://commons.wikimedia.org/wiki/File:TexasFM1950.png
http://commons.wikimedia.org/wiki/File:Voiceless_bilabial_plosive.ogg
http://commons.wikimedia.org/wiki/File:Volvo480doppel.jpg
http://commons.wikimedia.org/wiki/File:Wikinews_Brief_June_13,_2005_0500_UTC.ogg
http://commons.wikimedia.org/wiki/File:William_Phips_03.jpg
http://commons.wikimedia.org/wiki/File:Wind-power-small-scale.jpg

Images where the file storing the old version in fact contains a copy of the current one:
http://upload.wikimedia.org/wikipedia/commons/archive/2/2e/20080920140529!Auguste_victoria_axb02.jpg	File exists, but the metadata shows that the stored file is wrong (it's a copy of the smaller, current version)
http://upload.wikimedia.org/wikipedia/commons/archive/2/23/20080924123236!Banner_Porta_Westfalica.svg
http://upload.wikimedia.org/wikipedia/commons/archive/3/32/20081101222204!Beseda01.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/5/59/20080111081832!Brakteat01.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/8/83/20080731112730!Chirality.svg
http://upload.wikimedia.org/wikipedia/commons/archive/2/28/20071127075041!Cunningham%27s_skink444.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/6/63/20080416202226!DowntownBoston.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/7/79/20071126063925%21Flag_of_Italy_test.svg
http://upload.wikimedia.org/wikipedia/commons/archive/6/69/20081112225354!Florent_Gheeraert.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/a/a5/20071010162601!Grabstein_Ey%C3%BCp_Bild-Giovanni_Dall%27Orto.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/2/28/20080831090747!Hydro.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/6/6c/20081111172337!Infirmiere_Nightingale.PNG
http://upload.wikimedia.org/wikipedia/commons/archive/3/31/20080920121617!Kecske-templom_01.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/b/b1/20071124121519%21Northern_Ireland_election_seats_1997-2005-by.svg
http://upload.wikimedia.org/wikipedia/commons/archive/3/3f/20080916053014!Nuvola_Palestinian_flag.svg
http://upload.wikimedia.org/wikipedia/commons/archive/1/1f/20081111180135!PanoramaDobbiaco_b.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/2/2e/20080924123129!Qcane.png
http://upload.wikimedia.org/wikipedia/commons/archive/1/12/20071124162924!Skrzypce_Adasia.JPG
http://upload.wikimedia.org/wikipedia/commons/archive/c/c6/20090116100543!Sled_dogs.jpgf
http://upload.wikimedia.org/wikipedia/commons/archive/a/af/20081102184103%21Template-question.svg
http://upload.wikimedia.org/wikipedia/commons/archive/c/cc/20071124025921!Titian-salome.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/8/83/20081101223250!Trente-2.jpg
http://upload.wikimedia.org/wikipedia/commons/archive/0/0a/20080416201003!Wappen_von_Meinheim.png

http://upload.wikimedia.org/wikipedia/commons/archive/b/bf/20081203133833!Desfile_en_sello_coreano.jpg (has a copy of the now-old http://upload.wikimedia.org/wikipedia/commons/archive/b/bf/20080805193954!Desfile_en_sello_coreano.jpg)
http://upload.wikimedia.org/wikipedia/commons/archive/b/bc/20080811094312%21Escudo_de_Conil_de_la_Frontera.svg (has a copy of the now-old http://upload.wikimedia.org/wikipedia/commons/archive/b/bc/20081009101242%21Escudo_de_Conil_de_la_Frontera.svg)
http://upload.wikimedia.org/wikipedia/commons/archive/4/4f/20080731114904%21Escut_de_Vilanova_de_Segri%C3%A0.svg (has a copy of the now-old http://upload.wikimedia.org/wikipedia/commons/archive/4/4f/20080731115156%21Escut_de_Vilanova_de_Segri%C3%A0.svg)
http://upload.wikimedia.org/wikipedia/commons/archive/5/53/20080906084331!Flag_province_luxembourg.png (has a copy of... well, one of the subsequently uploaded image-warred files)
http://upload.wikimedia.org/wikipedia/commons/archive/1/19/20080305132344%21Gambia_b2.gif (has a copy of http://upload.wikimedia.org/wikipedia/commons/archive/1/19/20080305132205%21Gambia_b2.gif, but the SHA-1 could have been calculated wrongly, and Ulamm may simply have uploaded the same file 6 times instead of 5)
http://upload.wikimedia.org/wikipedia/commons/archive/9/90/20071129000654%21OSM_Pinelands_map.png (has a copy of http://upload.wikimedia.org/wikipedia/commons/archive/9/90/20080427091402%21OSM_Pinelands_map.png)
http://upload.wikimedia.org/wikipedia/commons/archive/7/71/20080416194942!V9938c_03.jpg (has a copy of http://upload.wikimedia.org/wikipedia/commons/archive/7/71/20081127155953!V9938c_03.jpg)


Uncategorized images with wrong hash (no relationship found)
Columns: file, real hash, hash in DB, notes
http://upload.wikimedia.org/wikipedia/commons/archive/d/d1/20071127182430!29_Calvin_Coolidge_3x4.jpg	7c782c9610e209ad1d7451c0c860c48b7155d69e	1a5bc3a057eb7afa0698ac69d733ae013884cab5	Image is 522px × 700px (like the original) and 259.58 KB. Metadata expects 299 KB and 573x764
http://upload.wikimedia.org/wikipedia/commons/archive/0/08/20080508200822!A-26K_609SOS_near_NKP_1969.jpg	eb5d4e79ec2177c5b975d126c23f554ee03e9c01	572bb0970c82d8c838f4492c295a463f959673ee
http://upload.wikimedia.org/wikipedia/commons/archive/e/eb/20080919205313!Acylphosphate_rxn.svg	bce9e697b369ad8e75aea2c797d4d7260a3c87dc	e9ed844b62945f52896cfabfda5c191e2ad96165
http://upload.wikimedia.org/wikipedia/commons/archive/a/ad/20081112084634!Ambox_?.svg	874be09321df5f4c58dc2f644f3ec78d23146c32	98d88936056d48e53d0df15b49833de6bd03c94f
http://upload.wikimedia.org/wikipedia/commons/archive/7/70/20080416213951!BlankMap-Americas.svg	5e1d29245520f53d5e7e119322e2c27a2e6932f2	fb090aea4eccf4d805c2d96734bb6ab43946f22f
http://upload.wikimedia.org/wikipedia/commons/archive/4/45/20080528015643!Carnotaurus_DB_2.jpg	0250da19d2b0f443eebebac7df7dc0cdfbad15fc	5f0cd1d12e81a236f4667cf1ad79e51b3cd9bb0e
http://upload.wikimedia.org/wikipedia/commons/archive/4/45/20080527144757!Carnotaurus_DB_2.jpg	7cd01e305f28b213ade79c959207cf61a8ae1988	58292d741e728a4c2e91a39bb34e8ff0e86839f2 
http://upload.wikimedia.org/wikipedia/commons/archive/2/20/20081112131447!Greenereyes.JPG	e4e3defc609835267976d0ecfdeab684f8ff21f1	681ecfb07bd82e6c2d1dd154da9906b28ff1e47a
http://upload.wikimedia.org/wikipedia/commons/4/49/Hong_Kong_Science_Park_1.JPG	9c711494df2db9c7712de72414f47017a9a95da9	577b8e5008abb1885fbe3714f05aeeb66eb78559
http://upload.wikimedia.org/wikipedia/commons/archive/d/d8/20090112205355%21Old_town_zamo%C5%9B%C4%87_plan.png	b2373b572435aa7a5d80ff68efdbc460b7202af5	2e63fa192711b72ef96889c5f2fec18aef7d01d3
http://upload.wikimedia.org/wikipedia/commons/0/05/Rusta_J%C3%B6nk%C3%B6ping.jpg	6d2504f4812abdbc9710084cf416b11a8306f7fa	ca8b228aad26e078ed6e8d6a1ca200e913b18787
http://upload.wikimedia.org/wikipedia/commons/archive/9/9b/20080228002419!Tom_Savini_02.JPG	e281efce6e07541a1bbff678fa6a69ba659ff585	9f58162ec5f9480fa09b0f27204ccecf8848dc58	It's a different image than expected: the metadata lists 124×164, 28 KB, but the file is 204×262, 72.79 KB (it also doesn't match the other uploaded version; it's a different crop)
http://upload.wikimedia.org/wikipedia/commons/archive/8/83/20071018115142!Wednesbury_Canal_Map_SO99SE.svg	bbefc73aba02560f8fd809b7ac6dc77ec2d54cb9	a32fa8163ca5adbb8d2c77cf25f0da031aec3067	File size is 44144, but the metadata says 46418 (it doesn't look truncated, though).
http://upload.wikimedia.org/wikipedia/commons/0/0a/White_Knight_Two.png	b3d712f471338f330a4fc618874232b2cd4498a1	1225ffb95b7260fc8c5afadfe0adf7af6919023e


Other:
http://upload.wikimedia.org/wikipedia/commons/archive/b/ba/20081112133435%21Maltipoo_hen%3F.jpg
The response is an HTML page saying "404 Wikimedia page not found:", but the headers are "HTTP/1.0 200 OK" and "Content-Type: image/jpeg" (it should be purged).
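A mismatch like this last one, an HTML error page served as `image/jpeg`, can be detected by comparing the declared type with the body's leading bytes. A minimal sketch follows; `body_matches_type` is a hypothetical helper that only knows three well-known signatures (JPEG begins with `FF D8 FF`, PNG with its fixed 8-byte signature, GIF with `GIF87a` or `GIF89a`) and returns `None` for anything else, such as SVG.

```python
from typing import Optional

# Leading-byte signatures for a few raster formats.
SIGNATURES = {
    "image/jpeg": (b"\xff\xd8\xff",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/gif": (b"GIF87a", b"GIF89a"),
}


def body_matches_type(content_type: str, body: bytes) -> Optional[bool]:
    """True/False for known types, None if the type is not covered."""
    mime = content_type.split(";", 1)[0].strip().lower()
    prefixes = SIGNATURES.get(mime)
    if prefixes is None:
        return None
    return body.startswith(prefixes)


# An HTML error page declared as JPEG fails the check.
print(body_matches_type("image/jpeg", b"<html>404 Wikimedia page not found:"))
```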
Comment 4 Aaron Schulz 2009-07-26 20:46:36 UTC
(In reply to comment #2)
> Adding to database cleanup tracking bug 16660.
> 

Can populateSha1 be run again in the meantime?
Comment 5 Platonides 2009-07-26 22:00:47 UTC
populateSha1 wouldn't fix anything, since it only works on files that don't have a hash in the DB. These files do have a hash, although it's wrong.
Comment 6 MZMcBride 2009-07-26 22:06:02 UTC
Why not just make ?action=purge re-calculate the hash? Is it too expensive or something?
Comment 7 Platonides 2009-07-26 22:38:09 UTC
I don't think so. It's simply that the hash wasn't expected to be wrong.
But note that sometimes the wrong hash is on an old image version, so you would need to iterate over all versions of an image, recalculating each hash (luckily, there are usually few image versions).
I would prefer seeing an API module for doing purges.
Or simply seeing the sysadmins purge those entries.
Comment 8 Aaron Schulz 2009-07-27 05:41:45 UTC
(In reply to comment #5)
> populateSha1 wouldn't fix anything, since it only works on files which don't
> have hash in the db. These files do have a hash, although it's wrong.

Adding an overwrite mode sounds trivial.
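What such an overwrite mode might look like, sketched under assumptions: file versions (current and old) are modelled as an in-memory list of hypothetical `FileVersion` records rather than the real image/oldimage tables, and `populate_sha1` is an illustrative name, not the actual maintenance script. Without `overwrite`, only rows missing a hash are filled in, which is the behaviour described in comment 5; with it, every version is re-hashed and wrong values are corrected.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileVersion:
    name: str
    timestamp: str
    stored_sha1: Optional[str]  # None models a row with no hash yet
    data: bytes


def populate_sha1(versions: List[FileVersion], overwrite: bool = False) -> List[str]:
    """Fill in (or, with overwrite=True, recompute) hashes; return changed keys."""
    changed = []
    for v in versions:
        if v.stored_sha1 is not None and not overwrite:
            continue  # default mode: existing hashes are trusted, even wrong ones
        actual = hashlib.sha1(v.data).hexdigest()
        if actual != v.stored_sha1:
            v.stored_sha1 = actual
            changed.append(f"{v.name}@{v.timestamp}")
    return changed
```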
Comment 9 MZMcBride 2009-07-27 08:26:32 UTC
(In reply to comment #8)
> (In reply to comment #5)
> > populateSha1 wouldn't fix anything, since it only works on files which don't
> > have hash in the db. These files do have a hash, although it's wrong.
> 
> Adding an overwrite mode sounds trivial.

Running a maintenance script every time a particular image has a hash issue is impractical. There should be a way to regenerate the hash without requiring a re-upload or command-line access, regardless of whether the maintenance script is adjusted (which it probably should be).
Comment 10 Aaron Schulz 2009-07-27 08:32:05 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #5)
> > > populateSha1 wouldn't fix anything, since it only works on files which don't
> > > have hash in the db. These files do have a hash, although it's wrong.
> > 
> > Adding an overwrite mode sounds trivial.
> Running a maintenance script every time a particular image has a hash issue is
> impractical. There should be a way to re-generate the hash without requiring a
> re-upload or command-line access. Regardless of whether the maintenance script
> is adjusted (which it probably should be).

Obviously.

But we can use it to retroactively fix the numerous wrong values once the reason for the keys getting stuck is found. A SHA-1 purge link shouldn't be needed unless something is just flat broken... users shouldn't be expected to deal with that. It could be a temporary stop-gap solution if all else fails, though...
Comment 11 Chad H. 2009-08-03 23:47:18 UTC
We can now fix this for individual broken cases as of r54328. The underlying cause of the wrong values might still need fixing?
Comment 12 Aaron Schulz 2012-02-29 22:50:23 UTC
(In reply to comment #11)
> We can now fix this for individual broken cases as of r54328. Underlying cause
> of why they're wrong might need fixing still?

More script updates in r112736.
Comment 13 Aaron Schulz 2012-03-28 17:29:34 UTC
A race condition caused by a lack of locking, which previously allowed the metadata of two rows to get mixed up, was fixed.
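The comment does not describe the actual fix, so the following is only a generic illustration of the idea. A hypothetical `FileStore` keeps file bytes and hashes in two separate structures (standing in for file storage and the metadata row), and each upload writes both under one per-file lock, so no reader or other upload can observe the bytes of one upload paired with the hash of another.

```python
import hashlib
import threading
from collections import defaultdict


class FileStore:
    """Toy store with separate byte and hash maps. Hypothetical, for illustration."""

    def __init__(self):
        self._data = {}
        self._sha1 = {}
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()

    def _lock_for(self, name):
        # Guard lock creation so two threads never get different locks for one name.
        with self._locks_guard:
            return self._locks[name]

    def upload(self, name, data):
        digest = hashlib.sha1(data).hexdigest()
        with self._lock_for(name):
            # Two separate writes; holding the lock makes them one step to others.
            self._data[name] = data
            self._sha1[name] = digest

    def read(self, name):
        with self._lock_for(name):
            return self._data[name], self._sha1[name]


def demo(writers=4, rounds=200):
    """Run concurrent uploads and reads; return any byte/hash mismatches seen."""
    store = FileStore()
    store.upload("Example.jpg", b"v0")
    mismatches = []

    def writer(i):
        for j in range(rounds):
            store.upload("Example.jpg", f"{i}-{j}".encode())

    def reader():
        for _ in range(rounds * 2):
            data, digest = store.read("Example.jpg")
            if hashlib.sha1(data).hexdigest() != digest:
                mismatches.append(digest)

    threads = [threading.Thread(target=writer, args=(i,)) for i in range(writers)]
    threads.append(threading.Thread(target=reader))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mismatches
```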
Comment 14 Aaron Schulz 2014-04-22 05:22:46 UTC
*** Bug 49841 has been marked as a duplicate of this bug. ***
Comment 15 Aaron Schulz 2014-04-22 05:23:50 UTC
*** Bug 17070 has been marked as a duplicate of this bug. ***
Comment 16 Aaron Schulz 2014-05-14 21:25:39 UTC
A script was run to fix all of these image/oldimage rows (completed April 29).


