Last modified: 2014-09-23 19:50:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from the pages displaying bug reports and their history, links might be broken. See T43380, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41380 - Corrupt images should be detected and reported - by humans or automatic script.
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: Media storage
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 41371
Reported: 2012-10-25 09:59 UTC by Dereckson
Modified: 2014-09-23 19:50 UTC (History)
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description Dereckson 2012-10-25 09:59:24 UTC
We have reports on Wikimedia Commons that thumbnails can't be generated because the images are corrupt.

* * *

[ Analysis ]

Well, the problem is that these images really seem to be corrupted.

e.g. [[Commons:File:Augustinusbishop.gif]]

/home/dereckson ] fetch https://upload.wikimedia.org/wikipedia/commons/7/7b/Augustinusbishop.gif
Augustinusbishop.gif                          100% of  346 kB  889 kBps
/home/dereckson ] mogrify Augustinusbishop.gif -resize 200x300
mogrify: corrupt image `Augustinusbishop.gif' @ error/gif.c/ReadGIFImage/1348.

* * *

[ A heavy-to-maintain solution ]

We should have a script that periodically verifies our pictures and reports any corrupted images it detects.

PIL can detect such images with the Image.verify() method.

Here is a sample script (it also works for any other format supported by PIL):
https://bitbucket.org/denilsonsa/small_scripts/src/tip/jpeg_corrupt.py
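
A minimal sketch of such a checker, assuming Pillow (the maintained fork of PIL) is installed. It is not the linked jpeg_corrupt.py script, and `find_problem` and `scan` are hypothetical names. According to the Pillow documentation, verify() tries to detect a broken file without decoding the image data, so this sketch also decodes every frame (animated GIFs have several), on the assumption that damage inside the compressed pixel data may only surface at decode time.

```python
"""Corrupt-image scanner: a sketch assuming Pillow, not the linked script."""
import sys
from pathlib import Path

from PIL import Image, ImageSequence


def find_problem(path):
    """Return None if the file decodes cleanly, else a short error description."""
    try:
        # verify() checks the file structure without decoding pixel data;
        # the image object is unusable afterwards, so the file is reopened below.
        with Image.open(path) as im:
            im.verify()
        # Decode every frame to catch damage in the compressed data itself.
        with Image.open(path) as im:
            for frame in ImageSequence.Iterator(im):
                frame.load()
    except Exception as exc:  # Pillow raises several exception types for bad files
        return f"{type(exc).__name__}: {exc}"
    return None


def scan(paths):
    """Yield (path, problem) for every file that fails to open or decode."""
    for path in paths:
        problem = find_problem(path)
        if problem is not None:
            yield path, problem


if __name__ == "__main__":
    for path, problem in scan(Path(p) for p in sys.argv[1:]):
        print(f"{path}\t{problem}")
```

Saved under a hypothetical name such as check_images.py, it could be run as `python check_images.py *.gif`, printing one tab-separated line per broken file.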

The infrastructure we need should be able to check at least 100 000 pictures per day (≈ 1.16 images per second), so that, running continuously, it verifies every picture once every 150 days (150 × 100 000 = 15 million pictures).
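
The sizing arithmetic above, as a small Python sketch so it can be redone for other throughputs or collection sizes. The 15 million figure is only what 150 days × 100 000 pictures per day implies, not a measured Commons file count, and the function names are hypothetical.

```python
"""Back-of-the-envelope sizing for a periodic image checker."""
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400


def images_per_second(images_per_day):
    """Sustained rate needed to check `images_per_day` files in one day."""
    return images_per_day / SECONDS_PER_DAY


def days_per_full_pass(collection_size, images_per_day):
    """Days needed to check every file once (a partial last day counts as a day)."""
    return -(-collection_size // images_per_day)  # integer ceiling division


if __name__ == "__main__":
    print(f"{images_per_second(100_000):.2f} images/s")     # 1.16 images/s
    print(days_per_full_pass(15_000_000, 100_000), "days")  # 150 days
```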

* * *

We should evaluate whether this is needed, or whether human reporting would work better.

We also have to involve the Wikimedia Commons community to fix the corrupted pictures manually.
Comment 1 Dereckson 2012-10-25 10:01:12 UTC
Note that a clever solution could be to intercept the error when thumbnail generation fails and generate a list of such files.
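
A rough Python sketch of that idea. MediaWiki's actual thumbnailing code is PHP and is not reproduced here: `render` stands for whatever scaler really produces the thumbnail (for example a call to ImageMagick), and `CorruptFileLog` and `thumbnail_with_reporting` are hypothetical names. The wrapper records each failing file once in a tab-separated report and re-raises the error, so the caller still sees the failure.

```python
"""Collect files whose thumbnail generation fails: a hypothetical sketch."""
import csv
import datetime
from pathlib import Path


class CorruptFileLog:
    """Appends failing files to a TSV report (hypothetical helper)."""

    def __init__(self, report_path):
        self.report_path = Path(report_path)
        self.seen = set()  # de-duplicates within this process only

    def record(self, file_name, error):
        # Log each file once, even if several thumbnail sizes fail for it.
        if file_name in self.seen:
            return
        self.seen.add(file_name)
        now = datetime.datetime.now(datetime.timezone.utc).isoformat()
        with self.report_path.open("a", newline="") as fh:
            csv.writer(fh, delimiter="\t").writerow([now, file_name, error])


def thumbnail_with_reporting(render, file_name, width, log):
    """Call render(file_name, width); on failure, record the file and re-raise."""
    try:
        return render(file_name, width)
    except Exception as exc:
        log.record(file_name, f"{type(exc).__name__}: {exc}")
        raise
```

The resulting report is the list of files to hand to the Commons community, built only from files that users actually tried to view, rather than from a full periodic scan.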
Comment 2 Andre Klapper 2012-10-25 21:03:06 UTC
CC'ing Aaron.
Aaron, could you take a look at this maintenance script and provide some feedback? Do you think that this could be incorporated in the long run?


