Last modified: 2014-09-23 19:50:06 UTC
We've received reports that some Wikimedia Commons thumbnails can't be generated because the images are corrupt.

* * *

[ Analysis ]

Well, the problem is that these images really do seem to be corrupted, e.g. [[Commons:File:Augustinusbishop.gif]]:

/home/dereckson ] fetch https://upload.wikimedia.org/wikipedia/commons/7/7b/Augustinusbishop.gif
Augustinusbishop.gif 100% of 346 kB 889 kBps
/home/dereckson ] mogrify Augustinusbishop.gif -resize 200x300
mogrify: corrupt image `Augustinusbishop.gif' @ error/gif.c/ReadGIFImage/1348.

* * *

[ A heavy-to-maintain solution ]

We should have a script that periodically verifies our pictures and reports any corrupted images it detects.

PIL can detect such images with the verify method. Here is a sample script (it works for any other format supported by PIL too):
https://bitbucket.org/denilsonsa/small_scripts/src/tip/jpeg_corrupt.py

The infrastructure we need should be sized to check at least 100,000 pictures per day (about 1.16 images per second), so that, running continuously, it verifies every picture once every 150 days. At that rate, one full pass covers about 15 million files.

* * *

We should evaluate whether this is needed or whether human reporting would work better.

We also have to involve the Wikimedia Commons community to manually fix the corrupted pictures.
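For illustration, the periodic check with PIL's verify method could look roughly like the sketch below. This is not the linked script. The function names `pil_verify` and `find_corrupt` are made up for this example, and it assumes Pillow (the maintained PIL fork) is installed. Note that verify does not decode the pixel data, so it may miss some kinds of damage.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def pil_verify(path: str) -> Optional[str]:
    """Return None if Pillow accepts the file, else a short error message."""
    # Imported here so the scanning loop below can be used with another checker.
    from PIL import Image

    try:
        with Image.open(path) as image:
            # Checks whether the file looks broken, without decoding pixel data.
            image.verify()
    except Exception as exc:  # Pillow raises several exception types on bad files
        return f"{type(exc).__name__}: {exc}"
    return None


def find_corrupt(
    paths: Iterable[str],
    check: Callable[[str], Optional[str]] = pil_verify,
) -> Iterator[Tuple[str, str]]:
    """Yield (path, error) for every file the checker rejects."""
    for path in paths:
        error = check(path)
        if error is not None:
            yield path, error
```

A run over a batch of files would then be `for path, error in find_corrupt(batch): print(path, error)`, where `batch` is whatever list of local file paths the infrastructure hands to the script.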
Note: a cleverer solution could be to intercept the error when thumbnail generation fails and build a list of the affected files.
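A minimal sketch of that idea, in Python for illustration only (the real thumbnailing code lives elsewhere, and nothing here reflects its actual API). The names `run_resize` and `thumbnail_or_report` are hypothetical, and ImageMagick's `convert` command stands in for the real scaler:

```python
import subprocess
from typing import Callable, List, Sequence, Tuple


def run_resize(command: Sequence[str]) -> Tuple[bool, str]:
    """Run the resize command; return (succeeded, stderr text)."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr.strip()


def thumbnail_or_report(
    source: str,
    size: str,
    target: str,
    failures: List[Tuple[str, str]],
    runner: Callable[[Sequence[str]], Tuple[bool, str]] = run_resize,
) -> bool:
    """Try to build a thumbnail; on failure, record the source file and error."""
    # Assumption: ImageMagick's convert is the scaler in use.
    command = ["convert", source, "-resize", size, target]
    succeeded, error = runner(command)
    if not succeeded:
        # This list is what would be handed to the community for manual fixes.
        failures.append((source, error))
    return succeeded
```

With this approach, only files that are actually requested and fail to render end up in the list, so no full scan of the collection is needed.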
CC'ing Aaron. Aaron, could you take a look at this maintenance script and provide some feedback? Do you think that this could be incorporated in the long run?