Last modified: 2014-10-27 00:25:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T19645, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 17645 - OOM while thumbnailing huge progressive / interlaced JPEGs
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: File management
Version: unspecified
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 24228 36733 37367
Depends on:
Blocks: 40663 41371
Reported: 2009-02-24 07:02 UTC by Adam Cuerden
Modified: 2014-10-27 00:25 UTC (History)

Attachments
List of Commons non-baseline images above 5 MB (55 bytes, text/plain)
2012-10-23 11:10 UTC, Nemo
List of Commons non-baseline images above 5 MB (1.27 MB, text/plain)
2012-10-23 11:12 UTC, Nemo
List of 559678 Commons non-baseline images below 5 MB (4.33 MB, application/x-7z-compressed)
2012-12-12 23:22 UTC, Nemo

Description Adam Cuerden 2009-02-24 07:02:46 UTC
Several images have suddenly stopped displaying, but they work fine if you download them (though, oddly, NOT if you simply click on the "Full resolution" link to view them, at least in Firefox).

Examples:

http://commons.wikimedia.org/wiki/File:Suikoden.jpg
http://commons.wikimedia.org/wiki/File:Somagahana_Fuchiemon_restored.jpg
http://commons.wikimedia.org/wiki/File:Somagahana_Fuchiemon.jpg


It's been pointed out that there are interesting error messages:

http://upload.wikimedia.org/wikipedia/commons/thumb/d/d5/Suikoden.jpg/411px-Suikoden.jpg

gives:

'''Error generating thumbnail'''

Error creating thumbnail: convert: Insufficient memory (case 4) `/mnt/upload5/wikipedia/commons/d/d5/Suikoden.jpg'.

convert: missing an image filename `/mnt/upload5/wikipedia/commons/thumb/d/d5/Suikoden.jpg/411px-Suikoden.jpg'.


Think you can fix it? ~~~~
Comment 1 Durova 2009-02-24 19:19:46 UTC
This is a problem that's inhibiting access to featured content. ~~~~
Comment 2 Tim Starling 2009-02-25 03:21:28 UTC
Do not use interlaced (a.k.a. progressive) JPEG compression. This option greatly increases the amount of memory required for decompression, and thus reduces performance both for the server and for clients such as browsers. All three cited test cases use this compression mode.

I have uploaded one of the three files with interlacing removed:

<http://commons.wikimedia.org/wiki/File:Suikoden_(no_interlace).jpg>

As you can see, it works just fine. You can do this with ImageMagick using:

convert Source.jpg -interlace none Destination.jpg

Omitting the -interlace, i.e. a null convert, also appears to work.
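
A minimal sketch of scripting the command above, for example to process several files in a row. It assumes ImageMagick's `convert` is on the PATH; the function names `build_convert_command` and `remove_interlace` are hypothetical helpers, not part of MediaWiki or ImageMagick. Note that `convert` decodes and re-encodes the JPEG, so the output is not bit-identical to the input.

```python
import subprocess


def build_convert_command(source, destination):
    """Build the argument list for the convert call quoted above."""
    return ["convert", str(source), "-interlace", "none", str(destination)]


def remove_interlace(source, destination):
    """Run convert; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(build_convert_command(source, destination), check=True)


if __name__ == "__main__":
    # Hypothetical file names, matching the example command.
    print(" ".join(build_convert_command("Source.jpg", "Destination.jpg")))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces or parentheses.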
Comment 3 Nemo 2010-08-24 09:33:56 UTC
More examples from #wikimedia-tech: [[File:Panorama_-_Ch%C3%A2teau_des_ducs_de_Bourbon_%C3%A0_Montlu%C3%A7on_depuis_l%27esplanade.JPG]], [[File:1966_map_of_the_Appalachian_Development_Highway_System.jpg]].
Isn't there a list of interlaced images? They could be replaced with non-interlaced versions by some bot.
Comment 4 Nemo 2012-09-15 17:50:58 UTC
*** Bug 36733 has been marked as a duplicate of this bug. ***
Comment 5 Jarek Tuszynski 2012-10-17 14:51:32 UTC
Would it be possible to change the interlace automatically during the upload? I have run into this problem quite a few times, since it looks like some versions of GIMP save everything in interlaced mode by default.
Comment 6 Marco 2012-10-17 15:40:54 UTC
Can bug #24228 be marked as a duplicate of this one?
Comment 7 Nemo 2012-10-17 17:22:31 UTC
*** Bug 24228 has been marked as a duplicate of this bug. ***
Comment 8 Nemo 2012-10-17 17:22:37 UTC
*** Bug 37367 has been marked as a duplicate of this bug. ***
Comment 9 Nemo 2012-10-17 17:49:55 UTC
Tim, which of the JPEG SOF tags identify a non-interlaced image (good for us)? http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/JPEG.html#SOF

0x0 = Baseline DCT, Huffman coding
0x1 = Extended sequential DCT, Huffman coding
0x2 = Progressive DCT, Huffman coding
0x3 = Lossless, Huffman coding
0x5 = Sequential DCT, differential Huffman coding
0x6 = Progressive DCT, differential Huffman coding
0x7 = Lossless, Differential Huffman coding
0x9 = Extended sequential DCT, arithmetic coding
0xa = Progressive DCT, arithmetic coding
0xb = Lossless, arithmetic coding
0xd = Sequential DCT, differential arithmetic coding
0xe = Progressive DCT, differential arithmetic coding
0xf = Lossless, differential arithmetic coding

('exiftool -fast2' is a couple orders of magnitude faster than 'identify -verbose'.)
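
As a rough illustration of how the SOF values listed above could be read without any external tool, here is a minimal sketch that walks the JPEG marker segments and reports the first SOF marker. It assumes the listed value n corresponds to marker byte 0xC0 + n, and that only the four "Progressive DCT" entries count as progressive; `find_sof_marker` and `is_progressive` are hypothetical names, and the parser is not hardened against every malformed file.

```python
import struct

# Marker bytes 0xC0 + n for the SOF values n listed above.
ALL_SOF = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
           0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
# The four "Progressive DCT" entries (0x2, 0x6, 0xa, 0xe).
PROGRESSIVE_SOF = {0xC2, 0xC6, 0xCA, 0xCE}


def find_sof_marker(data):
    """Return the SOF marker byte of a JPEG byte string, or None."""
    if data[:2] != b"\xff\xd8":              # must start with SOI
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:                # not positioned at a marker
            return None
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker
            pos += 1
            continue
        if marker in ALL_SOF:
            return marker
        if marker == 0xDA:                   # SOS reached without an SOF
            return None
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:
            pos += 2                         # markers without a length field
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length                    # skip marker plus segment
    return None


def is_progressive(data):
    """True if the first SOF marker is one of the progressive modes."""
    return find_sof_marker(data) in PROGRESSIVE_SOF
```

Only the header segments before the first SOF are inspected, so reading the first few kilobytes of a file is usually enough unless it carries large metadata segments.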
Comment 10 esby 2012-10-18 08:06:00 UTC
@Nemo_bis:
Quoting Tim : Do not use interlaced (a.k.a. ***progressive***)
Comment 11 Nemo 2012-10-18 08:14:58 UTC
(In reply to comment #10)
> @Nemo_bis:
> Quoting Tim : Do not use interlaced (a.k.a. ***progressive***)

Sorry, I don't see how this answers my question. Do you mean that all sequential, lossless etc. encodings there are ok (and why)?
Comment 12 Marco 2012-10-18 11:22:57 UTC
I don't think lossless is OK. I stumbled upon some lossless JPEGs lately which could not be read with any program. (Sorry, but I can't remember the SOF tag.)
Comment 13 Rainer Rillke @commons.wikimedia 2012-10-18 17:18:35 UTC
IMHO, server software should either support rendering huge progressive JPEGs, refuse them on upload, or convert them directly after upload.

With Upload Wizard and some modern browsers you can even try to detect those files on the client side before uploading. VirusTotal, for example, computes a hash on the client before uploading the file in order to save server capacity. So it should be possible to read JPEG file headers.

Progressive JPEGs aren't created by digital cameras. Thus, their origin is in imaging-software. It is often just unchecking a check box. But the user has to know this. Current behaviour is NOT OK.

I would be inclined to reopen this bug.
Comment 14 Nemo 2012-10-23 11:10:11 UTC
Created attachment 11219 [details]
List of Commons non-baseline images above 5 MB

Here's the first list I made with exiftool (27884 images above 5 MB).
Comment 15 Nemo 2012-10-23 11:12:20 UTC
Created attachment 11220 [details]
List of Commons non-baseline images above 5 MB

Better as explicit attachment for archiving.
Comment 16 Nemo 2012-12-12 23:22:40 UTC
Created attachment 11500 [details]
List of 559678 Commons non-baseline images below 5 MB
Comment 17 Rainer Rillke @commons.wikimedia 2013-01-20 18:13:20 UTC
Someone at Commons is now converting everything:
https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Convert_all_interlaced_JPGs

This can't be desired behaviour, come on, wake up.
Comment 18 esby 2013-01-24 14:36:02 UTC
It does indeed seem a bit much to convert files that are technically perfectly fine (as the thumbnail is properly generated).
Comment 19 trlkly 2014-01-10 14:10:09 UTC
Since when does a single programmer get to set policy for the entirety of Wikimedia?

There is a bug here. Even if progressive images take up more memory, the fact that the system is not waiting and allocating the correct amount of memory is a bug.

Progressive JPEGs are going to be uploaded whether you want them to be or not. Most images on Wikipedia are uploaded from the web, and most web JPEGs are progressive, as progressive JPEGs make smaller files.

In fact, I personally have no intent to stop using progressive JPEGs since I've been using them since 2007 without incident. Lots of things editors do put a large memory load on the server. We aren't required to try to make it easier on the system.

I've been using progressive JPEGs on Wikimedia for years, and I've not run into a problem. If I do, then maybe I'll convert, but not until then. I'm not going to condone a programmer changing policy in order to avoid fixing a bug.

And don't say you haven't changed policy. You put a demand on all Wikimedia users that they do a certain thing a certain way, even though the other way works. That's a policy change. It's even listed at the Commons Help:JPEG.
Comment 20 trlkly 2014-01-10 14:38:02 UTC
Left out something: If there is definitely a bug, you have two choices. You can try to fix it, or you leave it open so that someone else can fix it. You do not close a legitimate bug by telling people that they are required to work around it. 

And this is a legitimate bug, as there's no way the servers were coincidentally that close to capacity every time the bug reporter tried to generate the thumbnail. Either enough memory is not being allocated or there's a bug requiring a lot more memory for this file than for other progressive JPEGs which work just fine.
Comment 21 Andre Klapper 2014-01-10 14:54:06 UTC
trlkly: No idea which "policy" thing you talk about, but maintainers of a codebase are free to decide that they are actively against fixing a valid bug in the software if this would create side effects ("reduces performance both for the server and for clients such as browsers") that they considered worse.
Comment 22 trlkly 2014-01-10 15:41:11 UTC
Nope. Not in open source.
(In reply to comment #21)
> trlkly: No idea which "policy" thing you talk about, but maintainers of a
> codebase are free to decide that they are actively against fixing a valid bug
> in the software if this would create side effects ("reduces performance both
> for the server and for clients such as browsers") that they considered worse.

They are allowed to refuse patches if they think the patches have downsides, yes, but not to arbitrarily declare that all such patches must have that downside. And note the word "they" rather than "he." This was a single person making the decision, without even entertaining the idea that someone might have a way to handle it.

And, in fact, there are multiple ways of getting around the issues he stated. There's no inherent reason that progressive JPEGs take longer to render than baseline JPEGs. It isn't the case on any modern software. It isn't the case that they must take up a lot more memory as, unlike thumbnailing, converting between the two can be done without full decompression. Thus the memory requirements are as low as you can stand having to go back to the disk to read more of the file.

Furthermore, thumbnailing a progressive JPEG often requires less of the JPEG to be rendered, since you only have to render up to the resolution just above the thumbnail. Progressive JPEGs essentially have their own thumbnails baked in.

There are multiple solutions that could deal with this problem without causing significant drain on the system. Most of them came in after the guy arbitrarily closed the bug without waiting for ideas on how to mitigate the problems. 

A bug should be left open if it is legitimate. Closing the bug prevents anyone else from coming up with a solution that mitigates all problems.
Comment 23 Nemo 2014-01-10 15:48:13 UTC
(In reply to comment #22)
> They are allowed to refuse patches if they think the patches have downsides,
> yes, but not to arbitrarily declare that all such patches must have that
> downside. [...]
> 
> And, in fact, there are multiple ways of getting around the issues he stated.
> There's no inherent reason that progressive JPEGs take longer to render than
> baseline JPEGs. It isn't the case on any modern software.

Have you brought this up with ImageMagick, then? You could also submit a patch to them, as you mention that.
(Note, there's also VIPS but I don't think we ever use it for JPEG. https://blog.wikimedia.org/2013/09/12/vipsscaler-implementation-wikimedia-sites/ )
Comment 24 Tim Starling 2014-01-11 02:19:00 UTC
(In reply to comment #19)
> Since when does a single programmer get to set policy for the entirety of
> Wikimedia?

Since before it was called Wikimedia. That's not to say it's a good decision-making system. I'm happy to hear other opinions or for others to submit patches in this area. 

> Progressive JPEGs are going to be uploaded whether you want them to be or
> not.

It's not ideal to have bots convert them. I would prefer it if they were rejected on upload.

> And don't say you haven't changed policy. You put a demand on all Wikimedia
> users that they do a certain thing a certain way, even though the other way
> works. That's a policy change. It's even listed at the Commons Help:JPEG.

Sure, changing policy is a hack, in the absence of a feature which would reject these files on upload.

If they were rejected on upload, then we could set a threshold based on available server memory, instead of having bot authors guess at what that threshold should be.
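
A minimal sketch of what such an upload-time threshold check might look like. It rests on a simplifying assumption: that decoding a progressive JPEG needs roughly one 16-bit coefficient per sample for every colour component, ignoring chroma subsampling and other decoder overhead. The function names, the memory model, and the rule of only checking progressive files are all hypothetical, not MediaWiki's or ImageMagick's actual behaviour.

```python
BYTES_PER_COEFFICIENT = 2  # assumption: 16-bit DCT coefficients


def estimate_coefficient_bytes(width, height, components=3):
    """Rough upper bound on coefficient storage for a progressive JPEG."""
    return width * height * components * BYTES_PER_COEFFICIENT


def should_reject(width, height, components, progressive, memory_limit_bytes):
    """Reject only progressive files whose estimate exceeds the limit."""
    if not progressive:
        return False
    return estimate_coefficient_bytes(width, height, components) > memory_limit_bytes


if __name__ == "__main__":
    # Hypothetical 10000 x 8000 RGB image against a 400 MB limit.
    limit = 400 * 1000 * 1000
    print(estimate_coefficient_bytes(10000, 8000, 3))      # 480000000
    print(should_reject(10000, 8000, 3, True, limit))      # True
```

The limit would be chosen from the memory actually available to the scaler, which is the point of setting it on the server rather than having bot authors guess.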

(In reply to comment #22)
> Furthermore, thumbnailing a progressive JPEG often requires less of the JPEG
> to be rendered, since you only have to render up to the resolution just above
> the thumbnail. Progressive JPEGs essentially have their own thumbnails 
> baked in.

Maybe if the browsers or the image scaling software we use took advantage of this, then you would have a point. But as it stands, it's not really a good subject for a bug against MediaWiki. It would be a good subject for a bug against ImageMagick.

> There are multiple solutions that could deal with this problem without
> causing significant drain on the system. Most of them came in after
> the guy arbitrarily closed the bug without waiting for ideas on how 
> to mitigate the problems. 

Everyone should feel free to submit ideas about bugs that are closed "WONTFIX". 

> A bug should be left open if it is legitimate. 

I think WONTFIX was an appropriate way to describe the situation.

> Closing the bug prevents
> anyone else from coming up with a solution that mitigates all problems.

By what mechanism? It's not like we're preventing comments on the bug, or telling upstream projects like libvips or ImageMagick to reject your patches.
Comment 25 Rainer Rillke @commons.wikimedia 2014-10-27 00:25:37 UTC
(In reply to Tim Starling from comment #24)
> It's not ideal to have bots convert them. I would prefer it if they were 
> rejected on upload.

From the usability point of view, that's horrible. I am happy when users understand what JPEG and PNG are at all. Coming from Facebook, they call everything a "Pic" and when you reject progressive JPEGs with a message like: "Progressive JPEGs must not be uploaded here, instead use baseline because it's better for our servers", I am sure we will succeed in confusing 90% of the new uploaders receiving this message.

BTW, do we still use ImageMagick for JPEGs, or VIPS?
