Last modified: 2014-11-20 00:23:19 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T69525, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 67525 - Generate thumbnails based on buckets
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: unspecified
Hardware: All All
Importance: High enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: performance
Depends on:
Blocks: 65217

Reported: 2014-07-04 08:51 UTC by Gilles Dubuc
Modified: 2014-11-20 00:23 UTC (History)

Attachments
file that scaled badly (unscaled original) (227.72 KB, image/png)
2014-08-31 20:42 UTC, Bawolff (Brian Wolff)
My test screenshot file, 800px normal (no bucket) scaling with image magick (222.65 KB, image/png)
2014-09-18 17:49 UTC, Bawolff (Brian Wolff)
800px chained scaling using image magick (207.44 KB, image/png)
2014-09-18 17:54 UTC, Bawolff (Brian Wolff)
800px chained, scaling using vips (144.82 KB, image/png)
2014-09-18 17:56 UTC, Bawolff (Brian Wolff)

Description Gilles Dubuc 2014-07-04 08:51:40 UTC
The idea is to offer an option to generate thumbnails in a chain, i.e. a given thumbnail would be generated from a bigger thumbnail rather than from the original whenever possible. This should greatly increase performance for large files, and if good bucket values are picked, the visual impact should be unnoticeable.

I verified that the visual impact would be minimal with a power-of-2 progression with chaining up to 5 thumbnails by running an informal survey which I invited developers and commons users to participate in.
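
To put a rough number on the expected gain, here is a small Python sketch (an illustration only, not part of the patch). The 4096x3072 original and the 1024px bucket are hypothetical values, and the sketch assumes that scaling cost grows roughly with the number of source pixels the scaler has to read.

```python
def bucket_size(orig_w: int, orig_h: int, bucket_w: int) -> tuple:
    """Size of a bucket thumbnail that keeps the original aspect ratio."""
    return bucket_w, round(orig_h * bucket_w / orig_w)


def pixel_count(size: tuple) -> int:
    """Number of pixels the scaler has to read from a source of this size."""
    width, height = size
    return width * height


# Hypothetical example values, chosen only for illustration.
original = (4096, 3072)
bucket = bucket_size(*original, bucket_w=1024)

# How many times fewer source pixels are read when scaling from the bucket.
ratio = pixel_count(original) / pixel_count(bucket)
print(bucket, ratio)
```

Under that assumption, every small thumbnail produced from the bucket instead of the original reads a fraction of the pixels, while the bucket itself only has to be generated once.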
Comment 1 Gerrit Notification Bot 2014-07-04 14:34:23 UTC
Change 135008 had a related patch set uploaded by Gilles:
Generate thumbnails based on buckets

https://gerrit.wikimedia.org/r/135008
Comment 2 Gerrit Notification Bot 2014-07-09 14:09:06 UTC
Change 135008 merged by jenkins-bot:
Generate thumbnails based on buckets

https://gerrit.wikimedia.org/r/135008
Comment 3 Gerrit Notification Bot 2014-07-09 21:26:36 UTC
Change 145132 had a related patch set uploaded by Gergő Tisza:
Add thumbnail buckets for beta sites

https://gerrit.wikimedia.org/r/145132
Comment 4 Gerrit Notification Bot 2014-07-10 18:19:55 UTC
Change 145132 merged by jenkins-bot:
Use reference thumbnails for JPEG/PNG thumbnailing on beta sites

https://gerrit.wikimedia.org/r/145132
Comment 5 Tisza Gergő 2014-07-18 23:39:59 UTC
Doesn't seem to be working on beta.

Steps taken to verify:

1. open http://upload.beta.wmflabs.org/wikipedia/en/thumb/4/4d/Snowman.JPG/1000px-Snowman.JPG in browser
2. ssh (via the labs bastion) to deployment-upload
3. ls /data/project/upload7/wikipedia/en/thumb/4/4d/Snowman.JPG/

Expected result: 1000px-Snowman.JPG and 2048px-Snowman.JPG should be present
Actual result: only 1000px-Snowman.JPG is present

Same for deployment-cache-upload02.

I took the server names from https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/Overview which is very outdated, and guessed the directory path from apache and puppet config files, so I might have gotten something wrong. However, the requested image size does appear, only the bucket sizes are missing, and those should be in the same directory, so it seems something is not quite working there.
Comment 6 Tisza Gergő 2014-07-19 00:39:28 UTC
deployment-upload has a thumb script which looks like it was forked from thumb.php several years ago, and it forwards to deployment-cache-text02, so maybe that is the box which actually acts as a scaler. The X-Wikimedia-Thumb header on the generated image also points there. Still, /data/project/ is a network share, so it should not matter from where I look at it.
Comment 7 Bawolff (Brian Wolff) 2014-07-19 05:52:22 UTC
> 
> I verified that the visual impact would be minimal with a power-of-2
> progression with chaining up to 5 thumbnails by running an informal survey
> which I invited developers and commons users to participate in.

What sort of pictures were on the survey? What settings were used, etc?

I just tested the change (using the settings currently on the beta cluster). On the first image I tried (a screenshot, which might not be representative of the average content of a PNG file due to the large amount of small text, but it was something already on my test wiki), the quality was noticeably lower (although possibly still in the acceptable range) with bucketing, and things got even worse with bucketing + VIPS.

------

(In reply to Tisza Gergő from comment #5)
> Doesn't seem to be working on beta.

Another way to test:

Run http://upload.beta.wmflabs.org/wikipedia/labs/thumb/8/8b/Bn.beta.wmflabs.org.PNG/310px-Bn.beta.wmflabs.org.PNG through exiftool, you get:

 [..]
 Thumb Imageheight               : 1024
 Thumb Image Width               : 1280
 Thumb URI                       : file:///data/project/upload7/wikipedia/labs/8/8b/Bn.beta.wmflabs.org.PNG
 [..]

Which would be different if bucketing were working.
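
The check above can be sketched in Python by parsing the "Key : Value" lines that exiftool prints. This is only an illustration: the helper names are hypothetical, the original width of 1280 is an assumption, and so is the idea that a thumbnail scaled from a bucket would record a source width smaller than the original's.

```python
def parse_exiftool(text: str) -> dict:
    """Turn 'Key : Value' lines of exiftool output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URIs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def scaled_from_original(fields: dict, original_width: int) -> bool:
    """Assumption: a bucket source would be narrower than the original."""
    return int(fields["Thumb Image Width"]) == original_width


# The lines quoted in the comment above.
SAMPLE = """\
 Thumb Imageheight               : 1024
 Thumb Image Width               : 1280
 Thumb URI                       : file:///data/project/upload7/wikipedia/labs/8/8b/Bn.beta.wmflabs.org.PNG
"""

fields = parse_exiftool(SAMPLE)
print(fields["Thumb URI"])
print(scaled_from_original(fields, original_width=1280))
```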
Comment 8 Gilles Dubuc 2014-07-21 11:16:29 UTC
> What sort of pictures were on the survey? What settings were used, etc?

One or two control images that had no specific qualities as well as several images with a lot of edges: https://www.surveymonkey.com/s/F6CGPDJ The reason I picked images with a lot of edges is that they're generally the images that gather the most complaints when we tweak thumbnailing.

The same logic and parameters as the patch were applied, with ImageMagick as the scaler. Each image shown in the survey had been through 3 to 5 chaining steps. There's no denying that there is a proven quality loss on a technical level, just by virtue of resampling, but the survey results were clear about the fact that on average the chained ones were slightly preferred. Presumably because of the extra sharpening (most chaining steps meet the criteria of the sharpening check in the code). The old code sharpens once from the original, the new code may sharpen once for each chain step.
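
The difference in sharpening passes can be sketched in Python (an illustration, not the patch's code). It assumes the sharpening check compares the sum of the target dimensions to the sum of the source dimensions against a threshold of 0.85, modelled on MediaWiki's $wgSharpenReductionThreshold setting; the image sizes and the chain of intermediate steps are hypothetical.

```python
# Assumed threshold, modelled on the $wgSharpenReductionThreshold setting.
SHARPEN_REDUCTION_THRESHOLD = 0.85


def would_sharpen(src: tuple, dst: tuple,
                  threshold: float = SHARPEN_REDUCTION_THRESHOLD) -> bool:
    """True if the size reduction from src to dst is large enough to sharpen."""
    (src_w, src_h), (dst_w, dst_h) = src, dst
    return (dst_w + dst_h) / (src_w + src_h) < threshold


def count_sharpen_passes(sizes: list) -> int:
    """Count sharpening passes along a sequence of successive resizes."""
    return sum(would_sharpen(a, b) for a, b in zip(sizes, sizes[1:]))


# Hypothetical sizes: straight from the original vs. through two buckets.
direct = [(4096, 3072), (800, 600)]
chained = [(4096, 3072), (2048, 1536), (1024, 768), (800, 600)]

print(count_sharpen_passes(direct), count_sharpen_passes(chained))
```

In this example the direct path sharpens once, while the chained path sharpens at each of its three steps.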

The reason why thumbs are sharpened in the first place - a common practice on large websites - is that people find sharpened thumbnails to look better even when mathematically speaking they aren't (on the contrary, more original content is getting lost). It's probably because the conserved edges help with the way we recognize shapes and detail. I.e. our brain will have an easier time compensating for the loss of detail if edges are stronger, even if artificially conserving the edges actually makes more original detail disappear.

And so, with chaining the edges are conserved a bit better, which is probably why they're favored in the survey results. That's the only theory I have about the counter-intuitive results. When I ran it, I was expecting to see that the chained ones would be disfavored, in which case it would have been a balancing act between what people tolerate visually and server resources.

That being said, maybe if we cranked up the sharpening value on the default code, people would prefer that to the chained thumbnails. I didn't try to run the survey another time to find out. But the goal here is to save server resources while not upsetting people, not to improve the thumbnail quality/popularity. It seems like a reasonable balance to me to launch a recipe that people don't noticeably dislike on average compared to the status quo.

I don't see this change as a big risk anyway, because if a vocal minority campaigns against it, it's easy to revert and purge the images. Plan B is to perform the same chaining but store each bucket size in a lossless format. We'd have the same speed gains, but the vastly increased storage needs mean that we can't do that while we're still storing thumbnails in Swift.
Comment 9 Tisza Gergő 2014-07-21 20:32:39 UTC
I think the surveys only used JPEG images, and Brian said he tested with a PNG, so maybe this is another counterintuitive result where the perceived quality loss is greater for PNG images (which have more sharp details than JPEGs)?
Comment 10 Gilles Dubuc 2014-07-22 12:27:26 UTC
Ah yes, I'll look into PNG closer. If PNG resizing doesn't do any sharpening, that might be the explanation.
Comment 11 Gilles Dubuc 2014-07-24 15:25:57 UTC
Gergo, I don't have access to deployment-upload, could you give it another try, now that the fixed beta config is out?
Comment 12 Tisza Gergő 2014-07-24 18:25:55 UTC
Works as expected.
Comment 13 Gilles Dubuc 2014-07-28 15:53:23 UTC
Brian, I just wanted to check if the quality issues you encountered were due to not having defined the minimum distance, or if you have repro steps with sample images that I could use?
Comment 14 Bawolff (Brian Wolff) 2014-08-31 20:42:10 UTC
Created attachment 16331 [details]
file that scaled badly (unscaled original)

I used:

$wgThumbnailBuckets = array( 256, 512, 1024, 2048, 4096 );
$wgThumbnailMinimumBucketDistance = 32;

as my settings (and tried both with and without VIPS enabled; results were much worse with VIPS, but they were bad with ImageMagick too).

The file I was testing (original, unscaled) is attached
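
For readers following along, here is a Python sketch of how settings like these might select a source bucket. The selection rule is an assumption made for illustration, not confirmed against the patch: take the smallest bucket that is narrower than the original and exceeds the requested width by more than the minimum distance. The function name pick_bucket is hypothetical.

```python
from typing import Optional, Sequence


def pick_bucket(target_width: int, original_width: int,
                buckets: Sequence[int], min_distance: int) -> Optional[int]:
    """Return the bucket width to scale from, or None to use the original."""
    if target_width > original_width:
        return None
    for bucket in sorted(buckets):
        # A bucket as wide as the original (or wider) is of no use.
        if bucket >= original_width:
            return None
        # The bucket must be sufficiently larger than the requested size.
        if bucket - min_distance > target_width:
            return bucket
    return None


BUCKETS = [256, 512, 1024, 2048, 4096]  # values from the settings above
MIN_DISTANCE = 32

print(pick_bucket(800, 1280, BUCKETS, MIN_DISTANCE))
print(pick_bucket(1000, 1280, BUCKETS, MIN_DISTANCE))
```

Under this rule an 800px request against a 1280px original would go through the 1024px bucket, while a 1000px request would fall back to the original because 1024 is within the minimum distance and 2048 is wider than the original.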
Comment 16 Gilles Dubuc 2014-09-01 07:48:57 UTC
Could you give me the $wgVips* configuration settings you were using? This way I can generate the same set of thumbnails with VIPS.
Comment 17 Bawolff (Brian Wolff) 2014-09-18 17:20:31 UTC
For testing VIPS I was using:

$wgVipsOptions = array(
        array(
                'conditions' => array(
                        'mimeType' => 'image/png',
                ),
        ),
);

(The main difference from production is that production has a minsize parameter.)
Comment 18 Bawolff (Brian Wolff) 2014-09-18 17:49:44 UTC
Created attachment 16513 [details]
My test screenshot file, 800px normal (no bucket) scaling with image magick

800px output on normal scaling (bucketing disabled, using image magick). Looks very nice.
Comment 19 Bawolff (Brian Wolff) 2014-09-18 17:54:02 UTC
Created attachment 16514 [details]
800px chained scaling using image magick

Using chaining (Note: only 1 intermediate thumbnail. The original is 1280px, then it makes an intermediate of 1024px, and then does the target of 800px. Possibly a bigger difference if several buckets are involved).

Image is more "fuzzy", and small text in image is harder to read. Definitely noticeable if doing side by side comparison. However, quality may still be acceptable.
Comment 20 Bawolff (Brian Wolff) 2014-09-18 17:56:54 UTC
Created attachment 16515 [details]
800px chained, scaling using vips

Using chained with vips (1280->1024->800).

Note on production, VIPS is not used for small images, so in practice this might not be as much of an issue since VIPS would only be used on the very biggest bucket size. Maybe. Possibly needs more experimentation to see.

Text in image is significantly harder to read, and quality of image is noticeably less
Comment 21 Gilles Dubuc 2014-09-19 07:31:30 UTC
That's very odd, the chaining-free one I had was as fuzzy as the chained one, which is why I didn't notice the difference. I'll try to figure out what happened there. This might be related to the sharpening the code does with IM. As for VIPS, if it's not doing any sharpening, that would explain it.
Comment 22 Gilles Dubuc 2014-09-23 15:06:15 UTC
I can finally repro and I know why it's happening: currently MediaWiki doesn't do any sharpening for PNGs. It does for JPGs, and the extra sharpening passes in the chaining compensate for the loss in perceived quality. I had been testing mostly with JPGs during the development of this feature, which is why I missed the fact that PNGs are different with regard to sharpening.

In order to preserve a decent amount of perceived quality, PNGs need to be sharpened when chained. I'll experiment locally and come up with a follow-up patch, but since the impact on perceived quality had only been tested on JPGs, I think that releasing this to production should be restricted to JPGs at first. I'll treat the release for PNGs separately and I'll run another user test for perceived quality on them.
Comment 23 Gerrit Notification Bot 2014-09-23 15:09:51 UTC
Change 162279 had a related patch set uploaded by Gilles:
Disable thumbnail chaining support for PNGs

https://gerrit.wikimedia.org/r/162279
Comment 24 Gerrit Notification Bot 2014-09-23 15:33:09 UTC
Change 162279 merged by jenkins-bot:
Disable thumbnail chaining support for PNGs

https://gerrit.wikimedia.org/r/162279
Comment 25 Gerrit Notification Bot 2014-11-03 18:22:32 UTC
Change 170747 had a related patch set uploaded by Gilles:
Enable JPG thumbnail chaining on beta

https://gerrit.wikimedia.org/r/170747
Comment 26 Gerrit Notification Bot 2014-11-04 13:22:30 UTC
Change 170747 merged by jenkins-bot:
Enable JPG thumbnail chaining on beta

https://gerrit.wikimedia.org/r/170747
Comment 27 Gerrit Notification Bot 2014-11-10 15:27:34 UTC
Change 172254 had a related patch set uploaded by Gilles:
Enable JPG thumbnail chaining on all wikis except commons

https://gerrit.wikimedia.org/r/172254
Comment 28 Gerrit Notification Bot 2014-11-10 16:02:39 UTC
Change 172254 merged by jenkins-bot:
Enable JPG thumbnail chaining on all wikis except commons

https://gerrit.wikimedia.org/r/172254
Comment 29 Nemo 2014-11-13 10:12:31 UTC
Why is a revert proposed now?
https://gerrit.wikimedia.org/r/#/c/172960/
Comment 30 Gerrit Notification Bot 2014-11-13 10:26:46 UTC
Change 172969 had a related patch set uploaded by Gilles:
Don't re-apply EXIF rotation to chained thumbnails

https://gerrit.wikimedia.org/r/172969
Comment 31 Gilles Dubuc 2014-11-13 10:29:18 UTC
Nemo: because of https://bugzilla.wikimedia.org/show_bug.cgi?id=73352 which https://gerrit.wikimedia.org/r/172969 aims to fix.

Not sure how long the review process will take for that one, so it's best to not generate thumbnails with the wrong orientations in production in the meantime.
Comment 32 Gerrit Notification Bot 2014-11-19 02:06:04 UTC
Change 172969 merged by jenkins-bot:
Don't re-apply EXIF rotation to chained thumbnails

https://gerrit.wikimedia.org/r/172969
Comment 33 Gerrit Notification Bot 2014-11-19 17:56:45 UTC
Change 174453 had a related patch set uploaded by Gilles:
Don't re-apply EXIF rotation to chained thumbnails

https://gerrit.wikimedia.org/r/174453
Comment 34 Gerrit Notification Bot 2014-11-20 00:23:19 UTC
Change 174453 merged by jenkins-bot:
Don't re-apply EXIF rotation to chained thumbnails

https://gerrit.wikimedia.org/r/174453
