Last modified: 2013-09-28 20:46:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T34721, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 32721 - VIPS out of memory on large (>100 Mpx) non-progressive JPEG
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: VipsScaler
Version: unspecified
Platform: All
OS: All
Priority: Normal
Severity: normal
Target Milestone: ---
Assigned To: Jan Gerber
https://test2.wikipedia.org/wiki/Spec...
Depends on:
Blocks: 28135
Reported: 2011-11-30 10:20 UTC by Lupo
Modified: 2013-09-28 20:46 UTC (History)
8 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Lupo 2011-11-30 10:20:02 UTC
VIPS returns a 500: Internal server error for the large file at [[commons:File:Chicago.jpg]]:

https://test2.wikipedia.org/w/index.php?title=Special:VipsTest&thumb=Chicago.jpg&width=640&sharpen=0.8

VIPS does return an image for the "small" version of that file:

https://test2.wikipedia.org/w/index.php?title=Special:VipsTest&thumb=Chicago_small.jpg&width=640&sharpen=0.8&bilinear=1

but the quality is way below ImageMagick. Without the bilinear option, the generated thumb is much better, but still worse than ImageMagick's.
Comment 1 Tim Starling 2011-11-30 10:31:08 UTC
I see why you put "small" in quotation marks; it is 28 Mpx.

If you use the bilinear option on an image when scaling down by a factor of 20 then it will look rubbish. I warned about this in my wikitech-l post; it is not a bug.

If you think that it's bad enough without the bilinear option to warrant filing a bug, then I suggest you file a bug separate from this out-of-memory issue, explaining what exactly is wrong with the visual result.
Comment 2 Tim Starling 2011-11-30 10:47:20 UTC
I assume it is hitting the virtual memory limit with a non-progressive JPEG, I saw similar things during my own testing. Renaming the bug accordingly.
Comment 3 Bryan Tong Minh 2011-11-30 10:48:59 UTC
It looks like increasing the sharpening to 1.2 helps quite a bit for the "small" image without bilinear scaling.
Comment 4 Tim Starling 2011-12-05 04:21:16 UTC
Hmmm, this is a pretty large error on my part. VIPS is not doing any low memory magic when reading JPEG or PNG source images.

VIPS is writing out an uncompressed copy of the source image to /tmp and presumably exceeding the file size limit. If the source image is less than 100MB, it makes the copy in memory instead, despite my previous assertions to the contrary. My measure-memory script missed the allocation because it was done in a separate thread.

We can set IM_DISC_THRESHOLD to prevent VIPS from writing temporary files. 

For JPEG images we can get VIPS to prescale the source image, as we do with ImageMagick. Instead of this:

vips im_shrink chicago.jpg dst.jpg 20 20

we would do this:

vips im_shrink chicago.jpg:8 dst.jpg 2.5 2.5

See <http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsFormat.html#im-jpeg2vips> for the details. But apparently there's not much value in using VIPS over ImageMagick for JPEGs since both are fast and ImageMagick probably looks nicer.
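For reference, the arithmetic behind that prescale can be sketched in Python (my illustration, not VIPS code; the function name is hypothetical): libjpeg can shrink on load by a factor of 1, 2, 4 or 8, so an overall shrink of 20 splits into a load-time factor of 8 plus a residual im_shrink of 2.5, as in the chicago.jpg:8 example above.

```python
def split_shrink(total):
    """Split a total shrink factor into a libjpeg load-time
    shrink (1, 2, 4 or 8) and the residual factor left for
    im_shrink, mirroring the chicago.jpg:8 example above.
    Illustrative only, not a VIPS API."""
    load = 1
    # take the largest power-of-two load-time shrink (up to 8)
    # that still leaves a residual factor of at least 1
    while load < 8 and total / (load * 2) >= 1:
        load *= 2
    return load, total / load

print(split_shrink(20))   # -> (8, 2.5)
```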

For TIFFs, now that I've fixed my testing procedure, I'm getting the expected difference in memory usage between tiled and untiled images. I'll put a correction on bug 23258.

For PNGs, i.e. the whole point of this project, we seem to be pretty much screwed. I haven't looked into the details of when it will or won't use memory, but it's not looking good. 

At least we can reuse most of the code you've written to integrate with pngds instead.
Comment 5 Bryan Tong Minh 2011-12-05 19:54:08 UTC
(In reply to comment #4)
> For PNGs, i.e. the whole point of this project, we seem to be pretty much
> screwed. I haven't looked into the details of when it will or won't use memory,
> but it's not looking good. 
> 
Well, that kinda sucks.

> At least we can reuse most of the code you've written to integrate with pngds
> instead.
The whole problem is that the PNG reader is WIO instead of PIO, right? I'm wondering whether it is easier to write a PIO PNG reader for VIPS than to fix all the memory leaks and things like that in pngds. It was my first C project, so it is probably rather leaky and buggy.
Comment 7 Tim Starling 2011-12-06 00:15:34 UTC
(In reply to comment #5)
> The whole problem is that the PNG reader is WIO instead of PIO right? I'm
> wondering whether it is easier to write a PIO PNG reader for VIPS than to fix
> all the memory leaks and things like that in pngds. It was my first C project, so
> it is probably rather leaky and buggy.

Yes, the problem is that it's WIO rather than PIO. Have a look at read_tilewise() versus read_stripwise() in im_tiff2vips.c. In read_tilewise(), im_generate() is called, which is the main PIO function. 

The challenge for writing an im_generate() callback for PNG is that the regions could be requested in any order, due to some other part of the pipeline calling im_prepare(). For example, im_flipver() looks like it reads the source image from bottom to top. It probably doesn't matter for us in practice: the transforms we're interested in will probably request regions in strips from top to bottom, so a row buffer like the one in pngds will allow the necessary tiles or strips to be generated efficiently. But it will matter if we want to get our patch accepted upstream.

Maybe if an out-of-order region is requested, we can just make a temporary copy of the whole image and serve subsequent requests from that copy.

There's a comment in im_png2vips.c: "Noninterlaced images can be read without needing enough RAM for the whole image." Maybe I was misled by it. Both png2vips_interlace() and png2vips_noninterlace() call im_outcheck(), which will allocate a full-size temporary image either in memory or on disk. png2vips_interlace() is allocating a second full-size buffer, so it'll require twice as much total storage space as an uncompressed image.

I think a PIO version of png2vips_noninterlace() for upstream inclusion would be a good project, and more likely to be of use to the rest of the world than pngds.
Comment 8 John Cupitt 2012-02-08 14:16:08 UTC
(this was part of an email chain, posting here for reference, I hope that's OK)

(In reply to comment #7)
> The challenge for writing an im_generate() callback for PNG is that the regions
> could be requested any order, due to some other part of the pipeline calling
> im_prepare(). For example im_flipver() looks like it reads the source image
> from bottom to top. It probably doesn't matter for us in practice, the
> transforms we're interested in will probably request regions in strips from top
> to bottom, so a row buffer like in pngds will allow the necessary tiles or
> strips to be generated efficiently. But it will matter if we want to get our
> patch accepted upstream.

I've added this to vips master, build that and you should get the new option. 

There's a new flag to the PNG loader called "sequential". Setting this option means "I promise to only request pixels top-to-bottom". This is fine for thumbnailing, but will obviously break for things like up-down flip. If you try an up-down flip despite setting :sequential you'll get an error.

With the flag set, the PNG loader pretends to hand out pixels on demand. There's a cache of (currently) 256 scan lines behind the read point to give the downstream operations a little leeway.
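As a toy model of that mechanism (the class name, cache policy and sizes are illustrative inventions, not vips internals): scan lines are decoded forward on demand, a small backlog is cached behind the read point, and any read behind the cache window is an error, which is exactly what an up-down flip would trigger.

```python
class SequentialReader:
    """Toy model of sequential-mode reading: decode lines on
    demand, keep a small backlog behind the read point, and
    refuse reads behind the cache window."""
    def __init__(self, decode_line, backlog=256):
        self.decode_line = decode_line
        self.backlog = backlog
        self.cache = {}         # line number -> pixels
        self.next_line = 0      # next line to decode
    def read(self, y):
        while self.next_line <= y:           # decode forward on demand
            self.cache[self.next_line] = self.decode_line(self.next_line)
            # drop lines that fall behind the backlog window
            self.cache.pop(self.next_line - self.backlog, None)
            self.next_line += 1
        if y not in self.cache:
            raise ValueError("non-sequential read: line %d already evicted" % y)
        return self.cache[y]

r = SequentialReader(lambda y: [y] * 4, backlog=2)
r.read(0); r.read(1); r.read(0)   # reading within the backlog is fine
```

Reading far ahead and then jumping back outside the window raises, matching the "non-sequential read" errors the real loader reports.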

Benchmarks --- old behaviour:

$ time vips --vips-leak im_shrink Chicago.png tn.jpg 36 36
memory: high-water mark 12.05 MB
real	0m7.744s
user	0m5.104s
sys	0m1.104s

Memuse is low but sys is high because a large temp image is being created and deleted.

With the :seq flag enabled:

$ time vips --vips-leak im_shrink Chicago.png:seq tn.jpg 36 36
memory: high-water mark 45.54 MB
real	0m5.289s
user	0m5.424s
sys	0m0.136s

Memuse is higher (it's caching many more pixels now), but real time is lower and sys is much lower since it's no longer creating the 400 MB intermediate file.

I've also updated vipsthumbnail to automatically turn on :seq mode for all png images. This program should make downsized images as good as ImageMagick (hopefully).

$ time vipsthumbnail --vips-leak Chicago.png --size 800 -p bicubic
memory: high-water mark 34.53 MB
real	0m5.751s
user	0m5.376s
sys	0m0.176s

I'll add the :seq flag to the jpeg and tiff readers as well. It should be in the upcoming 7.28 stable release.
Comment 9 John Cupitt 2012-03-19 12:28:17 UTC
vips-7.28 is now officially out and includes sequential mode read.

In this mode, vips streams the image from the file format read library, caching just a few hundred scan lines behind the read point. Operations which do large coordinate transforms will fail, but operations which operate more-or-less sequentially (like downsizing) work fine.

$ time vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg
real	0m5.775s
user	0m5.632s
sys	0m0.132s
peak RES: 150M

For comparison, the roughly equivalent ImageMagick command:

$ time convert Chicago.png -thumbnail 800 tn_Chicago-magick.jpg
real	0m6.223s
user	0m5.420s
sys	0m0.848s
peak RES: 950M

And the generated images:

http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-vips.jpg

http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick.jpg

The vips image is better quality, in my opinion, though maybe the sharpening is a bit too aggressive.
Comment 10 John Cupitt 2012-03-19 14:20:17 UTC
I've just realised that the -thumbnail option for convert is doing very crude subsampling, hence the horrible aliasing in the shrunk image. You need to use -resize to get a proper box filter: 

$ time convert Chicago.png -resize 800 -unsharp 0x.5 tn_Chicago-magick2.jpg
real	0m14.497s
user	0m20.601s
sys	0m1.068s
peak RES: 950M

Which generates this image:

http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick2.jpg

It's the same quality as the vips one, though vips is sharpening more.
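The difference can be demonstrated on a one-dimensional signal (a toy illustration, unrelated to ImageMagick's actual code): point subsampling keeps only one phase of a fine pattern, while a box filter averages each window.

```python
# Alternating black/white pixels, shrunk by a factor of 4.
signal = [0, 255] * 8                  # 16 pixels of fine detail

# Crude point subsampling (what the comment above says -thumbnail
# was doing): every sample lands on the same phase of the pattern,
# so the detail aliases to flat black.
subsampled = signal[::4]

# Box filter (what -resize does): average each 4-pixel window,
# giving the expected mid grey.
boxed = [sum(signal[i:i + 4]) / 4 for i in range(0, len(signal), 4)]

print(subsampled)  # -> [0, 0, 0, 0]
print(boxed)       # -> [127.5, 127.5, 127.5, 127.5]
```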
Comment 11 Derrick Coetzee 2012-09-28 00:17:07 UTC
Note that if VIPS isn't feasible for PNGs for whatever reason, my pngscale project is still available and stable:

https://github.com/dcoetzee/pngscale

I've tested it on a variety of files and it appears to be both stable and fast. Its memory usage is only two scanlines.
Comment 12 John Cupitt 2012-09-28 11:51:19 UTC
I've redone all the benchmarks with the latest version of each package. Thank you for the link to your interesting project Derrick.

Summary:

program       | real    user    sys     peak mem
--------------+---------------------------------
vipsthumbnail | 5.384   5.272   0.112   25mb
pngscale      | 6.46    6.384   0.064   <1mb
convert       | 27.003  21.473  8.713   83mb

Notes:

Testing was on an elderly two-processor Opteron workstation with a PNG version of the Chicago.jpg image above, using default (6) PNG compression. See below for the test log. The machine was running Ubuntu 12.04.

vipsthumbnail is from vips-7.30.1. It has had some good improvements to the line cache since the previous benchmark above, and performance is up as well.

pngscale uses very little memory and quality is reasonable too.

The latest version of convert is now decompressing to a temporary file and processing from there, hence the large systime. This has dramatically reduced memory use.

peak mem is measured with a version of Tim Starling's perl program to watch strace output:

http://www.vips.ecs.soton.ac.uk/development/peakmem.pl
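The script itself isn't reproduced here, but a watcher of that sort would presumably sum the sizes of anonymous mmap() calls in the strace output, subtract munmap() sizes, and record the high-water mark. A rough sketch (hypothetical reimplementation, not the actual peakmem.pl):

```python
import re

# Hypothetical sketch: sum anonymous mmap() sizes seen in strace
# output, subtract munmap() sizes, and keep the high-water mark.
MMAP = re.compile(r'mmap\(\S+ (\d+),.*MAP_ANON')
MUNMAP = re.compile(r'munmap\(0x[0-9a-f]+, (\d+)\)')

def peak_mem(strace_lines):
    current = peak = 0
    for line in strace_lines:
        m = MMAP.search(line)
        if m:
            current += int(m.group(1))
            peak = max(peak, current)
            continue
        m = MUNMAP.search(line)
        if m:
            current -= int(m.group(1))
    return peak

log = [
    'mmap(NULL, 1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000001000',
    'munmap(0x7f0000001000, 1000) = 0',
]
print(peak_mem(log))   # -> 1000
```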

Output:

http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-vips.jpg
http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-pngscale.png
http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick2.jpg

Log:

I've edited this down, I ran each command four times and picked the fastest. This is just supposed to show the exact commands I ran. 

$ vips copy Chicago.jpg Chicago.png
$ time convert Chicago.png -resize 800 tn_Chicago-magick2.jpg
real    0m27.003s
user    0m21.697s
sys     0m8.713s
$ peakmem.pl convert Chicago.png -resize 800 tn_Chicago-magick2.jpg
83 MB
$ time vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg
real    0m5.384s
user    0m5.268s
sys     0m0.112s
$ peakmem.pl vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg
25 MB
$ time pngscale Chicago.png tn_Chicago-pngscale.png 800 -1
real    0m6.463s
user    0m6.384s
sys     0m0.064s
$ peakmem.pl pngscale Chicago.png tn_Chicago-pngscale.png 800 -1
0 MB
Comment 13 Bryan Tong Minh 2012-12-30 15:27:22 UTC
Tim, would you have time to look into this again? It looks like, with vips-7.28, vips becomes useful again, at least for our PNG problem. I'd prefer to continue on the vips road, rather than write a new extension for pngscale or pngds.

In any case, I'll need to adapt the VipsScaler for the new FileBackend stuff in 1.20.
Comment 14 John Cupitt 2013-03-09 14:18:20 UTC
A new vips is out, so I thought I'd update the benchmark again with the latest version of each package.

This time I've used my laptop. This has an i5-3210M at 2.5 GHz and 6 GB of RAM. On this machine, ImageMagick processes via RAM rather than via a temporary disc file so speed is up, but memuse is too.

Summary:

program       version  | real    user    sys     peak mem
-----------------------+---------------------------------
vipsthumbnail 7.32.0   | 3.925   3.856   0.068   54mb
pngscale      master   | 4.530   4.484   0.040   <1mb
convert       6.7.7-10 | 8.516   17.521  0.784   987mb

peak mem is measured with a version of Tim Starling's perl program to watch strace output:

http://www.vips.ecs.soton.ac.uk/development/peakmem.pl

Output:

http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-vips.jpg
http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-pngscale.png
http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick2.jpg

Quality looks very similar, though perhaps pngscale is a little softer than the others. 

Log:

I've edited this down, I ran each command four times and picked the fastest. This is just supposed to show the exact commands I ran. 

$ vips copy Chicago.jpg Chicago.png
$ time vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg
real	0m3.931s
user	0m3.900s
sys	0m0.032s
$ peakmem.pl vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg
54 MB  
$ time pngscale Chicago.png tn_Chicago-pngscale.png 800 -1
real	0m4.530s
user	0m4.484s
sys	0m0.040s
$ peakmem.pl time pngscale Chicago.png tn_Chicago-pngscale.png 800 -1
0 MB  
$ time convert Chicago.png -resize 800 tn_Chicago-magick2.jpg
real	0m8.516s
user	0m17.521s
sys	0m0.784s
$ peakmem.pl convert Chicago.png -resize 800 tn_Chicago-magick2.jpg
987 MB
Comment 15 Rob Lanphier 2013-04-03 15:20:07 UTC
Jan is going to look into this, and take on getting the correct version of Vips deployed to fix the memory usage issue.
Comment 16 John Cupitt 2013-04-03 17:19:46 UTC
Good news, please don't hesitate to mail me if there's any way I can help.

Bryan's excellent extension needs revising to use "vipsthumbnail". As it stands, VipsScaler will not be able to make use of the new vips streaming features.

I could do this, if that would be helpful. I'd need pointing to a guide on how to set up a mediawiki test environment to work in.
Comment 17 John Cupitt 2013-04-03 17:25:35 UTC
A less intrusive change would be to append ":seq" to the PNG filenames in Bryan's VipsScaler extension. This will turn on the streaming mode and get memuse and disc use down.

It would be better to switch to using vipsthumbnail, but just adding ":seq" would be enough to solve the immediate problem.

Sorry, I should have got this all in to a single post.
Comment 18 Jan Gerber 2013-04-08 07:40:38 UTC
@John, I'm looking at this right now, and I have some issues with vips-7.28.5:

vipsthumbnail insists on writing the output in the folder of the input: 

 vipsthumbnail /srv/wiki/images/6/66/Chicago.png -o /tmp/transform_2ba5f5461b96-1.png --size 120

fails with:
vips__file_open_write: unable to open file "/srv/wiki/images/6/66/tmp/transform_2ba5f5461b96-1.png" for writing


appending :seq to png images also fails:

vips im_shrink /srv/wiki/images/6/66/Chicago_test.png:seq /tmp/transform_2ba5f5461b96-1.png 2.435546305931321456e+2 2.450218978102189737e+2
vips_image_get: field "icc-profile-data" not found
VipsSequential: non-sequential read --- at position 544 in file, but position 3808 requested
VipsSequential: non-sequential read --- at position 1088 in file, but position 0 requested
vips2png: unable to write "/tmp/transform_2ba5f5461b96-1.png"
vips_sequential_generate: error -- at position 544 in file, but position 3808 requested
vips_sequential_generate: error -- at position 1088 in file,
Comment 19 John Cupitt 2013-04-08 09:17:52 UTC
@Jan, there was a bug in vipsthumbnail, it only allowed relative paths in -o. This was fixed in 7.32, the current stable version. Could you try that?

I'll look into the im_shrink() error.
Comment 20 John Cupitt 2013-04-08 16:55:42 UTC
Here's the issue on the vipsthumbnail absolute pathname problem, for reference. It includes a patch, though of course it'd be better to update to current stable.

https://github.com/jcupitt/libvips/issues/42

There were two problems with the im_shrink command. First, I was wrong about appending :seq to filenames; the correct fix is to switch to the newer syntax, e.g.:

$ vips shrink huge.png x.png 230 230 

i.e. drop the "im_" prefix and don't use :seq. Secondly, there was a vips bug which broke sequential mode at the command line for this one operation. I've fixed it and put up a new stable tarball:

http://www.vips.ecs.soton.ac.uk/supported/7.32/vips-7.32.2.tar.gz

I've also added a test to the vips test suite to check the correct functioning of vipsthumbnail and vips shrink. Hopefully they will not break again.

https://github.com/jcupitt/nip2/commit/e3100446d0dd67d60d8222f31d13099c30672662
Comment 21 Gerrit Notification Bot 2013-04-11 03:52:25 UTC
https://gerrit.wikimedia.org/r/57478 (Gerrit Change I8ab3375c709ddb495eaf0e0cc5d90d6f5fe7110b) | change APPROVED and MERGED [by Tim Starling]
Comment 22 Andre Klapper 2013-04-15 12:19:12 UTC
The patch in https://gerrit.wikimedia.org/r/#/c/57478/ to bump from 100 MB to 400 MB got merged. Anything left here to do, or can this ticket be closed as FIXED?
Comment 23 John Cupitt 2013-04-16 11:21:46 UTC
(In reply to comment #22)
> Anything left here to do, or can this ticket be closed as FIXED?

I'm the vips maintainer. As it stands, the VipsScaler plugin will cause a lot of disc traffic. Should fixing that happen in a new bug? If yes, then this can be closed.

VipsScaler currently works by running a series of separate vips commands. Looking at the source here:

https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/VipsScaler.git;a=blob;f=VipsScaler_body.php;h=69f31b108d99849c36fd92591f135bfbfca69d02;hb=HEAD

makeCommands() will typically run something like:

# unpack png file to a huge disc temp
vips im_png2vips huge.png t1.v
# block shrink by large integer factor
vips im_shrink t1.v t2.v xx xx
# bilinear resize to final exact dimensions 
vips im_resize_linear t2.v t3.v xxxx yyyy
# sharpen slightly to counteract softening effect of bilinear
vips im_convf t3.v t4.v sharpen_matrix
# any 90-degree rotation required
vips im_rotxx t4.v final.jpg
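The split between the integer block shrink and the final linear resize in that pipeline can be sketched as arithmetic (the function and variable names are illustrative, not VipsScaler code):

```python
def plan_shrink(src, dst):
    """Split an overall reduction (src pixels wide down to dst)
    into an integer block-shrink factor for im_shrink and a
    residual scale for the final bilinear resize, as the command
    pipeline above implies. Illustrative names, not VipsScaler code."""
    block = max(src // dst, 1)       # integer factor for im_shrink
    residual = block * dst / src     # remaining scale for im_resize_linear
    return block, residual

# e.g. a 10000-pixel-wide source down to an 800-pixel thumbnail:
# block shrink by 12, then a bilinear resize by 0.96
print(plan_shrink(10000, 800))
```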

Instead, it should simply run vipsthumbnail:

vipsthumbnail huge.png -o final.jpg --size required-output-size-in-pixels --interpolator bicubic

This requires the latest stable version of vips, which is not yet in Debian and must be built from source.
Comment 24 Jan Gerber 2013-04-17 08:08:53 UTC
created https://bugzilla.wikimedia.org/show_bug.cgi?id=47311 to track use of vipsthumbnail (and update to 7.32.2)
Comment 25 Jan Gerber 2013-05-22 10:53:27 UTC
A new version of vips is installed on the cluster, and URLs like
https://test2.wikipedia.org/w/index.php?title=Special:VipsTest&thumb=Chicago.jpg&width=640&sharpen=0.8
now work.
Comment 26 Bawolff (Brian Wolff) 2013-09-28 20:46:11 UTC
(In reply to comment #22)
> Patch in https://gerrit.wikimedia.org/r/#/c/57478/ to bump from 100MB to
> 400MB
> got merged. Anything left here to do, or can this ticket be closed as FIXED?

I'm confused. The limit on Wikimedia was already 400 MB, so this should have changed nothing...


