Last modified: 2013-09-28 20:46:11 UTC
VIPS returns a 500: Internal server error for the large file at [[commons:File:Chicago.jpg]]: https://test2.wikipedia.org/w/index.php?title=Special:VipsTest&thumb=Chicago.jpg&width=640&sharpen=0.8 VIPS does return an image for the "small" version of that file: https://test2.wikipedia.org/w/index.php?title=Special:VipsTest&thumb=Chicago_small.jpg&width=640&sharpen=0.8&bilinear=1 but the quality is way below ImageMagick. Without the bilinear option, the generated thumb is much better, but still worse than ImageMagick's.
I see why you put "small" in quotation marks, it is 28 Mpx. If you use the bilinear option on an image when scaling down by a factor of 20 then it will look rubbish. I warned about this in my wikitech-l post, it is not a bug. If you think that it's bad enough without the bilinear option to warrant filing a bug, then I suggest you file a separate bug to this out-of-memory issue, explaining what exactly is wrong with the visual result.
I assume it is hitting the virtual memory limit with a non-progressive JPEG, I saw similar things during my own testing. Renaming the bug accordingly.
It looks like increasing the sharpening to 1.2 helps quite a bit for the "small" image without bilinear scaling.
Hmmm, this is a pretty large error on my part. VIPS is not doing any low memory magic when reading JPEG or PNG source images. VIPS is writing out an uncompressed copy of the source image to /tmp and presumably exceeding the file size limit. If the source image is less than 100MB, it makes the copy in memory instead, despite my previous assertions to the contrary. My measure-memory script missed the allocation because it was done in a separate thread. We can set IM_DISC_THRESHOLD to prevent VIPS from writing temporary files. For JPEG images we can get VIPS to prescale the source image, like how we do it with ImageMagick. Instead of this: vips im_shrink chicago.jpg dst.jpg 20 20 we would do this: vips im_shrink chicago.jpg:8 dst.jpg 2.5 2.5 See <http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsFormat.html#im-jpeg2vips> for the details. But apparently there's not much value in using VIPS over ImageMagick for JPEGs since both are fast and ImageMagick probably looks nicer. For TIFFs, now that I've fixed my testing procedure, I'm getting the expected difference in memory usage between tiled and untiled images. I'll put a correction on bug 23258. For PNGs, i.e. the whole point of this project, we seem to be pretty much screwed. I haven't looked into the details of when it will or won't use memory, but it's not looking good. At least we can reuse most of the code you've written to integrate with pngds instead.
(In reply to comment #4) > For PNGs, i.e. the whole point of this project, we seem to be pretty much > screwed. I haven't looked into the details of when it will or won't use memory, > but it's not looking good. > Well, that kinda sucks. > At least we can reuse most of the code you've written to integrate with pngds > instead. The whole problem is that the PNG reader is WIO instead of PIO right? I'm wondering whether it is easier to write a PIO PNG reader for VIPS than to fix all the memory leaks a things like that in pngds. It was my first C project, so it is probably rather leaky and buggy.
https://github.com/jcupitt/libvips/blob/master/libvips/format/im_png2vips.c
(In reply to comment #5) > The whole problem is that the PNG reader is WIO instead of PIO right? I'm > wondering whether it is easier to write a PIO PNG reader for VIPS than to fix > all the memory leaks a things like that in pngds. It was my first C project, so > it is probably rather leaky and buggy. Yes, the problem is that it's WIO rather than PIO. Have a look at read_tilewise() versus read_stripwise() in im_tiff2vips.c. In read_tilewise(), im_generate() is called, which is the main PIO function. The challenge for writing an im_generate() callback for PNG is that the regions could be requested any order, due to some other part of the pipeline calling im_prepare(). For example im_flipver() looks like it reads the source image from bottom to top. It probably doesn't matter for us in practice, the transforms we're interested in will probably request regions in strips from top to bottom, so a row buffer like in pngds will allow the necessary tiles or strips to be generated efficiently. But it will matter if we want to get our patch accepted upstream. Maybe if an out-of-order region is requested, we can just make a temporary copy of the whole image and serve subsequent requests from that copy. There's a comment in im_png2vips.c: "Noninterlaced images can be read without needing enough RAM for the whole image." Maybe I was misled by it. Both png2vips_interlace() and png2vips_noninterlace() call im_outcheck(), which will allocate a full-size temporary image either in memory or on disk. png2vips_interlace() is allocating a second full-size buffer, so it'll require twice as much total storage space as an uncompressed image. I think a PIO version of png2vips_noninterlace() for upstream inclusion would be a good project, and more likely to be of use to the rest of the world than pngds.
(this was part of an email chain, posting here for reference, I hope that's OK) (In reply to comment #7) > The challenge for writing an im_generate() callback for PNG is that the regions > could be requested any order, due to some other part of the pipeline calling > im_prepare(). For example im_flipver() looks like it reads the source image > from bottom to top. It probably doesn't matter for us in practice, the > transforms we're interested in will probably request regions in strips from top > to bottom, so a row buffer like in pngds will allow the necessary tiles or > strips to be generated efficiently. But it will matter if we want to get our > patch accepted upstream. I've added this to vips master, build that and you should get the new option. There's a new flag to the PNG loader called "sequential". Setting this option means "I promise to only request pixels top-to-bottom". This is fine for thumbnailing, but will obviously break for things like up-down flip. If you try an up-down flip despite setting :sequential you'll get an error. With the flag set, the PNG loader pretends to hand out pixels on demand. There's a cache of (currently) 256 scan lines behind the read point to give the downstream operations a little leeway. Benchmarks --- old behaviour: $ time vips --vips-leak im_shrink Chicago.png tn.jpg 36 36 memory: high-water mark 12.05 MB real 0m7.744s user 0m5.104s sys 0m1.104s Memuse is low but sys is high because a large temp image is being created and deleted. With the :seq flag enabled: $ time vips --vips-leak im_shrink Chicago.png:seq tn.jpg 36 36 memory: high-water mark 45.54 MB real 0m5.289s user 0m5.424s sys 0m0.136s Memuse is higher (it's caching many more pixels now), but real time is lower and sys is much lower since it's no longer creating the 400 MB intermediate file. I've also updated vipsthumbnail to automatically turn on :seq mode for all png images. This program should make downsized images as good as ImageMagick (hopefully). $ time vipsthumbnail --vips-leak Chicago.png --size 800 -p bicubic memory: high-water mark 34.53 MB real 0m5.751s user 0m5.376s sys 0m0.176s I'll add the :seq flag to the jpeg and tiff readers as well. It should be in the upcoming 7.28 stable release.
vips-7.28 is now officially out and includes sequential mode read. In this mode, vips streams the image from the file format read library, caching just a few hundred scan lines behind the read point. Operations which do large coordinate transforms will fail, but operations which operate more-or-less sequentially (like downsizing) work fine. $ time vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg real 0m5.775s user 0m5.632s sys 0m0.132s peak RES: 150M For comparison, the roughly equivalent ImageMagick command: $ time convert Chicago.png -thumbnail 800 tn_Chicago-magick.jpg real 0m6.223s user 0m5.420s sys 0m0.848s peak RES: 950M And the generated images: http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-vips.jpg http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick.jpg The vips image is better quality, in my opinion, though maybe the sharpening is a bit too aggressive.
I've just realised that the -thumbnail option for convert is doing very crude subsampling, hence the horrible aliasing in the shrunk image. You need to use -resize to get a proper box filter: $ time convert Chicago.png -resize 800 -unsharp 0x.5 tn_Chicago-magick2.jpg real 0m14.497s user 0m20.601s sys 0m1.068s peak RES: 950M Which generates this image: http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick2.jpg the same quality as the vips one, though vips is sharpening more.
Note that if VIPS isn't feasible for PNGs for whatever reason, my pngscale project is still available and stable: https://github.com/dcoetzee/pngscale I've tested it on a variety of files and it appears to be both stable and fast. Its memory usage is only two scanlines.
I've redone all the benchmarks with the latest version of each package. Thank you for the link to your interesting project Derrick. Summary: program | real user sys peak mem --------------+--------------------------------- vipsthumbnail | 5.384 5.272 0.112 25mb pngscale | 6.46 6.384 0.064 <1mb convert | 27.003 21.473 8.713 83mb Notes: Testing was on an elderly two-processor Opteron workstation with a PNG version of the Chicago.jpg image above, using default (6) PNG compression. See below for the test log. The machine was running Ubuntu 12.04. vipsthumbnail is from vips-7.30.1. It has had some good improvements to the line cache since the previous benchmark above, and performance is up as well. pngscale uses very little memory and quality is reasonable too. The latest version of convert is now decompressing to a temporary file and processing from there, hence the large systime. This has dramatically reduced memory use. peak mem is measured with a version of Tim Starling's perl program to watch strace output: http://www.vips.ecs.soton.ac.uk/development/peakmem.pl Output: http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-vips.jpg http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-pngscale.png http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick2.jpg Log: I've edited this down, I ran each command four times and picked the fastest. This is just supposed to show the exact commands I ran. $ vips copy Chicago.jpg Chicago.png $ time convert Chicago.png -resize 800 tn_Chicago-magick2.jpg real 0m27.003s user 0m21.697s sys 0m8.713s $ peakmem.pl convert Chicago.png -resize 800 tn_Chicago-magick2.jpg 83 MB $ time vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg real 0m5.384s user 0m5.268s sys 0m0.112s $ peakmem.pl vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg 25 MB $ time pngscale Chicago.png tn_Chicago-pngscale.png 800 -1 real 0m6.463s user 0m6.384s sys 0m0.064s $ peakmem.pl pngscale Chicago.png tn_Chicago-pngscale.png 800 -1 0 MB
Tim, would you have time to look into this again? It looks like that with vips-7.28, vips becomes useful again at least for our PNG problem? I'd prefer to continue on the vips road, rather than write a new extension for pngscale or pngds. In any case, I'll need to adapt the VipsScaler for the new FileBackend stuff in 1.20.
A new vips is out, so I thought I'd update the benchmark again with the latest version of each package. This time I've used my laptop. This has an i5-3210M at 2.5 GHz and 6gb of RAM. On this machine, ImageMagick processes via RAM rather than via a temporary disc file so speed is up, but memuse is too. Summary: program version | real user sys peak mem -----------------------+--------------------------------- vipsthumbnail 7.32.0 | 3.925 3.856 0.068 54mb pngscale master | 4.530 4.484 0.040 <1mb convert 6.7.7-10 | 8.516 17.521 0.784 987mb peak mem is measured with a version of Tim Starling's perl program to watch strace output: http://www.vips.ecs.soton.ac.uk/development/peakmem.pl Output: http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-vips.jpg http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-pngscale.png http://www.vips.ecs.soton.ac.uk/development/tn_Chicago-magick2.jpg Quality looks very similar, though perhaps pngscale is a little softer than the others. Log: I've edited this down, I ran each command four times and picked the fastest. This is just supposed to show the exact commands I ran. $ vips copy Chicago.jpg Chicago.png $ time vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg real 0m3.931s user 0m3.900s sys 0m0.032s $ peakmem.pl vipsthumbnail Chicago.png --size 800 --interpolator bicubic -o tn_Chicago-vips.jpg 54 MB $ time pngscale Chicago.png tn_Chicago-pngscale.png 800 -1 real 0m4.530s user 0m4.484s sys 0m0.040s $ peakmem.pl time pngscale Chicago.png tn_Chicago-pngscale.png 800 -1 0 MB $ time convert Chicago.png -resize 800 tn_Chicago-magick2.jpg real 0m8.516s user 0m17.521s sys 0m0.784s $ peakmem.pl convert Chicago.png -resize 800 tn_Chicago-magick2.jpg 987 MB
Jan is going to look into this, and take on getting the correct version of Vips deployed to fix the memory usage issue
Good news, please don't hesitate to mail me if there's any way I can help. Bryan's excellent extension needs revising to use "vipsthumbnail". As it stands VipsScalar will not be able to make use of the new vips streaming features. I could do this, if that would be helpful. I'd need pointing to a guide on how to set up a mediawiki test environment to work in.
A less intrusive change would be to append ":seq" to the PNG filenames in Bryan's VipsScalar extension. This will turn on the streaming mode and get memuse and disc use down. It would be better to switch to using vipsthumbnail, but just adding ":seq" would be enough to solve the immediate problem. Sorry, I should have got this all in to a single post.
@John, looking at this right now, and I have some issues with vips-7.28.5: vipsthumbnail insists on writing the output in the folder of the input: vipsthumbnail /srv/wiki/images/6/66/Chicago.png -o /tmp/transform_2ba5f5461b96-1.png --size 120 fails with: vips__file_open_write: unable to open file "/srv/wiki/images/6/66/tmp/transform_2ba5f5461b96-1.png" for writing appending :seq to png images also fails: vips im_shrink /srv/wiki/images/6/66/Chicago_test.png:seq /tmp/transform_2ba5f5461b96-1.png 2.435546305931321456e+2 2.450218978102189737e+2 vips_image_get: field "icc-profile-data" not found VipsSequential: non-sequential read --- at position 544 in file, but position 3808 requested VipsSequential: non-sequential read --- at position 1088 in file, but position 0 requested vips2png: unable to write "/tmp/transform_2ba5f5461b96-1.png" vips_sequential_generate: error -- at position 544 in file, but position 3808 requestedvips_sequential_generate: error -- at position 1088 in file,
@Jan, there was a bug in vipsthumbnail, it only allowed relative paths in -o. This was fixed in 7.32, the current stable version. Could you try that? I'll look into the im_shrink() error.
Here's the issue on the vipsthumbnail absolute pathname problem, for reference. It includes a patch, though of course it'd be better to update to current stable. https://github.com/jcupitt/libvips/issues/42 There were two problems with the im_shrink command. First, I was wrong about appending :seq to filenames, the correct fix is to switch to the newer syntax, eg.: $ vips shrink huge.png x.png 230 230 ie. drop the "im_" prefix and don't use :seq. Secondly, there was a vips bug which broke sequential mode at the command-line for this one operation. I've fixed it and put up a new stable tarball: http://www.vips.ecs.soton.ac.uk/supported/7.32/vips-7.32.2.tar.gz I've also added a thing to the vips test suite to check correct functioning of vipsthumbnail and vips shrink. Hopefully they will not break again. https://github.com/jcupitt/nip2/commit/e3100446d0dd67d60d8222f31d13099c30672662
https://gerrit.wikimedia.org/r/57478 (Gerrit Change I8ab3375c709ddb495eaf0e0cc5d90d6f5fe7110b) | change APPROVED and MERGED [by Tim Starling]
Patch in https://gerrit.wikimedia.org/r/#/c/57478/ to bump from 100MB to 400MB got merged. Anything left here to do, or can this ticket be closed as FIXED?
(In reply to comment #22) > Anything left here to do, or can this ticket be closed as FIXED? I'm the vips maintainer. As it stands, the VipsScalar plugin will cause a lot of disc traffic. Should fixing that happen in a new bug? If yes, then this can be closed. VipsScalar currently works by running a series of separate vips commands. Looking at the source here: https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/VipsScaler.git;a=blob;f=VipsScaler_body.php;h=69f31b108d99849c36fd92591f135bfbfca69d02;hb=HEAD makeCommands() will typically run something like: # unpack png file to a huge disc temp vips im_png2vips huge.png t1.v # block shrink by large integer factor vips im_shrink t1.v t2.v xx xx # bilinear resize to final exact dimensions vips im_resize_linear t2.v t3.v xxxx yyyy # sharpen slightly to counteract softening effect of bilinear vips im_convf t3.v t4.v sharpen_matrix # any 90-degree rotation required vips im_rotxx t4.v final.jpg Instead, it should simply run vipsthumbnail: vipsthumbnail huge.png -o final.jpg --size required-output-size-in-pixels --interpolator bicubic This requires the latest stable version of vips which is not yet in Debian and must be built from source.
created https://bugzilla.wikimedia.org/show_bug.cgi?id=47311 to track use of vipsthumbnail (and update to 7.32.2)
new version of vips installed on cluster and urls like https://test2.wikipedia.org/w/index.php?title=Special:VipsTest&thumb=Chicago.jpg&width=640&sharpen=0.8 work now
(In reply to comment #22) > Patch in https://gerrit.wikimedia.org/r/#/c/57478/ to bump from 100MB to > 400MB > got merged. Anything left here to do, or can this ticket be closed as FIXED? I'm confused. Limit on Wikimedia was already 400 MB, so this should have changed nothing...