Last modified: 2014-10-28 16:36:46 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T70576, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 68576 - Images embedded in PDF have an excessive resolution
Status: VERIFIED DUPLICATE of bug 72377
Product: MediaWiki extensions
Classification: Unclassified
Component: Collection
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: C. Scott Ananian
URL:
Depends on:
Blocks:
 
Reported: 2014-07-25 15:46 UTC by Nemo
Modified: 2014-10-28 16:36 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Chess oversized PDF (552.95 KB, application/octet-stream)
2014-08-14 07:33 UTC, Nemo
Comet (2.10 MB, application/octet-stream)
2014-08-14 07:37 UTC, Nemo

Description Nemo 2014-07-25 15:46:03 UTC
I see http://www.imagemagick.org/script/command-line-options.php#density is currently set at 600 dpi. https://gerrit.wikimedia.org/r/#/c/138149/7/lib/index.js,cm

This seems excessive: it makes it very easy to reach tens or hundreds of MB in a PDF from a few big pages. For a book, 300 dpi is generally plenty; scans, for instance, rarely go over 400 dpi.
Comment 1 C. Scott Ananian 2014-07-25 15:51:24 UTC
That -density argument doesn't do what you think it does.  It just edits the EXIF metadata, it doesn't affect the file contents at all.

It's necessary because some commons JPEGs have silly DPI settings (like '1') which cause xelatex to try to compute absurdly large image sizes.

There is a separate `size` argument for mw-ocg-bundler, when the file contents get fetched.  It currently defaults to 1200px.
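
To illustrate why a bogus DPI value is a problem: a typesetter that trusts the embedded density will generally take an image's natural size to be its pixel dimensions divided by the DPI. The sketch below is a minimal illustration in Python, not code from mw-ocg-bundler or xelatex; the function name `natural_size_inches` and the 3000x2000 example dimensions are assumptions chosen for the example.

```python
from typing import Tuple


def natural_size_inches(width_px: int, height_px: int, dpi: float) -> Tuple[float, float]:
    """Return the (width, height) in inches implied by the pixel size and DPI.

    Hypothetical helper: physical size = pixels / dots-per-inch.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return (width_px / dpi, height_px / dpi)


if __name__ == "__main__":
    # A JPEG whose metadata claims 1 dpi looks thousands of inches wide.
    print(natural_size_inches(3000, 2000, 1))
    # The same pixels tagged at 600 dpi come out at a few inches.
    print(natural_size_inches(3000, 2000, 600))
```

The pixel data is identical in both calls; only the declared density differs, which matches the point that rewriting the density tag does not change the file contents.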
Comment 2 Nemo 2014-08-14 07:33:38 UTC
Created attachment 16190
Chess oversized PDF

(In reply to C. Scott Ananian from comment #1)
> which cause xelatex to try to compute absurdly large image sizes.

And what does xelatex do with this dpi info?

I tried https://en.wikipedia.org/w/index.php?title=Special:Book&bookcmd=render_article&arttitle=First-move+advantage+in+chess&oldid=619135240&writer=rdf2latex from bug 68929 and it produced an unnecessarily big PDF: 500 KB for 20 pages and 3 photos. pdfimages -j extracts a 750x1000px image from the last one; if it's actually that big, it must be reduced.
Comment 3 Nemo 2014-08-14 07:37:30 UTC
Created attachment 16191
Comet

Also [[Comet]] is too big, at 2 MB. Is the first image really 5.6 megapixels?
Comment 4 Nemo 2014-08-21 09:52:29 UTC
Reopening (otherwise it's hard to find) with a more generic summary (please change as appropriate).
Example: 16 MiB for [[de:Corps]], https://de.wikipedia.org/w/index.php?title=Spezial:Buch&bookcmd=render_article&arttitle=Corps&oldid=60280924&writer=rdf2latex
Comment 5 C. Scott Ananian 2014-09-12 18:29:25 UTC
Is a 1200px x 1200px maximum resolution excessive?  What should this be reduced to?

(As a separate issue -- in bug 68836 we are apparently not correctly passing options from the front end to the renderer backend; perhaps we're also ignoring the option set by the OCG frontend here as well?)
Comment 6 Nemo 2014-09-12 18:39:12 UTC
(In reply to C. Scott Ananian from comment #5)
> Is a 1200px x 1200px maximum resolution excessive?  What should this be
> reduced to?

If you can't control the actual dpi, perhaps even some 250x250 would be ok.
Comment 7 Erik Moeller 2014-09-12 21:58:48 UTC
We should aim for print appropriate resolution, since that's one benefit of obtaining a PDF. Is there no way to obtain the resolution closest to 300 PPI for the output format we're targeting?
Comment 8 C. Scott Ananian 2014-09-23 17:33:49 UTC
1200px x 1200px is 300dpi for a 4" wide image (single column).  That is "print-appropriate" -- but many of our articles happen to have a large number of extremely high quality images.  I think what we actually need to do is render by default at a lower resolution and "opt-in" to full resolution images by exposing a dpi setting in the book creator.
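
The arithmetic behind these figures can be sketched as follows. This is only an illustration under the 4-inch single-column width stated above; the helper names `pixels_for_print` and `effective_dpi` are hypothetical and do not come from the OCG code.

```python
def pixels_for_print(width_inches: float, dpi: float) -> int:
    """Pixel width needed to print at `dpi` across `width_inches`."""
    return round(width_inches * dpi)


def effective_dpi(width_px: int, width_inches: float) -> float:
    """Print resolution obtained when `width_px` pixels span `width_inches`."""
    return width_px / width_inches


if __name__ == "__main__":
    column_width_in = 4.0  # single-column width assumed in the comment above
    print(pixels_for_print(column_width_in, 300))  # 1200
    print(pixels_for_print(column_width_in, 150))  # 600
    print(effective_dpi(250, column_width_in))     # 62.5
```

Under that assumption, the 250px cap suggested in comment 6 would print at about 62 dpi across a full column, while halving the target from 300 dpi to 150 dpi halves the pixel width from 1200 to 600.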
Comment 9 Gerrit Notification Bot 2014-09-23 17:34:26 UTC
Change 162296 had a related patch set uploaded by Cscott:
Reduce default image resolution to 150dpi.

https://gerrit.wikimedia.org/r/162296
Comment 10 Gerrit Notification Bot 2014-09-23 18:41:19 UTC
Change 162296 merged by jenkins-bot:
Reduce default image resolution to 150dpi.

https://gerrit.wikimedia.org/r/162296
Comment 11 C. Scott Ananian 2014-09-25 15:09:12 UTC
With the patch in comment 10, [[:de:Corps]] is 11.4MB (down from 16MB in comment 4) but [[First-move advantage in chess]] is still 560k (same as comment 2) with the same 750px x 1000px image.  That should have been reduced to 600px by the patch in comment 10, so something is still not quite right here.
Comment 12 C. Scott Ananian 2014-09-25 15:26:28 UTC
Correction, [[First-move advantage in chess]] is down to 384k and the image is 600x800 now.  I was looking at an older cached version.

The total size of the JPG images in [[First-move advantage in chess]] is 104k+72k+36k = 212k.  So that's 172k of "other stuff", presumably mostly fonts.  That doesn't seem unreasonable.  As Nemo notes above, since we scale oversize images we're already doing better than mwlib used to.
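
As a quick check of the breakdown above, here is a small sketch using only the sizes quoted in this comment; the function name `overhead_kb` is an assumption made for the example.

```python
from typing import Iterable


def overhead_kb(pdf_size_kb: int, image_sizes_kb: Iterable[int]) -> int:
    """Size of everything in the PDF that is not image data (fonts, text, structure)."""
    return pdf_size_kb - sum(image_sizes_kb)


if __name__ == "__main__":
    images_kb = [104, 72, 36]  # the three JPGs reported above
    print(sum(images_kb))               # 212
    print(overhead_kb(384, images_kb))  # 172
```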
Comment 14 C. Scott Ananian 2014-10-06 14:55:15 UTC
There's a GC bug, fixed by a patch I'm deploying today, so the ganglia info might not be strictly accurate.

Nevertheless, our PDFs are chunky.  I haven't seen any information that they are *more* chunky than the old mwlib images, however -- if anything, I believe them to be rather *smaller*.  It's just a consequence of embedding print-quality images.
Comment 15 Christoph Kepper 2014-10-08 19:40:23 UTC
If I remember correctly, mwlib fetches images at a standard size (max 1200px wide). In our experience, larger images did not lead to a noticeable increase in output quality. Download PDFs might have used even smaller images.
Comment 16 C. Scott Ananian 2014-10-28 16:31:12 UTC
I believe this was fixed as part of bug 72377.

*** This bug has been marked as a duplicate of bug 72377 ***
Comment 17 Nemo 2014-10-28 16:36:46 UTC
(In reply to C. Scott Ananian from comment #16)
> I believe this was fixed as part of bug 72377.

Probably!

(In reply to Nemo from comment #4)
> Example: 16 MiB for [[de:Corps]],
> https://de.wikipedia.org/w/index.php?title=Spezial:
> Buch&bookcmd=render_article&arttitle=Corps&oldid=60280924&writer=rdf2latex

It's now 1.8 MiB. :)
