Last modified: 2014-07-30 01:22:47 UTC
The Collection extension should generate PDFs from MediaWiki's HTML output instead of using a custom parser and relying on MediaWiki's broken expand-templates feature. That would fix all bugs related to parsing MediaWiki markup and expanding templates.
I recommended this a few years ago, but we went with the second-parser solution as it was already in development. Using a headless WebKit browser to generate PDFs is fairly straightforward, but I'm not sure how best to handle combining multiple articles into one document, etc. This wouldn't be a trivial project, but it would be nice to investigate.
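One possible way to approach the multi-article question, as a rough sketch only: concatenate each article's rendered HTML into a single document and let CSS force a page break between articles, then hand that one document to whatever headless browser does the printing. The function name `combine_articles`, the `article` class name, and the `(title, body_html)` input shape are all hypothetical; fetching the HTML and the PDF rendering step itself are not shown.

```python
from html import escape


def combine_articles(articles):
    """Merge (title, body_html) pairs into one printable HTML document.

    Hypothetical helper: assumes body_html is already-rendered, trusted
    HTML for each article (for example, MediaWiki's parser output).
    """
    sections = []
    for title, body_html in articles:
        sections.append(
            '<section class="article">\n'
            f"<h1>{escape(title)}</h1>\n"
            f"{body_html}\n"
            "</section>"
        )

    # Start every article after the first one on a new printed page.
    css = ".article + .article { page-break-before: always; }"

    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">\n'
        f"<style>{css}</style></head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body></html>\n"
    )


if __name__ == "__main__":
    book = combine_articles([
        ("First article", "<p>Body of the first article.</p>"),
        ("Second article", "<p>Body of the second article.</p>"),
    ])
    print(book)
```

This leaves open the harder parts of a book (table of contents, page numbers, cross-article links), which is where a print-preparation library might come in.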
Parsoid HTML with RDFa might be a better starting point for print-specific customization as it contains a lot of semantic information. See http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec. Another potentially relevant project would be http://bookjs.net/, a JS library that prepares HTML content for printing in WebKit. It did not work in my testing and seems to be pretty cutting-edge, but there are probably ways to make it work as it is used by the booktype project.
Update re book.js: The demo at http://bookjs.net/data/body.html does not work for me, but the demos in the git checkout work both in Chromium 28 and Chrome 29: git clone https://github.com/sourcefabric/BookJS.git
Is this fixed by the new PDF renderer?
* Re-implementing PDF support: http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073059.html
* Status update on new Collections PDF Renderer: http://lists.wikimedia.org/pipermail/wikitech-l/2013-November/073238.html
(In reply to comment #4)
> Is this fixed by the new PDF renderer?

I'd say yes. I'll leave the pleasure of closing this bug to the PDF team though ;)
The new PDF renderer isn't deployed yet; maybe we should wait until then?
With the 'public' release of the OCG renderer, I'm going to close this bug.