Last modified: 2013-12-09 18:30:24 UTC
[Suggestion from community.] Not sure if we'd want to do this?
We are building a highly marked-up HTML DOM (see for example http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary and http://www.mediawiki.org/wiki/Parsoid/HTML5_DOM_with_microdata, or http://parsoid.wmflabs.org for live output), which should be relatively easy to convert to LaTeX with existing tools. If additional information in the DOM is needed, then please let us know here! Also changing the title to Parsoid since I imagine this to be more about the actual conversion than a button to start it.
This would be really great!
Just a clarification: We don't currently plan to work on this ourselves, but would be happy to support somebody taking this on. A quick test using the pandoc tool (http://johnmacfarlane.net/pandoc/, apt-get install pandoc) looks quite promising:
pandoc -s -r html http://parsoid.wmflabs.org/en:Foo -o Foo.tex
The output could likely be improved by making use of the extra information contained in the Parsoid DOM, for example by adding a separate HTML flavor to pandoc. If Haskell is not your favorite language, there seem to be other html -> latex converters around, including at least one in Java: https://www.google.de/search?num=100&hl=en&ie=UTF-8&oe=UTF-8&q=convert%20html%20latex
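For anyone who wants to script the quick pandoc test, here is a minimal Python sketch. It assumes pandoc is installed and on the PATH. The helper names build_pandoc_command and convert_to_latex are made up for illustration; the flags are only the ones from the command in this comment.

```python
import subprocess


def build_pandoc_command(source, output_path):
    """Return the argument list for the pandoc call quoted in this comment.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # -s: produce a standalone document, -r html: read the input as HTML.
    return ["pandoc", "-s", "-r", "html", source, "-o", output_path]


def convert_to_latex(source, output_path):
    """Run pandoc; raises CalledProcessError if pandoc exits non-zero."""
    subprocess.run(build_pandoc_command(source, output_path), check=True)
```

Calling convert_to_latex("http://parsoid.wmflabs.org/en:Foo", "Foo.tex") should be equivalent to the command line above, provided the URL is still reachable.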
Mass-moving bugs into the new 'Parsoid' product.
Closing as wontfix, as we don't plan to tackle this as part of the Parsoid project.
I actually solved this problem, using the same basic technologies as pandoc. The Debian package is available as mediawiki2latex in Debian sid. The command-line tool takes a URL to a wiki page and writes a PDF file generated with LaTeX. The LaTeX source tree, including processed images, can also be exported. Yours, Dirk Hünniger
(In reply to comment #6)
> I actually solved this problem. I used the same basic technologies as pandoc.
> The debian package is available as mediawiki2latex in debian sid. The command
> line tools takes an URL to a wiki page and writes a pdf file generated with
> latex. The latex source tree including processed images can also be exported.
> Yours Dirk Hünniger
So is this based on HTML or wikitext?
It's your choice. The default mode is HTML, but if you pass the -m command-line option, the conversion is based on wikitext.
(In reply to comment #8)
> It your choice. The standard mode is html. But if you provide the -m command
> line option it is based on wikitext.
The HTML mode sounds great. Would it be hard to take advantage of the extra metadata the Parsoid HTML5+RDFa [1] offers?
[1]: http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec
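To make "extra metadata" a bit more concrete, here is a rough Python sketch that lists the RDFa-style attributes a converter could look at. It uses only the standard library. The class and function names are hypothetical, and the attribute values in the usage note (mw:WikiLink, mw:Transclusion) are illustrative samples; the DOM spec linked above is the authority on what Parsoid actually emits.

```python
from html.parser import HTMLParser


class ParsoidAnnotationCollector(HTMLParser):
    """Collect (tag, attribute, token) triples for mw:-prefixed RDFa values.

    Hypothetical helper, not part of Parsoid or mediawiki2latex.
    """

    RDFA_ATTRS = ("typeof", "rel", "property")

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.RDFA_ATTRS and value:
                # One attribute may hold several space-separated tokens.
                for token in value.split():
                    if token.startswith("mw:"):
                        self.annotations.append((tag, name, token))


def collect_annotations(html):
    """Return all mw: annotations found in the given HTML string."""
    collector = ParsoidAnnotationCollector()
    collector.feed(html)
    collector.close()
    return collector.annotations
```

For example, collect_annotations('<a rel="mw:WikiLink" href="./Foo">Foo</a>') yields one triple for the link. A converter could use such triples to tell wiki links, transclusions and similar constructs apart instead of guessing from plain HTML.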
Yes, you can do that. You just need to find someone to implement it, since I lack the time at the moment and will likely continue to do so for the foreseeable future. The second point is that I cannot see any advantage in using this data: from what I can see at a glance, everything useful is already in the normal HTML, and I am already using all of that.
Actually, C. Scott Ananian is currently working on this bug. Maybe you should update the status and assignee of this bug. C. Scott Ananian is, in particular, a developer on the Parsoid project. https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FCollection%2FOfflineContentGenerator%2Flatex_renderer
@Dirk: This bug is about implementing this in Parsoid, which we don't plan to do. The LaTeX renderer in the Collection extension leverages Parsoid HTML, but is a separate project. This makes it useful independently of Parsoid, for example once we switch to HTML storage.