Last modified: 2013-12-09 18:30:24 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T39933, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 37933 - Parsoid: Export as LaTeX
Status: RESOLVED WONTFIX
Product: Parsoid
Classification: Unclassified
Component: DOM
Version: unspecified
Hardware: All All
Importance: Lowest enhancement
Target Milestone: ---
Assigned To: Gabriel Wicke
Depends on: 46516
Blocks: 27574 46517
Reported: 2012-06-25 17:20 UTC by James Forrester
Modified: 2013-12-09 18:30 UTC (History)




Description James Forrester 2012-06-25 17:20:08 UTC
[Suggestion from community.]

Not sure if we'd want to do this?
Comment 1 Gabriel Wicke 2012-06-25 20:44:45 UTC
We are building a highly marked-up HTML DOM (see for example http://www.mediawiki.org/wiki/Parsoid/RDFa_vocabulary and http://www.mediawiki.org/wiki/Parsoid/HTML5_DOM_with_microdata, or http://parsoid.wmflabs.org for live output), which should be relatively easy to convert to LaTeX with existing tools. If additional information in the DOM is needed, then please let us know here!

Also changing the title to Parsoid since I imagine this to be more about the actual conversion than a button to start it.
Comment 2 Helder 2012-06-27 23:07:57 UTC
This would be really great!
Comment 3 Gabriel Wicke 2012-06-28 07:52:28 UTC
Just a clarification: We don't currently plan to work on this ourselves, but would be happy to support somebody taking this on. A quick test using the pandoc tool (http://johnmacfarlane.net/pandoc/, apt-get install pandoc) looks quite promising:

pandoc -s -r html http://parsoid.wmflabs.org/en:Foo -o Foo.tex

The output could likely be improved by making use of the extra information contained in the Parsoid DOM, for example by adding a separate HTML flavor to pandoc. If Haskell is not your favorite language, there seem to be other html -> latex converters around, including at least one in Java: https://www.google.de/search?num=100&hl=en&ie=UTF-8&oe=UTF-8&q=convert%20html%20latex
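For anyone who wants to script the quick test above, here is a minimal sketch in Python. It only wraps the same pandoc invocation; the function names `pandoc_command` and `export_latex` are made up for illustration, and `-f html` is used as the equivalent spelling of `-r html`. It assumes pandoc is installed and that the Parsoid URL is reachable.

```python
import shutil
import subprocess


def pandoc_command(url, out_path):
    """Build the argument list for the pandoc call shown above.

    Hypothetical helper: it only assembles arguments, nothing is executed.
    """
    # -s: produce a standalone document; -f html: read the input as HTML
    # (same meaning as the -r html spelling used in the comment above).
    return ["pandoc", "-s", "-f", "html", url, "-o", out_path]


def export_latex(url, out_path):
    """Run pandoc on the given URL; return False if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        return False
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(pandoc_command(url, out_path), check=True)
    return True
```

Calling `export_latex("http://parsoid.wmflabs.org/en:Foo", "Foo.tex")` would then reproduce the command line above, provided the service is still available.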
Comment 4 James Forrester 2012-08-06 19:26:02 UTC
Mass-moving bugs into the new 'Parsoid' product.
Comment 5 Gabriel Wicke 2013-03-26 18:45:23 UTC
Closing as wontfix, as we don't plan to tackle this as part of the Parsoid project.
Comment 6 Dirk Hünniger 2013-08-23 19:36:53 UTC
I actually solved this problem. I used the same basic technologies as pandoc. The Debian package is available as mediawiki2latex in Debian sid. The command-line tool takes a URL to a wiki page and writes a PDF file generated with LaTeX. The LaTeX source tree, including processed images, can also be exported.
Yours Dirk Hünniger
Comment 7 Gabriel Wicke 2013-08-23 21:48:06 UTC
(In reply to comment #6)
> I actually solved this problem. I used the same basic technologies as pandoc.
> The debian package is available as mediawiki2latex in debian sid. The command
> line tools takes an URL to a wiki page and writes a pdf file generated with
> latex. The latex source tree including processed images can also be exported.
> Yours Dirk Hünniger

So is this based on HTML or wikitext?
Comment 8 Dirk Hünniger 2013-08-24 04:43:28 UTC
It's your choice. The standard mode is HTML, but if you provide the -m command line option, it is based on wikitext.
Comment 9 Gabriel Wicke 2013-08-26 20:25:41 UTC
(In reply to comment #8)
> It your choice. The standard mode is html. But if you provide the -m command
> line option it is based on wikitext.

The HTML mode sounds great. Would it be hard to take advantage of the extra metadata the Parsoid HTML5+RDFa [1] offers?

[1]: http://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec
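To make the question more concrete, here is a rough sketch of how a converter might discover the extra annotations before deciding how to render an element. It uses only Python's standard `html.parser`. The sample markup is hand-written for illustration, and the attribute values `mw:WikiLink` and `mw:Transclusion` are assumptions about what the DOM spec in [1] defines; real Parsoid output should be checked against that spec.

```python
from collections import Counter
from html.parser import HTMLParser


class RdfaCollector(HTMLParser):
    """Count mw:-prefixed tokens found in typeof and rel attributes."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # RDFa attributes may hold several space-separated tokens.
            if name in ("typeof", "rel") and value:
                for token in value.split():
                    if token.startswith("mw:"):
                        self.counts[token] += 1


def collect_mw_annotations(html):
    """Return a dict mapping each mw: annotation to its number of occurrences."""
    parser = RdfaCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


# Illustrative, hand-written sample; not actual Parsoid output.
SAMPLE = (
    '<p><a rel="mw:WikiLink" href="./Foo">Foo</a> '
    '<span typeof="mw:Transclusion" about="#mwt1">bar</span></p>'
)
```

A LaTeX exporter could use such a pass to treat, say, transcluded content differently from plain text, instead of relying on the rendered HTML alone.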
Comment 10 Dirk Hünniger 2013-08-27 04:59:50 UTC
Yes, you can do that. You just need to find someone to implement it, since I lack the time at the moment and will likely continue to lack it for the foreseeable future. The second point is that I cannot see any advantage in using this data because, from what I can see at a glance, everything useful is already in the normal HTML, and I am already using all of that.
Comment 11 Dirk Hünniger 2013-12-08 20:50:43 UTC
Actually, C. Scott Ananian is currently working on this bug. Maybe you should update the status and assignee of this bug. C. Scott Ananian is, in particular, a developer on the Parsoid project.

https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FCollection%2FOfflineContentGenerator%2Flatex_renderer
Comment 12 Gabriel Wicke 2013-12-09 18:30:24 UTC
@Dirk: This bug is about implementing this in Parsoid, which we don't plan to do.

The LaTeX renderer in the Collection extension leverages Parsoid HTML, but is a separate project. This makes it useful independently of Parsoid, for example once we switch to HTML storage.
