Last modified: 2010-04-17 15:32:09 UTC
Occasionally, MediaWiki places HTML tags inside LaTeX statements in text-only render mode, as exemplified on http://en.wikipedia.org/wiki/Riemann_hypothesis. This prevents automatic parsing of LaTeX statements and produces odd formatting both when copying and pasting and in page display (the $ sign is indented on the first line, and the LaTeX statement follows unindented on the next line). I would very much appreciate it if MediaWiki could be updated in this regard. Thank you in advance. As a side question related to automated parsing, would it be possible to place LaTeX statements inside HTML span tags with a class="tex" attribute, equivalent to what is done for img tags in PNG rendering?
Can you be more specific about where the problem can be found?
The only "HTML" I find there is something like <img class="tex" alt="\log g(n) &lt; \sqrt{\operatorname{Li}^{-1}(n)}" src="http://upload.wikimedia.org/math/5/e/d/5edf1bc3778b2456213d7857b1f82f80.png" /> The content of that alt text is '\log g(n) < \sqrt{\operatorname{Li}^{-1}(n)}'; HTML requires some characters to be written as entities. You should decode the entities in any text you extract directly. Any HTML/XML parser will do that for you. I have no problem copying and pasting that, by the way.
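To illustrate the point about decoding entities, here is a minimal Python sketch using only the standard library's html.parser. The class name TexAltCollector and the helper extract_tex_alts are made up for this example, and the sample markup is a shortened copy of the img tag quoted above with the entity written as it would appear in the page source. It is only one way to do this; any HTML parser should behave similarly.

```python
from html.parser import HTMLParser


class TexAltCollector(HTMLParser):
    """Collect the alt text of every <img> whose class list contains "tex"."""

    def __init__(self):
        super().__init__()
        self.formulas = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also end up here by default.
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "img" and "tex" in classes:
            # The parser has already replaced entities such as &lt;
            # in attribute values, so no manual decoding is needed.
            self.formulas.append(attributes.get("alt") or "")


def extract_tex_alts(html_text):
    """Hypothetical helper: return the decoded TeX strings found in html_text."""
    collector = TexAltCollector()
    collector.feed(html_text)
    collector.close()
    return collector.formulas


sample = (
    '<img class="tex" '
    'alt="\\log g(n) &lt; \\sqrt{\\operatorname{Li}^{-1}(n)}" '
    'src="http://upload.wikimedia.org/math/5/e/d/'
    '5edf1bc3778b2456213d7857b1f82f80.png" />'
)

for formula in extract_tex_alts(sample):
    print(formula)  # prints \log g(n) < \sqrt{\operatorname{Li}^{-1}(n)}
```

The extracted string contains a literal < rather than &lt;, which is what a TeX consumer expects.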
I was referring to the text-only render mode, i.e., what you get if you go to preferences and select the option "Leave it as TeX (for text browsers)" on Wikipedia. You will see that the Riemann zeta function in the first section is written like this in the HTML: <dd>$</dd> </dl> <p>\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots. \! $
Issue confirmed. The math tag in question contains a newline at the start. In Math.php this is output as: return ('$ '.htmlspecialchars( $this->tex ).' $'); So the result is the indentation, then $, then a newline, then the TeX, and our weird code for parsing indentation and lists turns that into the output quoted above. Related to bug 22818, which is the same problem in a different mode.
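The effect of the leading newline can be reproduced outside MediaWiki. The following is a rough Python stand-in for the PHP line above, not the actual Math.php code: the function name render_source_mode is invented for this sketch, and html.escape only approximates htmlspecialchars (the two differ in how quotes are handled).

```python
import html


def render_source_mode(tex):
    """Approximate Python equivalent of:
    return ('$ '.htmlspecialchars( $this->tex ).' $');
    """
    return "$ " + html.escape(tex) + " $"


# TeX source that begins with a newline, as in the math tag in question.
tex = "\n\\zeta(s) = \\sum_{n=1}^\\infty \\frac{1}{n^s}"

rendered = render_source_mode(tex)

# The opening "$ " ends up alone on the first line and the formula
# starts on a second line, which the list/indentation handling then
# treats as separate content.
for number, line in enumerate(rendered.splitlines(), start=1):
    print(number, repr(line))
```

The first printed line holds only the opening dollar sign, which matches the stray <dd>$</dd> seen in the page source.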
*** Bug 22818 has been marked as a duplicate of this bug. ***
Fixed in r65039.
Awesome, thanks! I'm looking forward to that update on Wikipedia. Now, would it also be possible to mark TeX text with class="tex", equivalent to what is done for PNG rendering? I suppose the PHP output line should then be: return ('<span class="tex">$ ' . str_replace( "\n", " ", htmlspecialchars( $this->tex ) ) . ' $</span>'); Thanks again!
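With a span wrapper like the one proposed here, automated extraction on the client side becomes straightforward. Below is a minimal Python sketch using the standard library's html.parser. The names TexSpanCollector and extract_tex_spans are made up for illustration, the sample markup is hand-written to match the proposed output rather than taken from a real page, and the sketch assumes tex spans are not nested and contain only text.

```python
from html.parser import HTMLParser


class TexSpanCollector(HTMLParser):
    """Collect the text of every <span> whose class list contains "tex"."""

    def __init__(self):
        super().__init__()
        self.formulas = []
        self._buffer = None  # None while outside a tex span

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "tex" in classes:
            self._buffer = []

    def handle_data(self, data):
        # Entities such as &lt; are already decoded when data arrives here.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            # Drop the surrounding "$ ... $" delimiters if present.
            if len(text) >= 2 and text.startswith("$") and text.endswith("$"):
                text = text[1:-1].strip()
            self.formulas.append(text)
            self._buffer = None


def extract_tex_spans(html_text):
    """Hypothetical helper: return the TeX strings found in tex spans."""
    collector = TexSpanCollector()
    collector.feed(html_text)
    collector.close()
    return collector.formulas


sample = (
    '<p>Bound: <span class="tex">'
    '$ \\log g(n) &lt; \\sqrt{\\operatorname{Li}^{-1}(n)} $'
    '</span> holds.</p>'
)

print(extract_tex_spans(sample))
```

Selecting on class="tex" means the same selector would cover both render modes: the alt attribute of img tags in PNG mode and the span text in text-only mode.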
Right. Done in r65160.