Last modified: 2010-04-17 15:32:09 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. Logging in is not possible, and apart from the bug reports and their history, links might be broken. See T25190, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 23190 - html tags in text $ latex statements $
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 22818
Reported: 2010-04-14 10:21 UTC by mimamer
Modified: 2010-04-17 15:32 UTC (History)

Description mimamer 2010-04-14 10:21:58 UTC
Occasionally, MediaWiki places HTML tags inside LaTeX statements in text-only render mode, as exemplified on http://en.wikipedia.org/wiki/Riemann_hypothesis. This prevents automatic parsing of LaTeX statements and produces odd formatting both when copying and pasting and in the page display ($ sign indented on the first line, LaTeX statement unindented on the next line). I would very much appreciate it if MediaWiki could be updated in this regard. Thank you in advance.

As a side question related to automated parsing, would it be possible to place LaTeX statements inside HTML span tags with class="tex", as is done for img tags in PNG rendering?
Comment 1 Derk-Jan Hartman 2010-04-14 12:32:26 UTC
Can you be more specific about where the problem can be found?
Comment 2 Platonides 2010-04-14 12:47:45 UTC
The only "HTML" I find there looks like this: <img class="tex" alt="\log g(n) &lt; \sqrt{\operatorname{Li}^{-1}(n)}" src="http://upload.wikimedia.org/math/5/e/d/5edf1bc3778b2456213d7857b1f82f80.png" />

The content of that alternate text is '\log g(n) < \sqrt{\operatorname{Li}^{-1}(n)}'. Some characters have to be written as entities in HTML, so you should decode the entities in any text you extract directly. Any HTML/XML parser will do that for you.

I have no problem copying and pasting that, btw.
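
As an illustration of the point above about letting a parser decode the entities, here is a minimal Python sketch (not part of MediaWiki). It assumes the page HTML is already available as a string; the names TexAltCollector and extract_tex are made up for this example. Python's standard html.parser decodes entity references in attribute values before handing them to the handler, so the alt text comes out as plain TeX.

```python
from html.parser import HTMLParser


class TexAltCollector(HTMLParser):
    """Collects the alt text of every <img class="tex"> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.formulas = []

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive with entities such as &lt; already decoded.
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        if tag == "img" and "tex" in classes and attr.get("alt") is not None:
            self.formulas.append(attr["alt"])


def extract_tex(html_text):
    """Return the decoded TeX source of all PNG-rendered formulas in html_text."""
    collector = TexAltCollector()
    collector.feed(html_text)
    collector.close()
    return collector.formulas


sample = (
    r'<img class="tex" alt="\log g(n) &lt; \sqrt{\operatorname{Li}^{-1}(n)}" '
    r'src="5edf1bc3778b2456213d7857b1f82f80.png" />'
)
print(extract_tex(sample)[0])
```

Self-closing tags such as the one above are routed through the same start-tag handler, so no extra case is needed for them.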
Comment 3 mimamer 2010-04-14 12:59:07 UTC
I was referring to the text-only render mode, i.e., if you go to preferences and select the option "Leave it as TeX (for text browsers)" in Wikipedia.

You will see that the Riemann zeta function in the first section is written like this in HTML:

<dd>$</dd>
</dl>
<p>\zeta(s) = \sum_{n=1}^\infty \frac{1}{n^s} = \frac{1}{1^s} + \frac{1}{2^s} + \frac{1}{3^s} + \cdots. \! $
Comment 4 Derk-Jan Hartman 2010-04-14 13:09:34 UTC
Issue confirmed.

The math tag in question contains a newline at the start. In Math.php this is output as:
return ('$ '.htmlspecialchars( $this->tex ).' $');

So the indentation, followed by "$ ", the newline and then the TeX, combined with our weird code for parsing indentation and lists, creates this output.
This is related to bug 22818, which is the same problem in a different mode.
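
The effect described in this comment can be modelled outside MediaWiki. The following Python sketch is only an approximation of the quoted PHP line: render_source_mode is a made-up name, and html.escape plus a quote replacement stands in for htmlspecialchars under the assumption that it escapes &, <, > and double quotes. It shows that a newline at the start of the TeX leaves "$ " alone on the first output line, which is the piece the indentation and list handling then treats as the whole indented item.

```python
import html


def render_source_mode(tex: str) -> str:
    """Python model of: '$ ' . htmlspecialchars( $this->tex ) . ' $' (approximation)."""
    escaped = html.escape(tex, quote=False).replace('"', "&quot;")
    return "$ " + escaped + " $"


# TeX source that begins with a newline, as in the math tag in question.
tex = "\n\\zeta(s) = \\sum_{n=1}^\\infty \\frac{1}{n^s}"

for number, line in enumerate(render_source_mode(tex).split("\n"), start=1):
    # Line 1 holds only the opening "$ "; the formula starts on line 2.
    print(number, repr(line))
```

This sketch does not reproduce the change made in r65039; it only demonstrates why the unmodified output splits across two lines.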
Comment 5 Platonides 2010-04-14 21:23:05 UTC
*** Bug 22818 has been marked as a duplicate of this bug. ***
Comment 6 Platonides 2010-04-14 21:23:43 UTC
Fixed in r65039.
Comment 7 mimamer 2010-04-16 06:25:07 UTC
Awesome, thanks! I'm looking forward to that update on Wikipedia.

Now, would it also be possible to mark TeX text with class="tex", as is done for PNG rendering?

I suppose the PHP output line should then be:
return ('<span class="tex">$ ' . str_replace( "\n", " ", htmlspecialchars( $this->tex ) ) . ' $</span>');

Thanks again!
Comment 8 Platonides 2010-04-17 15:32:09 UTC
Right. Done in r65160.
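
For readers who want to consume this kind of markup, here is a hedged Python sketch based on the output line proposed in comment 7, not on the actual code of r65160. render_source_span models the proposed PHP expression (again approximating htmlspecialchars), and TexSpanCollector and extract_span_tex are hypothetical helpers that read the formulas back from elements with class="tex".

```python
import html
from html.parser import HTMLParser


def render_source_span(tex: str) -> str:
    """Python model of the PHP line proposed in comment 7 (approximation)."""
    escaped = html.escape(tex, quote=False).replace('"', "&quot;")
    return '<span class="tex">$ ' + escaped.replace("\n", " ") + " $</span>"


class TexSpanCollector(HTMLParser):
    """Collects the text content of every <span class="tex"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: text arrives with entities decoded
        self._depth = 0     # nesting depth of <span> inside a tex span
        self._buffer = []
        self.formulas = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            self._depth += 1
        elif "tex" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.formulas.append("".join(self._buffer))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_span_tex(html_text):
    """Return the text of all tex spans, including the surrounding $ signs."""
    collector = TexSpanCollector()
    collector.feed(html_text)
    collector.close()
    return collector.formulas


page = "<p>Before " + render_source_span("\na < b") + " after</p>"
print(page)
print(extract_span_tex(page))
```

Because the newline is replaced by a space before the span is emitted, the whole formula stays on one line and a consumer can select it by class instead of searching for $ signs in running text.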
