Last modified: 2012-07-12 05:34:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T38059, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 36059 - HTML tags shouldn't be escaped before passing to MathJax
HTML tags shouldn't be escaped before passing to MathJax
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Math (Other open bugs)
unspecified
All All
: High critical with 6 votes (vote)
: ---
Assigned To: Nobody - You can work on this!
:
: 36485 36491 36842 36977 (view as bug list)
Depends on:
Blocks: 31406 make-mathjax-default
  Show dependency treegraph
 
Reported: 2012-04-18 08:33 UTC by Liangent
Modified: 2012-07-12 05:34 UTC (History)
16 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Liangent 2012-04-18 08:33:31 UTC
Or "&" is rendered as "amp;", "<" as "lt;" and ">" as "gt;".
Comment 1 Liangent 2012-04-18 08:36:49 UTC
(In reply to comment #0)
> Or "&" is rendered as "amp;", "<" as "lt;" and ">" as "gt;".

Should be "If '&' is expected there, '&' is eaten and the rest is shown as plain text. Otherwise it triggers an error and the whole line is not rendered."
Comment 2 Michael M. 2012-04-18 09:46:05 UTC
Reproducible on mw.org, but not on http://leuksman.com/mw
On leuksman.com <math>a<b</math> renders correctly, while on mw.org a box with a&lt;b (or \displaystyle a&lt;b, depending on where you insert the formula) is displayed.
Comment 3 Liangent 2012-04-18 09:52:36 UTC
(In reply to comment #2)
> Reproducible on mw.org, but not on http://leuksman.com/mw
> On leuksman.com <math>a<b</math> renders correctly, while on mw.org a box with
> a&lt;b (or \displaystyle a&lt;b, depending on where you insert the formula) is
> displayed.

Guessing it's related to Tidy.
Comment 4 Liangent 2012-04-18 09:55:52 UTC
(In reply to comment #2)
> Reproducible on mw.org, but not on http://leuksman.com/mw
> On leuksman.com <math>a<b</math> renders correctly, while on mw.org a box with
> a&lt;b (or \displaystyle a&lt;b, depending on where you insert the formula) is
> displayed.

By the way, on leuksman.com, before the MathJax transformation is applied, raw HTML (with escaped &lt; stuff) is displayed making it not so pretty.
Comment 5 nageh 2012-04-18 10:45:53 UTC
Come on. This has been resolved over two years ago in my mathJax user script (en.wikipedia.org/wiki/User:Nageh/mathJax). Why does it reappear in the MediaWiki code? I wished the devs would give a little bit more feedback about what they are doing when they are reusing my code but then cut away stuff out of what seems pure ignorance. If you want to try a working MathJax implementation, try my user script.
Comment 6 Derk-Jan Hartman 2012-04-18 18:59:44 UTC
@Nageh, please do not jump to conclusions that quick. You might not have realized it, but looking into it, it seems your script was actually creating invalid HTML for the "math/tex" script element. The content is not HTML escaped.

The Wikipedia developers (in this case, brion and 2 volunteer patch contributors (one of them me)) are accustomed to writing certain things in certain ways. Apparently one of the developers added escaping to the HTML element creation, because that's what he's used to doing. That has brought forward that actually all the time the script element was not properly created and read out again. Now it's written correctly, but of course reading has broken, which is why this ticket was filed. Stuff like that happens, it's just part of the development cycle.


From en.wikipedia with Nageh MathJax script:
<script type="math/tex" id="MathJax-Element-408">\displaystyle  u'' + p(x)u' + q(x)u=f(x),\quad x>a </script>


Should be:
<script type="math/tex" id="MathJax-Element-408">\displaystyle  u'' + p(x)u' + q(x)u=f(x),\quad x&gt;a </script>
Comment 7 nageh 2012-04-19 16:10:59 UTC
Fair enough, and obviously my criticism was unwarranted. Sorry for my attack.

At the same time this is actually something that MathJax should be expected to take care of. Even in the standard installation of MathJax, no matter whether you write < or &lt; the symbol will be put unescaped into the script element. I will report this as a bug to the MathJax devs.
Comment 8 nageh 2012-04-19 21:58:47 UTC
I really should have thought further before posting my last message. There is nothing wrong with leaving the <, >, and & symbols un-escaped because the maths text will be added as a text node(!) to the DOM, and thus will NOT be interpreted by the HTML parser and will NOT create invalid HTML, and my script was NOT broken. ;)
Comment 9 nageh 2012-04-23 20:35:13 UTC
As I said, comments by others are always being ignored. Now 1.20wmf1 has been deployed on the English Wikipedia, and all the TeX code is broken. < gets mangled to &lt; and then mangled again to &amp;lt;. Sigh.
Comment 10 MZMcBride 2012-04-24 19:50:30 UTC
(In reply to comment #9)
> As I said, comments by others are always being ignored. Now 1.20wmf1 has been
> deployed on the English Wikipedia, and all the TeX code is broken. < gets
> mangled to &lt; and then mangled again to &amp;lt;. Sigh.

This sounds rather serious. Do you have a link to an example of such breakage?
Comment 11 nageh 2012-04-25 19:11:43 UTC
Just selected "Leave it as TeX" in the Preferences->Appearance menu. Then open any page that includes TeX code with any of <, >, or &, and view the source (or the text that is displayed). Example page: http://en.wikipedia.org/wiki/Decimal_representation . I have implemented a work-around for the mathJax user script for the moment.
Comment 12 MZMcBride 2012-04-26 03:53:14 UTC
(In reply to comment #11)
> Just selected "Leave it as TeX" in the Preferences->Appearance menu. Then open
> any page that includes TeX code with any of <, >, or &, and view the source (or
> the text that is displayed). Example page:
> http://en.wikipedia.org/wiki/Decimal_representation . I have implemented a
> work-around for the mathJax user script for the moment.

Ah, that makes more sense. This is only a problem for people with that particular user preference set. So instead of seeing (for example) "$ r_n\leq x &lt; r_n+\frac{1}{10^n}.\, $", people should be seeing "$ r_n\leq x < r_n+\frac{1}{10^n}.\, $"? Is that correct?

This alone wouldn't be a high priority. Are you saying that the escaping is also messing up the MathJax user gadget?
Comment 13 nageh 2012-04-26 11:45:48 UTC
Yes, that is correct. Take a look at the HTML source and you'll notice that the reason you see &lt; is because the ampersand is encoded as &amp;, which is followed by "lt;" as a normal text. However, the HTML source should simply contain &lt; which would get rendered as <.

Also, this change has only been introduced with 1.20wmf1 so it's a quite disappointing to hear that reverting this flawed change isn't a high priority.

The change was also messing up the mathJax user script, but I have implemented a work-around. So whether you decide to fix this or not, I don't know, but I'm not the only one to note that the MediaWiki devs community is pretty remote from its users (see comments at sections http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#New_.22diff.22_view_is_horrible_and_illegible and http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Change_notifcations_to_editors ).
Comment 14 Richard Morris 2012-05-03 17:11:34 UTC
I've now switched to the experimental mathJax and this bug shows itself in the differential equations example in Help:Formula http://en.wikipedia.org/wiki/Help:Formula#Differential_equation. Rather than the correct output its displaying
    \displaystyle  u'' + p(x)u' + q(x)u=f(x),\quad x&gt;a
with the &gt; instead of >.
Comment 15 Richard Morris 2012-05-03 17:12:55 UTC
Add to block list for 31406 as Help:Formula broken.
Comment 16 Jon Harald Søby 2012-05-03 17:17:09 UTC
*** Bug 36485 has been marked as a duplicate of this bug. ***
Comment 17 Richard Morris 2012-05-04 07:56:53 UTC
It also messing with arrays and matrices 
\begin{matrix}
 x & y \\
 z & w
\end{matrix}
This is converted this to &amp; before passing to MathJax so it is rendering as 
 x amp;y
 z amp;w
This is more serious as all matrices, case statements, multiline equations, and tables are broken. See http://en.wikipedia.org/wiki/Help:Formula#Fractions.2C_matrices.2C_multilines for a whole section of broken examples.
Comment 18 Helder 2012-05-04 09:09:07 UTC
In terms of articles containing formulas, I would consider the appropriate "Severity" for this bug is "blocker" for the use of MathJax on Wikipedia, since every formula containing <, >, or & is broken.

Setting the fields accordingly, since this is consistent with the description of the fields at [[mw:Bug management/Bugzilla usage#Priority]]. I would set also "priority=highest" but I'm not sure about the availability of devs for fixing this... So I'll leave it to some developer to check the appropriate priority for this bug.

See also reports on
* [[Wikipedia talk:WikiProject Mathematics#MathJax]]
* [[de:Portal Diskussion:Mathematik#Mathjax wird getestet!]]
Comment 20 Michael M. 2012-05-05 08:28:51 UTC
*** Bug 36491 has been marked as a duplicate of this bug. ***
Comment 21 Michael M. 2012-05-15 08:56:46 UTC
*** Bug 36842 has been marked as a duplicate of this bug. ***
Comment 22 Michael M. 2012-05-21 07:25:30 UTC
*** Bug 36977 has been marked as a duplicate of this bug. ***
Comment 23 Derk-Jan Hartman 2012-06-02 09:52:21 UTC
Fixed the double escaping, but the generated DOM is still unescaped, which I don't really like.

That probably requires changes in MathJax though.
Comment 24 Brion Vibber 2012-06-02 10:00:43 UTC
Changeset link?
Comment 25 Helder 2012-06-04 11:20:57 UTC
I think it is change I6d548d06 (https://gerrit.wikimedia.org/r/#/c/9739/)
Comment 26 Richard Morris 2012-06-04 13:51:43 UTC
Not sure what the desired output for the MW_MATH_SOURCE should be. This will not parsed by MathJax so we end up with illegal and untreated html. Perhaphs better to pass it to htmlspecialchars().
Comment 27 Richard Morris 2012-06-04 15:11:31 UTC
Just noticed that MathJax suports \lt and \gt for < and >. These could solve the problem with < and >.
http://www.onemathematicalcat.org/MathJaxDocumentation/TeXSyntax.htm#L

The & used in matrices and arrays need not be a error
According to http://www.w3.org/TR/html5/tokenization.html#data-state "& " is legal and interpreted as Ampersand then space.

There are a few subtitles which we might need to watch for. \& is a literal ampersand, \>is an	alternate medium space. I don't know if  html entities are allowed but they seem to work x &#x229D; y gives x circled minus y.
Comment 28 nageh 2012-06-08 17:40:48 UTC
The data is content of a script(!) element, of course unescaped entities are legal there! Do you escape your < and > signs in javascript code???
Comment 29 Liangent 2012-06-08 18:12:45 UTC
Is it really so easy to fix? Gerrit change #10708.
Comment 30 Liangent 2012-06-08 18:32:09 UTC
Marking as fixed. There're already two patches which are independently worked out and almost the same. Reopen if they don't fix this issue.
Comment 31 nageh 2012-06-09 11:01:40 UTC
Yes, it is extremely easy to fix. That's why I didn't understand the cold shoulders I got.
Comment 32 Brad Jorsch 2012-06-24 18:03:53 UTC
(In reply to comment #30)
> Marking as fixed. There're already two patches which are independently worked
> out and almost the same. Reopen if they don't fix this issue.

Is it really appropriate to mark it as fixed when the patches haven't been reviewed or merged yet?
Comment 33 Richard Morris 2012-06-24 18:24:23 UTC
I don't think the fix is quite correct as it introduces problems for the source option. What I think you want is:


		if( $this->mode == MW_MATH_SOURCE ) {
			# No need to render or parse anything more!
			# New lines are replaced with spaces, which avoids confusing our parser (bugs 23190, 22818)
			return Xml::element( 'span',
				$this->_attribs(
					'span',
					array(
						'class' => 'tex',
						'dir' => 'ltr'
					)
				),
				'$ ' . str_replace( "\n", " ",  htmlspecialchars($this->tex) ) . ' $'
			);
		}
		if(  $this->mode == MW_MATH_MATHJAX ) {
			# No need to render or parse anything more!
			# New lines are replaced with spaces, which avoids confusing our parser (bugs 23190, 22818)
			return Xml::element( 'span',
				$this->_attribs(
					'span',
					array(
						'class' => 'tex',
						'dir' => 'ltr'
					)
				),
				'$ ' . str_replace( "\n", " ", $this->tex ) . ' $'
			);
		}

You want to  call  htmlspecialchars in MW_MATH_SOURCE but not  MW_MATH_MATHJAX. You might also want to mention the bug in the comments.
Comment 34 Liangent 2012-06-24 18:29:21 UTC
(In reply to comment #33)
> You want to  call  htmlspecialchars in MW_MATH_SOURCE but not  MW_MATH_MATHJAX.

I don't understand why. It will be further escaped by Xml::element(), so you'll see still see double-escaped TeX in source mode.
Comment 35 Richard Morris 2012-06-24 18:45:20 UTC
Good point. Its worth checking we get legal html in source mode when its reviewed.
Comment 36 Liangent 2012-07-12 05:34:04 UTC
already merged

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links