# Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 36059 - HTML tags shouldn't be escaped before passing to MathJax
 Summary: HTML tags shouldn't be escaped before passing to MathJax
 Status: RESOLVED FIXED
 Product: MediaWiki extensions
 Classification: Unclassified
 Component: Math
 Version: unspecified
 Hardware: All All
 Importance: High critical (6 votes)
 Target Milestone: ---
 Assigned To: Nobody - You can work on this!
 Blocks: 31406 (make-mathjax-default)

 Reported: 2012-04-18 08:33 UTC by Liangent
 Modified: 2012-07-12 05:34 UTC
 CC List: 16 users (b, bjorsch, brion, chrkipping, collinstocks, erik, hartman.wiki, he7d3r+bugs, listenleser, mah, mal.malego, nsoranzo, rich, rkaldari, sandrobt.wiki, tobias.oelgarte)

Liangent 2012-04-18 08:33:31 UTC
Or "&" is rendered as "amp;", "<" as "lt;", and ">" as "gt;".

Liangent 2012-04-18 08:36:49 UTC
(In reply to comment #0)
> Or "&" is rendered as "amp;", "<" as "lt;" and ">" as "gt;".

Should be: "If '&' is expected there, the '&' is eaten and the rest is shown as plain text. Otherwise it triggers an error and the whole line is not rendered."

Michael M. 2012-04-18 09:46:05 UTC
Reproducible on mw.org, but not on http://leuksman.com/mw. On leuksman.com, `a<b` (or `\displaystyle a<b`, depending on where you insert the formula) is displayed. Guessing it's related to Tidy.

Liangent 2012-04-18 09:55:52 UTC
(In reply to comment #2)
> Reproducible on mw.org, but not on http://leuksman.com/mw
> On leuksman.com `a<b` (or `\displaystyle a<b`, depending on where you insert the formula) is
> displayed.

By the way, on leuksman.com, the raw HTML (with escaped `&lt;` stuff) is displayed before the MathJax transformation is applied, which makes it not so pretty.

nageh 2012-04-18 10:45:53 UTC
Come on. This was resolved over two years ago in my mathJax user script (en.wikipedia.org/wiki/User:Nageh/mathJax). Why does it reappear in the MediaWiki code? I wish the devs would give a little more feedback about what they are doing when they reuse my code and then cut stuff out of what seems to be pure ignorance. If you want to try a working MathJax implementation, try my user script.

Derk-Jan Hartman 2012-04-18 18:59:44 UTC
@Nageh, please do not jump to conclusions that quickly. You might not have realized it, but looking into it, it seems your script was actually creating invalid HTML for the "math/tex" script element: the content is not HTML-escaped. The Wikipedia developers (in this case, brion and two volunteer patch contributors, one of them me) are accustomed to writing certain things in certain ways. Apparently one of the developers added escaping to the HTML element creation, because that's what he's used to doing. That revealed that the script element had never actually been created and read back properly. Now it's written correctly, but of course reading has broken, which is why this ticket was filed. Stuff like that happens; it's just part of the development cycle.
From en.wikipedia with Nageh MathJax script:
Should be:

nageh 2012-04-19 16:10:59 UTC
Fair enough, and obviously my criticism was unwarranted. Sorry for my attack. At the same time, this is actually something MathJax should be expected to take care of. Even in the standard installation of MathJax, no matter whether you write `<` or `&lt;`, the symbol will be put unescaped into the script element. I will report this as a bug to the MathJax devs.

nageh 2012-04-19 21:58:47 UTC
I really should have thought further before posting my last message. There is nothing wrong with leaving the <, >, and & symbols unescaped, because the math text is added to the DOM as a text node(!), and thus will NOT be interpreted by the HTML parser and will NOT create invalid HTML, and my script was NOT broken. ;)

nageh 2012-04-23 20:35:13 UTC
As I said, comments by others are always being ignored. Now 1.20wmf1 has been deployed on the English Wikipedia, and all the TeX code is broken. `<` gets mangled to `&lt;` and then mangled again to `&amp;lt;`. Sigh.

MZMcBride 2012-04-24 19:50:30 UTC
(In reply to comment #9)
> As I said, comments by others are always being ignored. Now 1.20wmf1 has been
> deployed on the English Wikipedia, and all the TeX code is broken. `<` gets
> mangled to `&lt;` and then mangled again to `&amp;lt;`. Sigh.

This sounds rather serious. Do you have a link to an example of such breakage?

nageh 2012-04-25 19:11:43 UTC
Just select "Leave it as TeX" in the Preferences->Appearance menu. Then open any page that includes TeX code with any of <, >, or &, and view the source (or the text that is displayed). Example page: http://en.wikipedia.org/wiki/Decimal_representation . I have implemented a work-around in the mathJax user script for the moment.

MZMcBride 2012-04-26 03:53:14 UTC
(In reply to comment #11)
> Just selected "Leave it as TeX" in the Preferences->Appearance menu. Then open
> any page that includes TeX code with any of <, >, or &, and view the source (or
> the text that is displayed). Example page:
> http://en.wikipedia.org/wiki/Decimal_representation . I have implemented a
> work-around for the mathJax user script for the moment.

Ah, that makes more sense. This is only a problem for people with that particular user preference set. So instead of seeing (for example) "`$r_n\leq x &lt; r_n+\frac{1}{10^n}.\,$`", people should be seeing "`$r_n\leq x < r_n+\frac{1}{10^n}.\,$`"? Is that correct? This alone wouldn't be a high priority. Are you saying that the escaping is also messing up the MathJax user gadget?

nageh 2012-04-26 11:45:48 UTC
Yes, that is correct. Take a look at the HTML source and you'll notice that the reason you see `&lt;` is that the ampersand is encoded as `&amp;`, which is followed by "lt;" as normal text. However, the HTML source should simply contain `&lt;`, which would get rendered as `<`. Also, this change was only introduced with 1.20wmf1, so it's quite disappointing to hear that reverting this flawed change isn't a high priority. The change was also messing up the mathJax user script, but I have implemented a work-around. So whether you decide to fix this or not, I don't know, but I'm not the only one to note that the MediaWiki dev community is pretty remote from its users (see comments at the sections http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#New_.22diff.22_view_is_horrible_and_illegible and http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29#Change_notifcations_to_editors ).
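The double-escaping chain described in the comments above can be reproduced in a few lines. This is a minimal Python sketch, not MediaWiki code: `html.escape` stands in (as an assumption) for PHP's `htmlspecialchars()`, and the hypothetical helper `browser_text` approximates how a browser decodes entities one level in ordinary element text. It does not model `<script>` content, which browsers leave undecoded.

```python
import html


def escape_once(tex: str) -> str:
    """Escape &, < and > once; assumed stand-in for PHP's htmlspecialchars()."""
    return html.escape(tex, quote=False)


def browser_text(markup: str) -> str:
    """Hypothetical helper: decode entities one level, as a browser does for
    ordinary element text (not for <script> content)."""
    return html.unescape(markup)


tex = r"x < r_n+\frac{1}{10^n}"
matrix = r"x & y \\ z & w"

single = escape_once(tex)               # escaped once: what the page should contain
double = escape_once(escape_once(tex))  # escaped twice: the reported symptom

print(single)                # x &lt; r_n+\frac{1}{10^n}
print(double)                # x &amp;lt; r_n+\frac{1}{10^n}
print(browser_text(single))  # x < r_n+\frac{1}{10^n}
print(browser_text(double))  # x &lt; r_n+\frac{1}{10^n}

# The same chain turns each matrix column separator "&" into "&amp;" before
# MathJax reads it, consistent with the "x amp;y" rendering reported in the thread.
print(browser_text(escape_once(escape_once(matrix))))  # x &amp; y \\ z &amp; w
```

Escaping exactly once on output and letting the browser decode once round-trips the TeX unchanged; each extra escape survives as visible `&lt;` or `&amp;` text.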
Richard Morris 2012-05-03 17:11:34 UTC
I've now switched to the experimental mathJax, and this bug shows itself in the differential equations example in Help:Formula (http://en.wikipedia.org/wiki/Help:Formula#Differential_equation). Rather than the correct output, it's displaying `\displaystyle u'' + p(x)u' + q(x)u=f(x),\quad x&gt;a`, with `&gt;` instead of `>`.

Richard Morris 2012-05-03 17:12:55 UTC
Add to block list for 31406, as Help:Formula is broken.

Jon Harald Søby 2012-05-03 17:17:09 UTC
*** Bug 36485 has been marked as a duplicate of this bug. ***

Richard Morris 2012-05-04 07:56:53 UTC
It is also messing with arrays and matrices:

    \begin{matrix} x & y \\ z & w \end{matrix}

This is converted to `&amp;` before being passed to MathJax, so it renders as "x amp;y z amp;w". This is more serious, as all matrices, case statements, multiline equations, and tables are broken. See http://en.wikipedia.org/wiki/Help:Formula#Fractions.2C_matrices.2C_multilines for a whole section of broken examples.

Helder 2012-05-04 09:09:07 UTC
In terms of articles containing formulas, I would consider the appropriate "Severity" for this bug to be "blocker" for the use of MathJax on Wikipedia, since every formula containing <, >, or & is broken. Setting the fields accordingly, since this is consistent with the description of the fields at [[mw:Bug management/Bugzilla usage#Priority]]. I would also set "priority=highest", but I'm not sure about the availability of devs for fixing this, so I'll leave it to a developer to set the appropriate priority for this bug. See also reports on:
* [[Wikipedia talk:WikiProject Mathematics#MathJax]]
* [[de:Portal Diskussion:Mathematik#Mathjax wird getestet!]]

Helder 2012-05-04 09:16:27 UTC
(In reply to comment #18)
> See also reports on
> * [[Wikipedia talk:WikiProject Mathematics#MathJax]]
> * [[de:Portal Diskussion:Mathematik#Mathjax wird getestet!]]

and https://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(technical)&oldid=490606575#TeX_broken

Michael M. 2012-05-05 08:28:51 UTC
*** Bug 36491 has been marked as a duplicate of this bug. ***

Michael M. 2012-05-15 08:56:46 UTC
*** Bug 36842 has been marked as a duplicate of this bug. ***

Michael M. 2012-05-21 07:25:30 UTC
*** Bug 36977 has been marked as a duplicate of this bug. ***

Derk-Jan Hartman 2012-06-02 09:52:21 UTC
Fixed the double escaping, but the generated DOM is still unescaped, which I don't really like. That probably requires changes in MathJax, though.

Brion Vibber 2012-06-02 10:00:43 UTC
Changeset link?

Helder 2012-06-04 11:20:57 UTC
I think it is change I6d548d06 (https://gerrit.wikimedia.org/r/#/c/9739/).

Richard Morris 2012-06-04 13:51:43 UTC
I'm not sure what the desired output for MW_MATH_SOURCE should be. It will not be parsed by MathJax, so we end up with illegal and untreated HTML. Perhaps it's better to pass it through htmlspecialchars().

Richard Morris 2012-06-04 15:11:31 UTC
Just noticed that MathJax supports \lt and \gt for < and >. These could solve the problem with < and >: http://www.onemathematicalcat.org/MathJaxDocumentation/TeXSyntax.htm#L . The & used in matrices and arrays need not be an error: according to http://www.w3.org/TR/html5/tokenization.html#data-state, "& " is legal and interpreted as an ampersand followed by a space. There are a few subtleties we might need to watch for: \& is a literal ampersand, and \> is an alternate medium space. I don't know if HTML entities are allowed, but they seem to work: x ⊝ y, with ⊝ written as an entity, gives x circled minus y.

nageh 2012-06-08 17:40:48 UTC
The data is the content of a script(!) element; of course unescaped characters are legal there! Do you escape your < and > signs in JavaScript code???

Liangent 2012-06-08 18:12:45 UTC
Is it really so easy to fix? Gerrit change #10708.

Liangent 2012-06-08 18:32:09 UTC
Marking as fixed. There are already two patches, worked out independently and almost identical. Reopen if they don't fix this issue.

nageh 2012-06-09 11:01:40 UTC
Yes, it is extremely easy to fix. That's why I didn't understand the cold shoulder I got.

Brad Jorsch 2012-06-24 18:03:53 UTC
(In reply to comment #30)
> Marking as fixed. There're already two patches which are independently worked
> out and almost the same. Reopen if they don't fix this issue.

Is it really appropriate to mark it as fixed when the patches haven't been reviewed or merged yet?

Richard Morris 2012-06-24 18:24:23 UTC
I don't think the fix is quite correct, as it introduces problems for the source option. What I think you want is:

    if ( $this->mode == MW_MATH_SOURCE ) {
        # No need to render or parse anything more!
        # New lines are replaced with spaces, which avoids confusing our parser (bugs 23190, 22818)
        return Xml::element( 'span',
            $this->_attribs( 'span', array( 'class' => 'tex', 'dir' => 'ltr' ) ),
            '$' . str_replace( "\n", " ", htmlspecialchars( $this->tex ) ) . ' $'
        );
    }
    if ( $this->mode == MW_MATH_MATHJAX ) {
        # No need to render or parse anything more!
        # New lines are replaced with spaces, which avoids confusing our parser (bugs 23190, 22818)
        return Xml::element( 'span',
            $this->_attribs( 'span', array( 'class' => 'tex', 'dir' => 'ltr' ) ),
            '$ ' . str_replace( "\n", " ", $this->tex ) . '$'
        );
    }

You want to call htmlspecialchars in MW_MATH_SOURCE but not in MW_MATH_MATHJAX. You might also want to mention the bug in the comments.

Liangent 2012-06-24 18:29:21 UTC
(In reply to comment #33)
> You want to call htmlspecialchars in MW_MATH_SOURCE but not MW_MATH_MATHJAX.

I don't understand why. It will be escaped again by Xml::element(), so you'll still see double-escaped TeX in source mode.

Richard Morris 2012-06-24 18:45:20 UTC
Good point. It's worth checking that we get legal HTML in source mode when it's reviewed.

Liangent 2012-07-12 05:34:04 UTC
Already merged.
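Liangent's objection to the extra `htmlspecialchars()` call in source mode can be checked with a small model. This Python sketch assumes, as Liangent states, that the element builder already escapes its text content. `element()` is a hypothetical stand-in for `Xml::element()`, not MediaWiki's implementation, and `text_content()` uses the standard library's `HTMLParser` to approximate the decoded text a browser exposes.

```python
import html
from html.parser import HTMLParser


def element(tag: str, attrs: dict[str, str], text: str) -> str:
    """Hypothetical Xml::element-style builder: escapes attribute values
    and text content exactly once."""
    attr_str = "".join(f' {k}="{html.escape(v)}"' for k, v in attrs.items())
    return f"<{tag}{attr_str}>{html.escape(text, quote=False)}</{tag}>"


def render_source(tex: str, pre_escape: bool = False) -> str:
    """Source-mode output: "$" + TeX + " $" inside span.tex.
    pre_escape=True mimics also calling htmlspecialchars() beforehand."""
    flat = tex.replace("\n", " ")  # newlines become spaces, as in the PHP snippet
    if pre_escape:
        flat = html.escape(flat, quote=False)
    return element("span", {"class": "tex", "dir": "ltr"}, "$" + flat + " $")


class _TextCollector(HTMLParser):
    """Gathers decoded character data, roughly what textContent returns."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def text_content(markup: str) -> str:
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


tex = r"a < b \\ x & y"
print(text_content(render_source(tex)))                   # $a < b \\ x & y $
print(text_content(render_source(tex, pre_escape=True)))  # $a &lt; b \\ x &amp; y $
```

With escaping left to the builder, source mode shows the TeX unchanged; adding a second escape in front reproduces the `&amp;lt;` symptom. Under this model, escaping belongs in exactly one place.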
