Last modified: 2013-09-04 12:33:20 UTC

Bug 10407 - Ampersand replaced by its HTML entity even in <html> sections, breaking JavaScript
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.20.x
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Nobody
Reported: 2007-06-29 14:09 UTC by Frederic Keller
Modified: 2013-09-04 12:33 UTC (History)

Description Frederic Keller 2007-06-29 14:09:10 UTC
The issue I am facing is that every bare ampersand "&" in a page's content is replaced by its HTML entity, &amp;.

Even when raw HTML is allowed ($wgRawHtml = true) and the content is wrapped in <html></html> tags, the ampersands are replaced.

I would like to keep the bare &, because users should be able to add JavaScript to their pages. With this replacement, any & used as a logical operator is corrupted, and the JavaScript breaks.

Here is a small example that shows the problem. Just add this content to a page and check the HTML source.
---------------------
<html>
ampersand : &amp;
pure ampersand: & (should not be replaced)
</html>
----------------------

Is there a solution to this problem, or will it be fixed in the next version?

Thank you very much!
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-29 17:14:03 UTC
Sure &amp; should work correctly in JavaScript, just as it does in URLs.  The XML parser is supposed to replace it with & before passing it on to the JavaScript parser or anything else.  It really doesn't work?  Try using <![CDATA[ ... ]]> around your JavaScript.
Comment 2 ais523 2007-06-29 18:09:00 UTC
(In reply to comment #1)
> Sure &amp; should work correctly in JavaScript, just as it does in URLs.  The
> XML parser is supposed to replace it with & before passing it on to the
> JavaScript parser or anything else.  It really doesn't work?  Try using
> <![CDATA[ ... ]]> around your JavaScript.

I can confirm that this doesn't work correctly (tested on a private wiki with the <html> tag enabled, MediaWiki version 1.9.0).

Test code:
<html>
<script>
// <[CDATA[
alert("Testing the & sign");
// ]]>
</script>
</html>

Output in page's HTML:
<script>
// <[CDATA[
alert("Testing the &amp; sign");
// ]]>
</script>

and the script displays the message Testing the &amp; sign in the alert box that comes up. I've actually written scripts inside HTML tags on that wiki, and it's been a pain having to express a&&b as !(!a||!b)...
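
The workaround mentioned above can be sketched as follows. This is only an illustration: the helper names `and` and `andBool` are hypothetical and not part of MediaWiki. It shows two ways to express a logical AND without writing an ampersand in the source.

```javascript
// Hypothetical helper: mimics the result of `a && b` without an ampersand.
// Unlike the real operator, both arguments are evaluated before the call,
// so there is no short-circuiting.
function and(a, b) {
  // Returns `a` when it is falsy, otherwise `b`.
  return a ? b : a;
}

// De Morgan form from the comment above. It always yields a boolean,
// whereas `a && b` yields one of its operands.
function andBool(a, b) {
  return !(!a || !b);
}

console.log(and(1, "x"));     // "x"
console.log(and(0, "x"));     // 0
console.log(andBool(1, "x")); // true
```

Where short-circuiting matters, a nested `if (a) { if (b) { ... } }` avoids the ampersand without evaluating `b` unnecessarily.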
Comment 3 Brion Vibber 2007-06-29 18:19:44 UTC
The <![CDATA[ ... ]]> would be used to allow a raw & in the source to pass the XML parser correctly. It would ensure that &amp; is interpreted as &amp; *instead of* the & it otherwise would be.

Note that in HTML 4, <script> contents are defined as CDATA already, which is why common browsers already handle it that way. In pure XHTML, this wouldn't be implied, which is why we make it explicit in our own output.

Since various processing is done on the output code even after the <html> sections are done, currently it may not be possible to get nice 'clean' output of this sort.
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-06-29 18:26:28 UTC
(mid-air collision)

Right, right, of course <![CDATA[ will just muck things up further, the MW parser doesn't recognize it.  But it's an error to output a literal &, in <script> or elsewhere, that doesn't begin an entity.  It should work correctly as &amp;, as far as I can tell.

Unfortunately, testing in Firefox, it does not.  <![CDATA[ seems to be the only way to get this to work, so for this to function correctly MediaWiki would have to either insert <![CDATA[ ... ]]> intelligently inside <script> and maybe <style> tags, and not HTML-escape those; or else just not clean them at all.
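
The "do not HTML-escape those" option could look roughly like the sketch below. It is not MediaWiki's code (the Sanitizer is PHP); the function names and the regular expressions are assumptions made for illustration. Bare ampersands are escaped everywhere except inside the bodies of <script> and <style> elements.

```javascript
// Replace ampersands that do not already start an entity with &amp;.
function escapeBareAmpersands(text) {
  return text.replace(
    /&(?![a-zA-Z][a-zA-Z0-9]*;|#[0-9]+;|#[xX][0-9a-fA-F]+;)/g,
    "&amp;"
  );
}

// Escape bare ampersands in markup, but copy <script>/<style> bodies through
// untouched. Opening tags are still escaped, so src="...&..." is handled.
function escapeOutsideRawText(html) {
  const rawElement = /(<(script|style)\b[^>]*>)([\s\S]*?)(<\/\2\s*>)/gi;
  let result = "";
  let last = 0;
  for (const match of html.matchAll(rawElement)) {
    result += escapeBareAmpersands(html.slice(last, match.index));
    result += escapeBareAmpersands(match[1]); // opening tag and attributes
    result += match[3];                       // element body, left as-is
    result += match[4];                       // closing tag
    last = match.index + match[0].length;
  }
  return result + escapeBareAmpersands(html.slice(last));
}

console.log(
  escapeOutsideRawText("<p>a & b</p><script>if (x && y) go();</script>")
);
```

A regex-based splitter like this is fragile (for example, it does not handle "</script>" inside a string literal), so a real fix would more likely hook into the parser itself.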

Is it Tidy doing the cleaning, or the Sanitizer?  Does the entity get replaced even with Tidy off?
Comment 5 Robert 2009-07-29 04:09:19 UTC
On http://en.wikipedia.org/wiki/Special:Watchlist, I get a javascript error every time I refresh due specifically to this bug. The following line is in the header section

<script type="text/javascript" src="http://en.wikipedia.org/w/index.php?title=-&amp;action=raw&amp;gen=js&amp;useskin=monobook"><!-- site js --></script>

Because the ampersands are not handled correctly, that line returns an html text page instead of the expected javascript.

This has just started happening in the last day or so.
Comment 6 Splarka 2009-07-29 04:21:45 UTC
(In reply to comment #5)
> On http://en.wikipedia.org/wiki/Special:Watchlist, I get a javascript error
> every time I refresh due specifically to this bug. The following line is in the
> header section
> 
> <script type="text/javascript"
> src="http://en.wikipedia.org/w/index.php?title=-&amp;action=raw&amp;gen=js&amp;useskin=monobook"><!--
> site js --></script>
> 
> Because the ampersands are not handled correctly, that line returns an html
> text page instead of the expected javascript.
> 
> This has just started happening in the last day or so.
> 

No, that's normal, this bug is about & being replaced with &amp; in the <script> body, not in the src parameter value. For example:
<html><script type="text/javascript">if(skin && stylepath) alert('woo')</script></html> 
will break, whereas 
<html><script type="text/javascript" src="http://en.wikipedia.org/w/index.php?title=MediaWiki:Common.js/watchlist.js&action=raw&ctype=text/javascript"></script></html> 
will correctly escape the & to &amp; and the browser expects and understands this.
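
The difference can be checked outside a browser. In an attribute value the HTML parser decodes &amp; back to & before the URL is used, but, as noted in comment 3, HTML 4 treats <script> content as CDATA, so the entity reaches the JavaScript engine undecoded. The sketch below (the `parses` helper is hypothetical) compiles the first example with and without the replacement, without running it.

```javascript
// Return true when `source` parses as a JavaScript function body.
function parses(source) {
  try {
    new Function(source); // compiles only; the body is never executed
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const original = "if (skin && stylepath) alert('woo')";
// Simulate the replacement described in this bug.
const escaped = original.replace(/&/g, "&amp;");

console.log(parses(original)); // true
console.log(parses(escaped));  // false
```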

What error are you getting exactly? That gen=js script appears on every page load, not just on watchlists, and is what loads MediaWiki:Common.js and MediaWiki:SKINNAME.js (probably Monobook). http://en.wikipedia.org/wiki/MediaWiki:Common.js/watchlist.js is loaded only on the watchlist page, so the error is possibly there.
Comment 7 Robert 2009-07-29 05:32:15 UTC
The exact error is

Line: 8
Char: 2
Error: Expected identifier, string or number
Code: 0
URL: http://en.wikipedia.org/wiki/Special:Watchlist

I got to the ampersands by saving the html locally and debugging one line at a time. However, I checked (as suggested) and other pages which do not produce errors have the same line. I tried a more complete test (I left in all the code). Running from my hard drive, I first converted the relative links to absolute links. Now, there are 2 errors. Commenting out the line I indicated above stopped them both. However, the errors are different when running locally, so it appears that I was wrong. 

Line: 2
Char: 1
Error: invalid character
Code: 0
URL: file://path to my test case

BTW, I am running IE 6. 
Comment 8 Robert 2009-08-04 18:48:16 UTC
(In reply to comment #7)
> The exact error is
> Line: 8
> Char: 2
> Error: Expected identifier, string or number
> Code: 0
> URL: http://en.wikipedia.org/wiki/Special:Watchlist
> I got to the ampersands by saving the html locally and debugging one line at a
> time. However, I checked (as suggested) and other pages which do not produce
> errors have the same line. I tried a more complete test (I left in all the
> code). Running from my hard drive, I first converted the relative links to
> absolute links. Now, there are 2 errors. Commenting out the line I indicated
> above stopped them both. However, the errors are different when running
> locally, so it appears that I was wrong. 
> Line: 2
> Char: 1
> Error: invalid character
> Code: 0
> URL: file://path to my test case
> BTW, I am running IE 6. 

The problem went away today at 2pm; I last saw it at 6am.
Comment 9 Cbarr 2011-08-11 16:01:17 UTC
I have hit this bug on wikimediafoundation.org. I was trying to use the logical AND "&&" in my JavaScript, and the parser changed both ampersands, giving "&amp;&amp;".
Comment 10 Krinkle 2012-08-09 11:33:06 UTC
Bump: this caused an issue again on wikimediafoundation.org. It makes code really annoying to write.
