Last modified: 2014-11-17 10:36:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T29528, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 27528 - JS minification still outputs incorrect javascript for some input
JS minification still outputs incorrect javascript for some input
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
ResourceLoader (Other open bugs)
1.17.x
All All
: High normal with 3 votes (vote)
: ---
Assigned To: Trevor Parscal
:
Depends on: 27395
Blocks: 26676
  Show dependency treegraph
 
Reported: 2011-02-18 09:44 UTC by Michael M.
Modified: 2014-11-17 10:36 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
New minifier which takes the division/regexp conflict into account (25.25 KB, patch)
2011-02-21 18:15 UTC, P.Copp
Details
Modified patch: Take html comments into account, cleaned up a bit (25.74 KB, patch)
2011-02-24 15:15 UTC, P.Copp
Details
List of enwiki user scripts which are currently broken (2.24 KB, text/plain)
2011-02-24 15:16 UTC, P.Copp
Details
Follow-up fix: don't insert newlines before increment operators (1.61 KB, patch)
2011-03-14 15:56 UTC, P.Copp
Details

Description Michael M. 2011-02-18 09:44:50 UTC
Example:
Input: http://de.wikipedia.org/wiki/Benutzer:Tom_md/monobook.js
Output: http://bits.wikimedia.org/de.wikipedia.org/load.php?debug=false&lang=de&modules=user&only=scripts&skin=monobook&user=Tom_md

Right at the beginning there are some comments that are not deleted (this doesn't hurt, of course, but obviously something goes wrong).

Around line 435 (var Quickbar) everything seems fine again, but on line 473 (in function qbWPIntern()) some comments are preserved and then in line 609
w (1,'http://commons.wikimedia.org/wiki/Special:Upload','C-Up',qbtarget,'Commons-Upload');
(note that the quotes change from double to single here)
is destroyed to
w (1,'http:
which causes a syntax error.
Please note that this javascript is the most used javascript in the German Wikipedia.
Comment 1 Derk-Jan Hartman 2011-02-18 09:46:48 UTC
http://bits.wikimedia.org/it.wikipedia.org/load.php?debug=false&lang=it&modules=site&only=scripts&skin=vector&version=20110218T093055Z

search for aNode.setAttribute the comment stripper breaks an url because the url contains //

compare with http://it.wikipedia.org/wiki/MediaWiki:Vector.js original
Comment 2 Salvatore Ingala 2011-02-18 10:39:32 UTC
In the previous report, please refer to http://it.wikipedia.org/w/index.php?title=MediaWiki:Vector.js&oldid=38576973 since code has been modified.
Comment 3 P.Copp 2011-02-18 13:53:51 UTC
When JavaScriptDistiller will be rewritten, please note that it's not trivial to safely minify js when allowing arbitrary user input. That's because of the division/regex conflict in javascript's grammar, which can not be resolved only by looking at the previous token:

(a+b)/(c+d)/g    // division operator
if(1)/a /g.exec('Pa ss'); //regex

Althought this might seem to be edge cases, considering the amount of javascript being used on the different projects, it is likely that they will be hit eventually.

So to do minifying without breaking correct javascript, you'll need to do basic stack parsing on the javascript. I've written sth. similar in C a while ago, so if someone is willing to review or use it, I can try to port it to php and post it here.
Comment 4 Roan Kattouw 2011-02-18 17:23:16 UTC
This should be fixed now. Closing as FIXED, please REOPEN if new problems arise.
Comment 5 P.Copp 2011-02-18 17:49:26 UTC
Reopening per my comment above. As pointed out, it's not possible to minify js with simple regexes without breaking user code in some cases.

FTR here are two sample test cases that currently break:

> echo JavascriptDistiller::stripWhiteSpace("alert( (10+10) / '/'.charCodeAt( 0 ) ) + '//' );\n");
alert((10+10)/ '/'.charCodeAt( 0 ) ) + '

> echo JavascriptDistiller::stripWhiteSpace("if(1)/a /g.exec('Pa ss');");
if(1)/a/g.exec('Pa ss');
Comment 6 Roan Kattouw 2011-02-19 12:32:32 UTC
(In reply to comment #5)
> Reopening per my comment above. As pointed out, it's not possible to minify js
> with simple regexes without breaking user code in some cases.
> 
> FTR here are two sample test cases that currently break:
> 
> > echo JavascriptDistiller::stripWhiteSpace("alert( (10+10) / '/'.charCodeAt( 0 ) ) + '//' );\n");
> alert((10+10)/ '/'.charCodeAt( 0 ) ) + '
> 
This has unbalanced parentheses, but the issue you point our (division interpreted as start of regex, string vs. not string swapped for the rest of the input) is valid.

> > echo JavascriptDistiller::stripWhiteSpace("if(1)/a /g.exec('Pa ss');");
> if(1)/a/g.exec('Pa ss');
This one is so devious that even Douglas Crockford's JSMin falls for it :O

(In reply to comment #3)
> So to do minifying without breaking correct javascript, you'll need to do basic
> stack parsing on the javascript. I've written sth. similar in C a while ago, so
> if someone is willing to review or use it, I can try to port it to php and post
> it here.
I'm interested. I'm also willing to port it to PHP myself if you give me the C code and release it under a free license.
Comment 7 P.Copp 2011-02-21 18:15:02 UTC
Created attachment 8185 [details]
New minifier which takes the division/regexp conflict into account

(In reply to comment #6)
> This has unbalanced parentheses, but the issue you point our (division
Copy'n'Paste fail. Sorry for that :)

> This one is so devious that even Douglas Crockford's JSMin falls for it :O
> 
Right, JSMin makes a fair amount of assumptions about the code that it will minify.

> I'm interested. I'm also willing to port it to PHP myself if you give me the C
> code and release it under a free license.
Well, usually it's easier to port code that you wrote yourself, so I went ahead and made the attached patch. It seems, that even in php, its performance is quite good, in most tested cases it was about the same speed or slightly faster than JavascriptDistiller.

For now I removed $wgResourceLoaderMinifyJSVerticalSpace. As the new minifier knows when semicolon insertion will be triggered, it can safely put everything on a single line. For readability it may be considered to add a few newlines, though, maybe before each "function"?

I left JavaScriptDistiller.php in place to ease comparing and testing. I also added a few test cases in tests/phpunit/includes/libs/JavaScriptMinifierTest.php, probably need a few more, though.
Comment 8 P.Copp 2011-02-24 15:15:26 UTC
Created attachment 8207 [details]
Modified patch: Take html comments into account, cleaned up a bit

So, poking a little bit more at this I fired up a test script that pulls user scripts from enwiki and checks them against JSLint <http://www.javascriptlint.com/> for syntax errors, both before and after minifying.
After I had discovered and fixed an issue with html comments (see modified patch), I ran the script again and got the results below:


                    | Total | Distiller |    Distiller    |  Minifier
                    |       |           | (stripVertical) |
--------------------+-------+-----------+-----------------+------------
Checked pages       | 39313 |           |                 |
--------------------+-------+-----------+-----------------+------------
Error free pages    | 34623 |           |                 |
--------------------+-------+-----------+-----------------+------------
Pages with syntax   |       |           |                 |
errors after minify |       | 61        |  79             |   0
--------------------+-------+-----------+-----------------+------------
Avg. length (bytes) | 2875  | 2124      |  2085           |   2065
--------------------+-------+-----------+-----------------+------------
Avg. time (ms)      |       | 7.0       |  8.9            |   5.0
Comment 9 P.Copp 2011-02-24 15:16:40 UTC
Created attachment 8208 [details]
List of enwiki user scripts which are currently broken
Comment 10 Michael M. 2011-02-25 09:42:11 UTC
I had a quick look at some of the broken scripts and all had a /\/foo\// in them. Especially in [[User:MBisanz/monobookrfa.js]] there isn't much other code that could break. I didn't check if that's the reason why they break, but if so that must be fixed of course.
Comment 11 Roan Kattouw 2011-03-14 10:08:58 UTC
(In reply to comment #7)
> For now I removed $wgResourceLoaderMinifyJSVerticalSpace. As the new minifier
> knows when semicolon insertion will be triggered, it can safely put everything
> on a single line. For readability it may be considered to add a few newlines,
> though, maybe before each "function"?
> 
I'd rather keep the existing behavior of putting each statement on its own line.

Thanks for running these tests and for contributing your minifier. I'm reviewing it now.
Comment 12 Roan Kattouw 2011-03-14 13:25:29 UTC
Patch committed in r83885, some tweaks made in r83891
Comment 13 Niklas Laxström 2011-03-14 13:33:19 UTC
Where are the tests?
Comment 14 Roan Kattouw 2011-03-14 13:38:29 UTC
(In reply to comment #13)
> Where are the tests?

# /trunk/phase3/tests/phpunit/includes/libs/JavaScriptMinifierTest.php (added)

in r83885
Comment 15 P.Copp 2011-03-14 15:56:12 UTC
Created attachment 8290 [details]
Follow-up fix: don't insert newlines before increment operators

(In reply to comment #12)
> Patch committed in r83885, some tweaks made in r83891

Thanks a lot. I really appreciate this being included in MediaWiki.

Just a small fix concerning the addition of the $maxLineLength parameter: There is one case in the js grammar, where a newline is forbidden, although semicolon insertion is not triggered:

if(x
++);

This raises a syntax error, because a newline may not occur before a postfix increment operator. So we have to check for that too, when inserting a newline.

Fix with regression tests attached.
Comment 16 Roan Kattouw 2011-03-14 18:05:24 UTC
(In reply to comment #15)
> Thanks a lot. I really appreciate this being included in MediaWiki.
> 
You should apply for commit access :)

> Just a small fix concerning the addition of the $maxLineLength parameter: There
> is one case in the js grammar, where a newline is forbidden, although semicolon
> insertion is not triggered:
> 
> if(x
> ++);
> 
> This raises a syntax error, because a newline may not occur before a postfix
> increment operator. So we have to check for that too, when inserting a newline.
> 
> Fix with regression tests attached.
Thanks! Applied in r83934 with a small comment tweak.
Comment 17 Neil Kandalgaonkar 2011-03-15 16:19:23 UTC
FYI, \f and \v aren't valid escapes in PHP < 5.2.5.

We run PHP 5.2.4 in production. So, it's fixed in our code to \xb and \xc. 

http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83998
http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83997

P.Copp -- you may want to backport that, or make note of a minimum PHP version to run.
Comment 18 P.Copp 2011-03-26 15:33:27 UTC
(In reply to comment #17)
> FYI, \f and \v aren't valid escapes in PHP < 5.2.5.
> 
> We run PHP 5.2.4 in production. So, it's fixed in our code to \xb and \xc. 
> 
> http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83998
> http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/83997
> 
> P.Copp -- you may want to backport that, or make note of a minimum PHP version
> to run.

Nice catch! That's what makes PHP so much fun to code in :)
Anyway, thanks for the heads up.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links