Last modified: 2007-04-30 21:02:50 UTC
When editing a page using Internet Explorer 6, MediaWiki says "Your browser is not unicode-compliant ...". I did some debugging and found the following.

In the file includes\DefaultSettings.php, several regexp patterns are defined for non-compliant browsers:

$wgBrowserBlackList = array(
	/**
	 * Netscape 2-4 detection
	 * The minor version may contain strings such as "Gold" or "SGoldC-SGI"
	 * Lots of non-netscape user agents have "compatible", so it's useful to check for that
	 * with a negative assertion. The [UIN] identifier specifies the level of security
	 * in a Netscape/Mozilla browser, checking for it rules out a number of fakers.
	 * The language string is unreliable, it is missing on NS4 Mac.
	 *
	 * Reference: http://www.psychedelix.com/agents/index.shtml
	 */
	'/^Mozilla\/2\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
	'/^Mozilla\/3\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
	'/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/', #NOTE THIS!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

	/**
	 * MSIE on Mac OS 9 is teh sux0r, converts þ to <thorn>, ð to <eth>, Þ to <THORN> and Ð to <ETH>
	 *
	 * Known useragents:
	 * - Mozilla/4.0 (compatible; MSIE 5.0; Mac_PowerPC)
	 * - Mozilla/4.0 (compatible; MSIE 5.15; Mac_PowerPC)
	 * - Mozilla/4.0 (compatible; MSIE 5.23; Mac_PowerPC)
	 * - [...]
	 *
	 * @link http://en.wikipedia.org/w/index. ... &oldid=12355864
	 * @link http://en.wikipedia.org/wiki/Template%3AOS9
	 */
	'/^Mozilla\/4\.0 \(compatible; MSIE \d+\.\d+; Mac_PowerPC\)/'
);

And in the file includes\EditPage.php, the current browser's User-Agent string is checked against the patterns:

function checkUnicodeCompliantBrowser() {
	global $wgBrowserBlackList;
	if( empty( $_SERVER["HTTP_USER_AGENT"] ) ) {
		// No User-Agent header sent? Trust it by default...
		return true;
	}
	$currentbrowser = $_SERVER["HTTP_USER_AGENT"];
	foreach ( $wgBrowserBlackList as $browser ) {
		if ( preg_match($browser, $currentbrowser) ) {
			return false;
		}
	}
	return true;
}

Note the 3rd pattern, '/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/'. It will match IE6's User-Agent string on my machine, which is shown below:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; iebar; .NET CLR 1.1.4322; InfoPath.1)

Please note the "InfoPath.1" part near the end of the string. It is there because I installed InfoPath, a component of Microsoft Office 2003. The leading letter 'I' makes the string match the 3rd pattern.
IE 6.0 hasn't ever triggered this in the wild that we know of. Have you done something strange to customize your user-agent string? Can you confirm that it works properly when restored to normal?
Ahh I see
After installing .NET Framework 1.1 and MS Office 2003 (including the InfoPath component), my IE's user-agent string changed to this:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; iebar; .NET CLR 1.1.4322; InfoPath.1)

InfoPath is a standard component of MS Office 2003 Pro, not a WEIRD PLUGIN.
This regexp array doesn't recognize IE7 with this $USER_AGENT:

'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; MRA 4.8 (build 01705); InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)'

and MediaWiki shows this message:

''WARNING: Your browser is not unicode compliant. A workaround is in place to allow you to safely edit articles: non-ASCII characters will appear in the edit box as hexadecimal codes.''

but this browser supports UTF-8!
It seems like the intent is to allow browsers that don't have "compatible" there, but the regex doesn't work. Should probably be

- '/^Mozilla\/2\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
- '/^Mozilla\/3\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
- '/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/',
+ '/^Mozilla\/2\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/',
+ '/^Mozilla\/3\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/',
+ '/^Mozilla\/4\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/',

The .*? is screwed up by the nested parentheses (it eats the initial parenthesis to avoid the prohibited "compatible" string). Perl regex is all very nice, but POSIX-style is better here. Patch needs review.
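For anyone reviewing the patch, here is a minimal sketch that runs the old and proposed Mozilla/4 patterns against two strings. The IE7 string is the one reported earlier in this thread, with its nested "(build 01705)". The Netscape 4 string is only an assumed example of a genuine NS4 user agent and does not come from this report. The variable names are illustrative too.

```php
<?php
// Sketch only: old vs. proposed Netscape 4 pattern from this report.
$old = '/^Mozilla\/4\.[^ ]+ .*?\((?!compatible).*; [UIN]/';
$new = '/^Mozilla\/4\.[^ ]+ [^(]*\((?!compatible).*; [UIN]/';

$samples = array(
	// IE7 string reported in this thread; note the nested "(build 01705)".
	'ie7' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; MRA 4.8 (build 01705); InfoPath.1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)',
	// Assumed example of a real Netscape 4 user agent (not from this report).
	'ns4' => 'Mozilla/4.08 [en] (Win98; I ;Nav)',
);

foreach ( $samples as $name => $ua ) {
	// preg_match() returns 1 on a match and 0 on no match.
	printf( "%s: old=%d new=%d\n", $name, preg_match( $old, $ua ), preg_match( $new, $ua ) );
}
// ie7: old=1 new=0
// ns4: old=1 new=1
```

With the old pattern, the lazy .*? skips past the first "(" and anchors on the inner "(build", where the negative lookahead passes and "; InfoPath" then satisfies "; [UIN]". With [^(]* the match can only start at the first "(", which is followed by "compatible", so the IE7 string is no longer blacklisted while the Netscape-style string still is.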
Tested the above modifications against actual user-agent strings in our logs to confirm. Of 43003 MSIE samples, 3709 listed the InfoPath extension. 93 MSIE hits were false-positive matches for the regexes, of which 77 listed the InfoPath extension. In total, less than 0.12% of sampled hits were false positive matches -- 0.22% of MSIE hits, 2.08% of InfoPath hits. (I did not sample edits specifically, but all hits.) Fixed in r21726.
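As a quick cross-check, the two MSIE percentages can be recomputed from the counts quoted above. This is a minimal sketch and the variable names are illustrative. The overall "less than 0.12%" figure cannot be reproduced here because the total number of sampled hits is not given.

```php
<?php
// Counts quoted in the comment above.
$msieSamples = 43003;
$infoPathSamples = 3709;
$falsePositives = 93;
$falsePositivesWithInfoPath = 77;

// Share of all MSIE hits that were false positives.
printf( "%.2f%% of MSIE hits\n", 100 * $falsePositives / $msieSamples );
// Share of InfoPath hits that were false positives.
printf( "%.2f%% of InfoPath hits\n", 100 * $falsePositivesWithInfoPath / $infoPathSamples );
// 0.22% of MSIE hits
// 2.08% of InfoPath hits
```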