Last modified: 2014-06-27 03:56:51 UTC
The bug can easily be reproduced. Just type something like this in an article (mind the spaces!!!):

one: un - two : deux ; three : trois ! (1)

Look at the generated HTML source. You'll see something like:

one: un - two&nbsp;: deux&nbsp;; three&nbsp;: trois&nbsp;! (2)

which, in general, is fine. Notice that the spaces before the punctuation marks ( : ; ! ) have been replaced by the &nbsp; entity. What's the problem with that? If you want to make sure an image does not spill over from one section into the next, you'll probably use the syntax:

<br style="clear:both;" /> (3)

Everything works fine... But if you write it as:

<br style=" clear : both ; " /> (4)

which is perfectly legal (and much more readable, especially if you have a long style statement), you'll unfortunately generate:

<br style=" clear&nbsp;: both&nbsp;; " /> (5)

and, of course, the style specification is invalid and will be ignored. The workaround is obvious: use format (3). But the bug remains an annoying problem for anyone who is unaware of it.
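The transformation from (1) to (2) and from (4) to (5) can be reproduced outside MediaWiki with a small Python port of the first "French spaces" regular expression from includes/parser/Parser.php. This is only a sketch. The function name `french_spaces` is hypothetical, and `NBSP` is set to `&#160;`, the numeric character reference for the same non-breaking space as `&nbsp;`.

```python
import re

# Numeric form of the non-breaking space entity (&nbsp;).
NBSP = "&#160;"

# Port of the first French-spaces pattern: a space that follows any
# character (other than a newline) and precedes ? : ; ! % or the right
# guillemet (U+00BB) is replaced by a non-breaking space.
FRENCH_SPACE = re.compile(r"(.) (?=\?|:|;|!|%|\u00bb)")


def french_spaces(text: str) -> str:
    """Apply the rule to the whole string, markup and attributes included."""
    return FRENCH_SPACE.sub(lambda m: m.group(1) + NBSP, text)


if __name__ == "__main__":
    # Plain text: the result is the typographically "correct" (2).
    print(french_spaces("one: un - two : deux ; three : trois !"))
    # Format (3) has no spaces before punctuation, so it is untouched.
    print(french_spaces('<br style="clear:both;" />'))
    # Format (4) gets entities injected into the style attribute, as in (5).
    print(french_spaces('<br style=" clear : both ; " />'))
```

Because the rule only looks at characters and never at whether it is inside a tag, example (4) is rewritten exactly like ordinary text.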
Rephrasing the summary (and updating version and severity): this is a general problem. Non-breaking spaces added to satisfy French typographic rules are inserted even into XHTML attributes, including style. There has been a similar bug report before (see bug #11874), but it was “solved” by hardcoding the specific case of “!important”, without taking anything else into account. Note that the same thing happens if the semicolon is inserted using a parser function; see http://test.wikipedia.org/wiki/Nbsp_in_style#Broken_because_of_parser_function (and also bug #12974). See also bug #12752 for a more general objection to this feature.
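To show why the “!important” special case does not help in general, here is a hypothetical Python port of the three substitutions in the `$fixtags` table of includes/parser/Parser.php. The sketch assumes the entries are applied one after another to the whole text, as PHP's `preg_replace` does with an array of patterns, and that the inserted entity is `&#160;`. The name `apply_fixtags` is ours, not MediaWiki's.

```python
import re

NBSP = "&#160;"  # assumed entity, numeric form of &nbsp;

# (pattern, replacement) pairs, applied in order like a preg_replace array.
FIXTAGS = [
    # Space before ? : ; ! % or the right guillemet (U+00BB).
    (re.compile(r"(.) (?=\?|:|;|!|%|\u00bb)"), lambda m: m.group(1) + NBSP),
    # Space after the left guillemet (U+00AB).
    (re.compile(r"(\u00ab) "), lambda m: m.group(1) + NBSP),
    # Special case from bug #11874: turn the entity back into a plain space
    # in front of the CSS keyword !important, and nowhere else.
    (re.compile(r"&#160;(!\s*important)"), lambda m: " " + m.group(1)),
]


def apply_fixtags(text: str) -> str:
    for pattern, repl in FIXTAGS:
        text = pattern.sub(repl, text)
    return text


if __name__ == "__main__":
    # Protected by the hardcoded exception: comes out unchanged.
    print(apply_fixtags('<span style="color: red !important">x</span>'))
    # Not protected: the colon and semicolon still get an entity in front.
    print(apply_fixtags('<br style=" clear : both ; " />'))
```

The exception only undoes the damage for one token. Any other CSS punctuation preceded by a space is still rewritten.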
See http://en.wikipedia.org/wiki/User:RockMFR/style-nbsp-bug for a variant of this that occurs when using spaces before template parameters.
*** Bug 19290 has been marked as a duplicate of this bug. ***
It has worked like this for a while, and the new parser is in the wings.
*** Bug 67092 has been marked as a duplicate of this bug. ***
(Quoting Krinkle from bug 67092 comment #0:)

> When the parser strips a /* comment */ from a style attribute, it inserts a
> &nbsp; in its place. This causes the stylesheet to be invalidated by the
> browser, and the relevant styles are not applied when the page is rendered.
>
> Wikitext input:
>
> <blockquote style="border: 1px solid #aaa /* foo */;"></blockquote>
>
> Expected output:
>
> <blockquote style="border: 1px solid #aaa ;"></blockquote>
> or
> <blockquote style="border: 1px solid #aaa;"></blockquote>
>
> Actual output:
>
> <blockquote style="border: 1px solid #aaa &nbsp;;"></blockquote>
>
> A &nbsp; is illegal in CSS in that position and results in a parse error by
> the browser, causing the 'border' rule in this case to not be applied.
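The actual output above is what you would expect if two independent steps run in sequence. First, the CSS comment is replaced by a single space instead of being deleted. Second, the French-spaces rule turns the space before the semicolon into an entity. The Python sketch below models that sequence. The helper names are hypothetical, and the "comment becomes one space" behaviour is an assumption inferred from the double space in the reported output.

```python
import re

NBSP = "&#160;"  # numeric form of &nbsp;
FRENCH_SPACE = re.compile(r"(.) (?=\?|:|;|!|%|\u00bb)")


def strip_css_comments(css: str) -> str:
    # Assumption: each /* ... */ comment is replaced by one space,
    # not removed outright.
    return re.sub(r"/\*.*?\*/", " ", css, flags=re.DOTALL)


def french_spaces(text: str) -> str:
    return FRENCH_SPACE.sub(lambda m: m.group(1) + NBSP, text)


style = "border: 1px solid #aaa /* foo */;"
cleaned = strip_css_comments(style)  # "border: 1px solid #aaa  ;" (two spaces)
print(french_spaces(cleaned))        # border: 1px solid #aaa &#160;;
```

Neither step is wrong on its own terms. The invalid CSS appears only because the second step does not know it is working inside an attribute value.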
Change 142042 had a related patch set uploaded by Krinkle: [WIP] Parser: Don't insert &nbsp; inside style attributes https://gerrit.wikimedia.org/r/142042
It looks like CSS is just what makes this bug visible, rather than anything to do with its cause. At includes/parser/Parser.php:410-419:

    # Clean up special characters, only run once, next-to-last before doBlockLevels
    $fixtags = array(
        # french spaces, last one Guillemet-left
        # only if there is something before the space
        '/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&#160;',
        # french spaces, Guillemet-right
        '/(\\302\\253) /' => '\\1&#160;',
        '/&#160;(!\s*important)/' => ' \\1', # Beware of CSS magic word !important, bug #11874.
    );
    $text = preg_replace( array_keys( $fixtags ), array_values( $fixtags ), $text );

This does a very aggressive replacement of spaces with &#160; throughout the entire content of the page, and it has apparently already caused at least one other bug. I'm not sure why it does this, but our CSS handling is entirely innocent and is just a red herring.
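If the root cause is that the substitution runs over the whole page, including markup, one possible direction is to apply it only to text outside HTML tags. This is hypothetical and not necessarily what change 142042 does. The Python sketch below uses a deliberately naive `<...>` tokenizer, and the function name `french_spaces_outside_tags` is ours.

```python
import re

NBSP = "&#160;"  # numeric form of &nbsp;
FRENCH_SPACE = re.compile(r"(.) (?=\?|:|;|!|%|\u00bb)")

# Naive tag matcher: anything from "<" to the next ">". It does not handle
# ">" inside quoted attribute values, HTML comments, or <nowiki>/<pre>.
TAG = re.compile(r"(<[^>]*>)")


def french_spaces_outside_tags(html: str) -> str:
    # re.split with a capturing group keeps the tags at odd indices,
    # so only the even-indexed text segments are rewritten.
    parts = TAG.split(html)
    for i in range(0, len(parts), 2):
        # A space at the very start of a segment is left alone, because
        # (.) needs a preceding character within the same segment.
        parts[i] = FRENCH_SPACE.sub(lambda m: m.group(1) + NBSP, parts[i])
    return "".join(parts)


if __name__ == "__main__":
    print(french_spaces_outside_tags('<p style=" clear : both ; ">two : deux</p>'))
```

Attribute values come through untouched, while the visible text still gets French spacing. A real fix would need to work at a stage where the parser already knows which characters belong to attributes, rather than re-tokenizing with a regular expression.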