Last modified: 2014-06-27 03:56:51 UTC

Bug 3158 - Parser inserts invalid &nbsp; in the middle of style attribute
Status: PATCH_TO_REVIEW
Product: MediaWiki
Classification: Unclassified
Parser
unspecified
All All
Low minor with 4 votes
: ---
Assigned To: Nobody - You can work on this!
http://test.wikipedia.org/wiki/Nbsp_i...
: newparser
Duplicates: 19290 67092
Depends on:
Blocks:
Reported: 2005-08-15 17:44 UTC by Marc Meurrens (be) http://www.meurrens.org/
Modified: 2014-06-27 03:56 UTC (History)

Description Marc Meurrens (be) http://www.meurrens.org/ 2005-08-15 17:44:04 UTC
The bug can be easily reproduced.

Just type in an article something like :
one: un - two : deux ; three  : trois ! (1)
(mind the spaces!!!)

Look at the HTML source code generated : you'll see something like :
one: un - two&nbsp;: deux&nbsp;; three &nbsp;: trois&nbsp;! (2)
which, in general, is fine.
Observe that spaces before punctuation marks ( : ; ! )
have been replaced by the &nbsp; HTML entity.

What's the problem with that?

If you want to make sure an image will not overlap from one section to another,
you'll probably use the syntax :
<br style="clear:both;" /> (3)
Everything works fine...

But if you write it as :
<br style=" clear : both ; " /> (4)
which is perfectly legal (and much more readable, especially if you have a large style
statement)
you'll unfortunately generate :
<br style=" clear &nbsp;: both &nbsp;; " /> (5)
and, of course, the style specification is invalid
and will be ignored.

The workaround is obvious: use format (3),
but it remains an annoying problem
for those who are unaware of the bug.
Comment 1 Mormegil 2008-11-12 22:17:03 UTC
Rephrasing the summary (and updating version and severity) – this is a general problem: the non-breaking spaces added to follow French typographic rules are inserted even into XHTML attributes, including style. There has been a similar bug report before (see bug #11874), but it was “solved” by hardcoding that specific case of “!important”, not taking anything else into account.

Note that the same thing happens if the semicolon is inserted using a parser function, see http://test.wikipedia.org/wiki/Nbsp_in_style#Broken_because_of_parser_function (and also bug #12974).

See also bug #12752 for a more general objection to this feature.
Comment 2 RockMFR 2009-03-18 22:12:21 UTC
See http://en.wikipedia.org/wiki/User:RockMFR/style-nbsp-bug for a variant of this that occurs when using spaces before template parameters.
Comment 3 Brion Vibber 2009-06-23 01:49:06 UTC
*** Bug 19290 has been marked as a duplicate of this bug. ***
Comment 4 Mark A. Hershberger 2011-04-12 15:54:45 UTC
It has worked like this for a while, and the new parser is waiting in the wings.
Comment 5 Bartosz Dziewoński 2014-06-25 20:41:10 UTC
*** Bug 67092 has been marked as a duplicate of this bug. ***
Comment 6 Krinkle 2014-06-26 21:03:44 UTC
(Quoting Krinkle from bug 67092 comment #0)
> When the parser strips a /* comment */ from a style attribute it inserts a
> &nbsp; in its place. This causes the stylesheet to be invalidated by the
> browser and the relevant styles are not applied when the page is rendered.
> 
> 
> Wikitext input:
> 
> <blockquote style="border: 1px solid #aaa /* foo */;"></blockquote>
> 
> Expected output:
> 
> <blockquote style="border: 1px solid #aaa ;"></blockquote>
> or
> <blockquote style="border: 1px solid #aaa;"></blockquote>
> 
> Actual output:
> 
> <blockquote style="border: 1px solid #aaa &#160;;"></blockquote>
> 
> A &nbsp; is illegal in CSS in that position and results in a parse error by
> the browser, causing the 'border' rule in this case to not be applied.
Comment 7 Gerrit Notification Bot 2014-06-26 21:04:50 UTC
Change 142042 had a related patch set uploaded by Krinkle:
[WIP] Parser: Don't insert &nbsp; inside style attributes

https://gerrit.wikimedia.org/r/142042
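
To picture what the patch title asks for, here is a minimal Python sketch that applies a "space before punctuation" replacement only to text outside HTML tags, so attribute values such as style are left alone. This is an illustration under stated assumptions, not the contents of change 142042 and not MediaWiki code: the names TAG, FRENCH_SPACE and french_spaces_outside_tags are hypothetical, and the tag pattern is deliberately naive.

```python
import re

# Hypothetical names for illustration; not taken from MediaWiki or change 142042.
# Naive tag matcher: assumes attribute values never contain ">".
TAG = re.compile(r"(<[^>]*>)")
# A space preceded by any character and followed by ? : ; ! % or a right guillemet.
FRENCH_SPACE = re.compile(r"(.) (?=[?:;!%»])")


def french_spaces_outside_tags(text: str) -> str:
    """Insert &#160; before French punctuation, skipping anything inside <...>."""
    parts = TAG.split(text)
    # With one capture group, split() returns text at even indices and tags at odd ones.
    for i in range(0, len(parts), 2):
        parts[i] = FRENCH_SPACE.sub(r"\1&#160;", parts[i])
    return "".join(parts)


if __name__ == "__main__":
    print(french_spaces_outside_tags('<br style=" clear : both ; " />Voici : un test !'))
```

One known limitation of this sketch: a space that directly follows a tag has no preceding character within its text segment, so it is not replaced, which differs from a replacement run over the whole page.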
Comment 8 Jackmcbarn 2014-06-27 03:56:12 UTC
It looks like CSS is just what makes this bug visible, rather than anything to do with its cause. At includes/parser/Parser.php:410-419:

		# Clean up special characters, only run once, next-to-last before doBlockLevels
		$fixtags = array(
			# french spaces, last one Guillemet-left
			# only if there is something before the space
			'/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&#160;',
			# french spaces, Guillemet-right
			'/(\\302\\253) /' => '\\1&#160;',
			'/&#160;(!\s*important)/' => ' \\1', # Beware of CSS magic word !important, bug #11874.
		);
		$text = preg_replace( array_keys( $fixtags ), array_values( $fixtags ), $text );

This is doing a very aggressive replacement of spaces with &#160; throughout the entire content of the page, and it's apparently already caused at least one other bug. I'm not sure why this is doing this, but our CSS handling is totally innocent and is just a red herring.
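
The effect of these three rules can be reproduced outside MediaWiki. The following is a rough Python port of the $fixtags patterns quoted above, offered as a sketch rather than the parser's actual code path: the function name apply_fixtags is made up for this example, the octal byte escapes for the guillemets are written as the literal characters » and «, and the rules are applied in order, as preg_replace does when given arrays.

```python
import re

# Python port of the three $fixtags rules quoted from Parser.php.
# apply_fixtags is a hypothetical name used only for this illustration.
FIXTAGS = [
    # French spaces before ? : ; ! % and », only if something precedes the space
    (re.compile(r"(.) (?=\?|:|;|!|%|»)"), r"\1&#160;"),
    # French space after «
    (re.compile(r"(«) "), r"\1&#160;"),
    # Undo the replacement in front of the CSS keyword !important (bug #11874)
    (re.compile(r"&#160;(!\s*important)"), r" \1"),
]


def apply_fixtags(text: str) -> str:
    """Apply the rules one after another over the whole text, tags included."""
    for pattern, replacement in FIXTAGS:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(apply_fixtags('<br style=" clear : both ; " />'))
    print(apply_fixtags('<span style="color: red !important">'))
```

Run on format (4) from the description, the port turns the spaces before ":" and ";" inside the style attribute into &#160;, while the !important case comes out unchanged because the third rule reverts it.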
