Last modified: 2011-04-14 15:13:16 UTC

Bug 12752 - space before/after »/« »guillemets« converted to &nbsp;
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All All
Importance: Low trivial with 2 votes
Target Milestone: ---
Assigned To: Nobody
Depends on:
Reported: 2008-01-23 04:10 UTC by x00000000
Modified: 2011-04-14 15:13 UTC
CC List: 0 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description x00000000 2008-01-23 04:10:52 UTC
A space before "»" (» - right-pointing double angle quotation mark) or a space after "«" (« - left-pointing double angle quotation mark) will be converted to a no-break space (&nbsp;).

This may be appropriate for most French text, but it breaks line wrapping in languages where guillemets are used in the opposite order (»quote« instead of «quote» or « quote »). Compare .
Comment 1 Mormegil 2008-11-12 22:22:22 UTC
Agreed, e.g. the use of guillemets on the Czech Wikisource is quite problematical because of this. This should be applied only if the content language is French. Or, more generally – we should probably have per-language rules. See bug #13619.

See also bug #3158.
Comment 2 x00000000 2008-11-28 18:56:09 UTC
Workaround is to write something like text&#32;»quote«&#32;text.
MediaWiki doesn't recognize &#32; as a space at the point where it replaces them with &nbsp;s.
Comment 3 Brion Vibber 2008-11-28 19:05:53 UTC
Sounds like checking for word breaks should do the job reasonably well here.


...quoted » outside
\s»\W -> break

outside »quoted...
\s»\w -> no break

As long as nobody uses this form:
outside » quoted...

in which case it would be much more difficult to distinguish which side the non-break space belongs on, requiring heuristics to try to see where the quote was started.
Comment 4 x00000000 2008-11-28 20:28:18 UTC
Would be better than now, assuming "break" means "nbsp" (i.e. "no break").

But it won't work for cases like "the sign »,« is a comma", citations starting/ending with an ellipsis or other punctuation (like »... text ...« or »[…] text!«) or Spanish-style »¿uh?« (but guillemets aren't common in Spanish).
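
A minimal Python sketch of the word-break heuristic from comment 3, reading "break" as "insert a no-break space" as assumed above. The function name is hypothetical and the sample strings are illustrative only; this is not MediaWiki code. It shows the intended behaviour and the »,« counterexample:

```python
import re

# " »" followed by a non-word character is taken to be a closing
# (French-style) guillemet and gets a no-break space. Followed by a word
# character it is taken to be an opening guillemet and is left alone.
CLOSING_GUILLEMET = re.compile(r" »(?=\W)")


def apply_word_break_heuristic(text: str) -> str:
    """Hypothetical helper: apply the \\s»\\W rule to a Unicode string."""
    return CLOSING_GUILLEMET.sub("&nbsp;»", text)


if __name__ == "__main__":
    for sample in (
        "il a « cité » dehors",      # French order: space before » is protected
        "outside »quoted« text",     # reversed order: left untouched
        "the sign »,« is a comma",   # counterexample: wrongly protected
    ):
        print(apply_word_break_heuristic(sample))
```

The third sample is where the heuristic guesses wrong: the » opens a quote, but the character after it is punctuation.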

And it doesn't work for most languages if the replacement operates on bytes instead of chars, like the code snippet in bug 13619 comment 3 suggests. The \w needs to match the appropriate Unicode classes.
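
To illustrate the byte-versus-character point, here is a small Python sketch. Python's bytes patterns stand in for a byte-oriented replacement (an assumption; the actual MediaWiki code is PHP), and both helper names are hypothetical. In bytes mode \w only matches ASCII word characters, so a quote starting with a non-ASCII letter is not seen as starting with a word:

```python
import re

# Character-based pattern: \w matches any Unicode letter or digit.
OPENING_STR = re.compile(r" »(?=\w)")
# Byte-based pattern: » is the UTF-8 sequence C2 BB, and \w is ASCII-only.
OPENING_BYTES = re.compile(rb" \xc2\xbb(?=\w)")


def looks_like_opening_chars(text: str) -> bool:
    """Hypothetical helper: test the \\s»\\w rule on characters."""
    return OPENING_STR.search(text) is not None


def looks_like_opening_bytes(text: str) -> bool:
    """Hypothetical helper: test the same rule on raw UTF-8 bytes."""
    return OPENING_BYTES.search(text.encode("utf-8")) is not None


if __name__ == "__main__":
    for sample in ("text »quoted«", "text »český«"):
        print(sample, looks_like_opening_chars(sample), looks_like_opening_bytes(sample))
```

The two functions agree on the ASCII sample and disagree on the Czech one, because the first byte of "č" is not an ASCII word character.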

BTW, I don't think these simple &nbsp; heuristics are useful at all. E.g., they cause code like <code>x = flag ? 0 : 1;</code> to be unusable after copying and break valid CSS like <span style="color : red ; background : yellow"/>.
Comment 5 x00000000 2008-11-28 23:24:34 UTC
This should fix most occurrences in French without breaking much elsewhere:

  s/((?:[\s(]|^)«) /$1&nbsp;/
  s/ »(?=\.?\)|[.,]?(?:\s|<ref[\s>]|$))/&nbsp;»/

Should also work with raw UTF-8 bytes if « and » are written as \302\253 and \302\273.
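
For anyone without a MediaWiki installation, the two substitutions can be tried out in Python. This is only a transliteration sketch: the function name is hypothetical, the patterns run on Unicode strings rather than raw bytes, and re.sub replaces every match (as if the substitutions carried a /g flag), which I assume is what is wanted:

```python
import re

# s/((?:[\s(]|^)«) /$1&nbsp;/
AFTER_OPENING = re.compile(r"((?:[\s(]|^)«) ")
# s/ »(?=\.?\)|[.,]?(?:\s|<ref[\s>]|$))/&nbsp;»/
BEFORE_CLOSING = re.compile(r" »(?=\.?\)|[.,]?(?:\s|<ref[\s>]|$))")


def protect_french_guillemets(text: str) -> str:
    """Hypothetical helper: apply both substitutions to the whole string."""
    text = AFTER_OPENING.sub(r"\1&nbsp;", text)
    return BEFORE_CLOSING.sub("&nbsp;»", text)


if __name__ == "__main__":
    print(protect_french_guillemets("Il dit « bonjour » à tous."))
    print(protect_french_guillemets("Er sagte »Hallo« und ging."))
```

The French sample gets both no-break spaces; the German sample with reversed guillemets is left unchanged.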

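BTW, the current code seems to have a bug. The replacement refers to \\2, but the lookahead (?=...) does not capture anything, so there is no second group:
  '/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;\\2'
should be either
  '/(.) (\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;\\2'
or
  '/(.) (?=\\?|:|;|!|%|\\302\\273)/' => '\\1&nbsp;'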
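
A Python sketch of the two corrected forms, assuming Unicode strings with a literal » in place of the \302\273 byte sequence; the function names are hypothetical. The forms agree on ordinary input but differ when two of the punctuation marks follow each other, because the capturing form consumes the mark while the lookahead form does not:

```python
import re

# Form 1: capture the punctuation and put it back via \2.
CAPTURING = re.compile(r"(.) (\?|:|;|!|%|»)")
# Form 2: only look ahead at the punctuation; nothing to put back.
LOOKAHEAD = re.compile(r"(.) (?=\?|:|;|!|%|»)")


def with_capture(text: str) -> str:
    return CAPTURING.sub(r"\1&nbsp;\2", text)


def with_lookahead(text: str) -> str:
    return LOOKAHEAD.sub(r"\1&nbsp;", text)


if __name__ == "__main__":
    css = "color : red ; background : yellow"
    print(with_capture(css))
    print(with_lookahead(css))
    # Adjacent marks: the capturing form skips the second space.
    print(with_capture("x : ;"))
    print(with_lookahead("x : ;"))
```

The CSS sample is the one from comment 4; either form makes it invalid once the entity is inserted.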
Comment 6 x00000000 2008-11-29 14:26:29 UTC
I missed the common cases ''« text »'' vs »''text''«, and <ref/>s seem to be already expanded at that stage (by looking at the code; I have no MediaWiki installation to test):

  s/((?:[\s(]|<[a-zA-Z]+>|^)«) /$1&nbsp;/
  s/ »(?=\.?\)|[.,]?(?:\s|<(?:\/|sup[\s>])|$))/&nbsp;»/

This also handles <blockquote>« citation »</blockquote> and similar. A line break isn't likely to occur at the beginning of a block element, but the no-break space still makes a difference with text-align:justify in Unicode-compliant browsers. It doesn't handle start tags with attributes like <span style="...">« text »</span> because that would be very expensive if done properly.
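
The refined pair can be checked the same way in Python. As before this is an untested-against-MediaWiki transliteration: the function name is hypothetical, the patterns run on Unicode strings, every match is replaced, and the input is assumed to be already-expanded HTML (so '' has become <i>):

```python
import re

# s/((?:[\s(]|<[a-zA-Z]+>|^)«) /$1&nbsp;/
AFTER_OPENING = re.compile(r"((?:[\s(]|<[a-zA-Z]+>|^)«) ")
# s/ »(?=\.?\)|[.,]?(?:\s|<(?:\/|sup[\s>])|$))/&nbsp;»/
BEFORE_CLOSING = re.compile(r" »(?=\.?\)|[.,]?(?:\s|<(?:/|sup[\s>])|$))")


def protect_guillemets_near_tags(text: str) -> str:
    """Hypothetical helper: apply both refined substitutions."""
    text = AFTER_OPENING.sub(r"\1&nbsp;", text)
    return BEFORE_CLOSING.sub("&nbsp;»", text)


if __name__ == "__main__":
    for sample in (
        "<i>« texte »</i>",
        "Er schrieb »<i>Text</i>« dazu.",
        "<blockquote>« citation »</blockquote>",
        '<span style="color:red">« text »</span>',
    ):
        print(protect_guillemets_near_tags(sample))
```

The last sample shows the stated limitation: after a start tag with attributes only the space before » is protected.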

The better solution would be a configuration switch to apply these substitutions only for languages where they make sense. The only one of the current substitutions that makes some sense in most languages is s/ %/&nbsp;%/ (but it still destroys <code>x = y % z</code>).
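
A rough Python sketch of what such a switch could look like. Everything here is hypothetical (the rule tables, the function name, the language codes as keys); it is not an existing MediaWiki setting, only an illustration of applying the guillemet rules for French alone while keeping the % rule everywhere:

```python
import re

# Rules applied regardless of content language (assumption: only the % rule).
COMMON_RULES = [
    (re.compile(r" %"), "&nbsp;%"),
]

# Rules applied only for specific content languages (hypothetical table).
LANGUAGE_RULES = {
    "fr": [
        (re.compile(r"((?:[\s(]|^)«) "), r"\1&nbsp;"),
        (re.compile(r" »(?=\.?\)|[.,]?(?:\s|<ref[\s>]|$))"), "&nbsp;»"),
    ],
}


def apply_spacing_rules(text: str, lang: str) -> str:
    """Apply the common rules, then any rules registered for `lang`."""
    for pattern, replacement in COMMON_RULES + LANGUAGE_RULES.get(lang, []):
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(apply_spacing_rules("Il dit « oui » à 50 %.", "fr"))
    print(apply_spacing_rules("Řekl »ano« a 50 %.", "cs"))
```

With a table like this, a wiki in Czech or German would keep ordinary spaces around guillemets, and the <code>x = y % z</code> problem would remain for the common rule.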
