Last modified: 2010-05-15 16:03:21 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T18697, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 16697 - Unicode combining characters are difficult to edit in some browsers
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: 1.13.x
Hardware: All, OS: All
Importance: Normal major with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2008-12-18 12:13 UTC by Gerard Meijssen
Modified: 2010-05-15 16:03 UTC
CC List: 5 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Character misrepresented in edit mode on Meta (142.33 KB, image/jpeg)
2008-12-18 12:32 UTC, Gerard Meijssen
Character properly represented once saved on Meta (132.89 KB, image/jpeg)
2008-12-18 12:33 UTC, Gerard Meijssen
Lingala characters in edit mode using Chrome (122.72 KB, image/jpeg)
2008-12-18 23:47 UTC, Gerard Meijssen

Description Gerard Meijssen 2008-12-18 12:13:40 UTC
People on the Lingala Wikipedia complain about a lack of support for characters such as ɔ́, which should display as a single character but do not. They also do not display properly in Bugzilla (certainly not in edit mode). 

As a consequence, they will not collaborate on Betawiki. I posted about this on my blog and on the Afrophone mailing list, and received this additional comment from Renaud Gaudin:

"Also, I've been reporting for ages that the interface should use the
same font as text so that "buttons" are rendered properly.

On a French Windows with Internet Explorer (which is what's used in a
large part of west Africa), the "Edit" button (but also a significant
part of Interface texts) displays unknown (square) characters...

It makes it hard to convince people to first use Wikipedia, then to
contribute to it."

Thanks,
     GerardM
Comment 1 Siebrand Mazeland 2008-12-18 12:22:32 UTC
I think it would be great if we had some URLs here with examples, and a few screenshots with 'observed' and 'expected' behaviour.
Comment 2 Gerard Meijssen 2008-12-18 12:32:48 UTC
Created attachment 5590
Character misrepresented in edit mode on Meta
Comment 3 Gerard Meijssen 2008-12-18 12:33:46 UTC
Created attachment 5591
Character properly represented once saved on Meta
Comment 4 Gerard Meijssen 2008-12-18 12:41:33 UTC
Comment on attachment 5591
Character properly represented once saved on Meta

I see the same behaviour on the Lingala Wikipedia. This is in Firefox 3.0.4. In other browsers the behaviour is reportedly different. GM
Comment 5 Gerard Meijssen 2008-12-18 23:47:19 UTC
Created attachment 5597
Lingala characters in edit mode using Chrome

This is what the edit page looks like on ln.wikipedia. Chrome is clearly inferior to Firefox at supporting these special characters in edit mode. In final form (the saved page) it is OK.
Comment 6 Brion Vibber 2008-12-18 23:53:49 UTC
This "ɔ́" is a Unicode combining character sequence, consisting of a base character ("ɔ", U+0254) followed by a combining acute accent (" ́", U+0301).

The bad news is that plenty of software is a little spotty about handling such characters cleanly. In this case, that means the browsers and the fonts.

Pasting "ɔ́" into Firefox 3 on my Mac seems to work fine. If it's not functioning in other current browsers, bug reports should be filed in the appropriate locations so it can be fixed for future versions. I'm not sure there's much else to be done on our end... working with the characters relies on them actually being supported by the browsers!
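To make the base-plus-combining structure concrete, the sketch below uses Python's standard `unicodedata` module to list the code points inside "ɔ́". It is only an illustration of the two-code-point sequence described above (the helper name `describe` is made up for this example), not anything MediaWiki itself runs.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
        for ch in text
    ]


# "ɔ́" written with escapes so the base + mark sequence is unambiguous in source.
open_o_acute = "\u0254\u0301"

for cp, name, ccc in describe(open_o_acute):
    print(cp, name, ccc)
# U+0254 LATIN SMALL LETTER OPEN O 0
# U+0301 COMBINING ACUTE ACCENT 230
```

One visible letter is two code points; a non-zero combining class marks the second one as something that attaches to the preceding base, which is exactly what the browser and font must get right when drawing it.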
Comment 7 Gerard Meijssen 2008-12-19 00:10:00 UTC
(In reply to comment #6)

Thanks Brion. Both Firefox and Chrome show "Mbɔ́tɛ!" properly in final form on the wiki (http://meta.wikimedia.org/wiki/User:GerardM/Lingala); IE does not. In edit mode, Firefox shows the diacritic separately, while Chrome does not know how to handle it. 

So there is a difference between final form and edit mode. Is there a difference in the font support that MediaWiki specifies for the two?
Thanks,
     GerardM
Comment 8 Brion Vibber 2008-12-19 00:24:42 UTC
This is entirely dependent on the text rendering and font support of the browser and the operating system it's running on.

Some quick tests on my boxes:

Safari 3 / Mac 10.5 -- edit good, page good
Safari 3 / Win XP -- edit good, page good

Firefox 2 / Ubuntu 7.10 -- edit good, page good
Firefox 3 / Mac 10.5 -- edit good, page good
Firefox 3 / Win XP -- edit and page both show base and combining character OK, but incorrectly spaced (not composited into a single visible glyph)

Chrome / Win XP -- edit and page both shown with correct composition but a big box instead of the base character

IE 7 / Win XP -- edit and page both show good base character followed by a totally unrelated character (looks like Hebrew or something, not the expected acute accent at all!)
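Several of these results come down to whether the base letter and its mark end up drawn as one visible character. As a rough illustration, the sketch below groups "Mbɔ́tɛ!" into base-plus-marks clusters. It is a simplified approximation (the function name `naive_clusters` is hypothetical): it only attaches code points whose general category is a mark (Mn, Mc, Me) to the preceding character, whereas real grapheme segmentation (Unicode UAX #29) has many more rules.

```python
import unicodedata


def naive_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Approximation only: any code point in a mark category (Mn, Mc, Me) is
    appended to the previous cluster; everything else starts a new cluster.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


word = "Mb\u0254\u0301t\u025b!"  # "Mbɔ́tɛ!" with the combining acute spelled out
print(len(word))                  # 7 (code points)
print(len(naive_clusters(word)))  # 6 (user-perceived characters)
```

The gap between the two counts is the combining acute: a renderer that treats it as a separate spacing character, as in the Firefox 3 / Win XP result, shows six letters plus a stray accent.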
Comment 9 Andrew Cunningham 2008-12-19 08:16:00 UTC
Assuming:
* Windows XP SP2
* complex script support is enabled
* OpenType fonts with appropriate mark positioning support are installed
* DejaVu Sans isn't installed

There will still be a problem.

The font fallback in the CSS rules is inappropriate and could be considered broken in this instance. The CSS rules should reflect language-specific styling needs.

It isn't a browser issue.

It is:

1) an end-user issue: fonts and font rendering support are needed; and
2) a Wikipedia issue: assuming the end user has things set up correctly, the CSS rules are inadequate.

Andrew
Comment 10 Brion Vibber 2008-12-19 20:31:09 UTC
In what way are they inadequate, specifically, and what would you recommend as a change?

(Consider downloadable web fonts to be a potential option, though that brings difficulties with it.)
Comment 11 Andrew Cunningham 2008-12-20 06:04:01 UTC
For ln.wikipedia.org, the current CSS rules controlling font display are:

#content, #bodyContent {
font-family:'DejaVu Sans','Segoe UI','Lucida Sans Unicode','Lucida Grande',Tahoma,'Arial Unicode MS','Lucida Sans',Verdana,sans-serif;
}

Taking each font in turn:

DejaVu Sans - OK for Lingala
Segoe UI - should be OK, but it is a Vista font; I'd need to test
Lucida Sans Unicode - cannot correctly render all Lingala characters; no mark or mkmk OpenType features
Lucida Grande - Mac OS font; I don't know whether it supports Lingala, would need to test
Tahoma - version 3.0.6 (on WinXP) does not support Lingala; version 5.0 may support Lingala, would need to test
Arial Unicode MS - cannot correctly render all Lingala characters; no mark or mkmk OpenType features
Lucida Sans - support for Lingala unknown
Verdana - version 3.0.6 (on WinXP) does not support Lingala; version 5.0 may support Lingala, would need to test

General rule of thumb for CSS font-family fallback: choose the most appropriate non-core fonts first, then fall back to core OS fonts.

So a rule like 

#content, #bodyContent {
font-family:'DejaVu Sans','Segoe UI','Lucida Grande',Tahoma,Verdana,sans-serif;
}

would be better

Best of all, though, would be to add other downloadable fonts suitable for African languages:

#content, #bodyContent {
font-family:'DejaVu Sans','Charis SIL','Gentium Book Basic','Liberation Sans','Doulos SIL','African Sans serif','African Sans','Segoe UI','Lucida Grande',Tahoma,Verdana,sans-serif;
}

Depending on whether version 5.0 of Tahoma and Verdana supports Lingala, I'd be tempted to strip these from the CSS rules. Keeping them may or may not help Vista users, but could cause problems for users on older versions of Windows and for Mac users who have an older version of MS Office installed. 

I would need to test Segoe UI and Lucida Grande when I'm back in the office on Monday.

Also, I dislike using monospaced fonts for textareas, and I find it useful to explicitly state font rules for the textarea element, so

#content, #bodyContent, textarea {
font-family:'DejaVu Sans','Charis SIL','Gentium Book Basic','Liberation Sans','Doulos SIL','African Sans serif','African Sans','Segoe UI','Lucida Grande',Tahoma,Verdana,sans-serif;
}

might work better.
Comment 12 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-12-20 23:40:51 UTC
(In reply to comment #11)
> For ln.wikipedia.org current css rules controlling font display would be 
> 
> #content, #bodyContent {
> font-family:'DejaVu Sans','Segoe UI','Lucida Sans Unicode','Lucida
> Grande',Tahoma,'Arial Unicode MS','Lucida Sans',Verdana,sans-serif;
> }

If those are wrong, bring it up at [[ln:MediaWiki talk:Monobook.css]], not here.  We (developers/sysadmins) don't have control over what CSS rules sysops choose to add for their own wikis.  The MediaWiki default is not to specify fonts at all for any language.
Comment 13 Gerard Meijssen 2008-12-20 23:54:08 UTC
A solution should not be confined to ln.wikipedia. It should also work on Commons or Meta. If CSS rules are supposed to be language-specific, then which CSS is applied should depend on the language selected in the user preferences. 

MediaWiki is software that should work for any language. It only needs to specify fonts that work.
Thanks,
    GerardM
Comment 14 Tisza Gergő 2008-12-21 15:26:45 UTC
The right place for such a rule would probably be the default Lingala style sheet ([[betawiki:MediaWiki:Common.css/ln]]).

But this is partially a browser problem: browsers (except IE, of course) override the font set in the style sheet for characters which cannot be displayed in that font, and should do the same for combined characters. (At least in normal text; it's less obvious what would be the right thing to do in an edit box where displaying multiple characters as one has its own usability problems.)

A short-term solution might be a bot replacing the combination with a single character, for which there is better browser support.
Comment 15 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-12-21 15:41:34 UTC
(In reply to comment #14)
> The right place for such a rule were probaly the default Lingala style sheet
> ([[betawiki:MediaWiki:Common.css/ln]]).

Languages should *not* use *.css for default stylesheets (or *.js for default JS).  They are meant *only* for user customizations.  Any language-specific code put there will not be maintainable: changes made in MediaWiki will not stack with user customizations, and so will not take effect on upgrade.  If default fonts are necessary for some languages, these should be added through some separate, specially-designed mechanism.  The only language that uses its own CSS file right now is German, for bug 1553, and that's really not ideal (although it's a single rule that's not likely to change or become obsolete, much better than a list of fonts).

> A short-term solution might be a bot replacing the combination with a single
> character, for which there is better browser support.

This could be done by the software, if the character combinations are supposed to be canonically identical.  Actually, I thought Unicode normalization was supposed to do that anyway, but maybe I'm wrong on that.
Comment 16 Niklas Laxström 2008-12-21 15:43:36 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > A short-term solution might be a bot replacing the combination with a single
> > character, for which there is better browser support.
> 
> This could be done by the software, if the character combinations are supposed
> to be canonically identical.  Actually, I thought Unicode normalization was
> supposed to do that anyway, but maybe I'm wrong on that.
> 

As far as I know, there is no single precomposed character for ɔ́.
Comment 17 Andrew Cunningham 2008-12-21 23:49:42 UTC
(In reply to comment #12)
> The MediaWiki default is not to specify
> fonts at all for any language.
> 

That is actually a good approach; stylesheets should be as language-neutral as possible.

But there are two scenarios for content:

1) all content in a single page is monolingual - in which case all is fine
2) content is predominantly in one language, but contains words, phrases or quotes from other languages. This is a more problematic scenario, since different fonts may be required to display different languages. For most major languages with full OS support, the current approach works fine: OS font linking/switching and browser-based approaches handle it. But for lesser-used languages with no official OS support, things become more problematic, since different fonts may need to be specified for that language, distinct from the text of the surrounding page. 

The easiest and simplest approach is to use language tagging in the markup; users can then tie their own CSS rules to the language markup.

Essentially, the nature of Wikipedia content means that the first fallback is the OS and web browser fallback mechanisms, and the second fallback is end-user CSS overrides.

For monolingual content in a language-specific wiki, it's possible to put some sensible CSS rules in language-specific customisations to monobook.css.

For content that includes words and phrases in other languages, the most sensible approach is language markup; this allows CSS rules to be created in a wiki-specific monobook.css or in user-specific CSS. 
Comment 18 Andrew Cunningham 2008-12-22 00:01:54 UTC
(In reply to comment #13)
> A solution should not be confined to the ln.wikipedia. It should also work on
> Commons or Meta. When CSS rules are supposed to be language specific, then the
> support of the CSS should be based on what language is selected in the user
> preferences. 
> 

But a user may have their preferences set to one language and may also work in other languages. So that approach will work in many cases but not in all.

> MediaWiki is software that should work for any language. It does only need to
> specify fonts that work.

It does work for any language. But for languages not officially supported by major OS vendors, things have always been more problematic; the advent of Unicode doesn't change that.

There are limitations to web browsers, operating system support, and even HTML and CSS specifications.

In an ideal world there would be comprehensive Latin script OpenType fonts available by default within an OS. But even if there were, web browsers and CSS provide no way to control which OpenType features are used for specific HTML documents. So it's impossible to use a single Latin script font for all Latin script languages, even if it has language-specific features and alternative glyphs for various languages. Browsers and CSS provide no way to access or control these features.

The best approach I've found for working with multiple scripts and languages (on some projects I've worked with up to 100 languages) is to keep the main CSS rules language-neutral, tag the primary language of a document, provide mechanisms for authors to mark up changes of language, allow language-specific styling independent of the main styling for the theme/skin, and allow users to override or control aspects of the language-specific styling.

Comment 19 Andrew Cunningham 2008-12-22 00:07:31 UTC
(In reply to comment #15)
> (In reply to comment #14)

> 
> This could be done by the software, if the character combinations are supposed
> to be canonically identical.  Actually, I thought Unicode normalization was
> supposed to do that anyway, but maybe I'm wrong on that.
> 

But there are many base character + combining character combinations in the Latin and Cyrillic scripts that do not, and never will, have precomposed forms.

The character sequence open O + combining acute <U+0254 U+0301> is both the NFC and the NFD form.
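This can be checked with Python's standard `unicodedata.normalize`. The sketch (helper name `forms` is hypothetical) contrasts "e" + combining acute, which NFC composes into the precomposed U+00E9, with open o + combining acute, which has no precomposed form and so comes out of both NFC and NFD unchanged. That is why a normalizing bot, as suggested in comment 14, could not collapse ɔ́ into a single code point.

```python
import unicodedata


def forms(text: str) -> dict[str, list[str]]:
    """Show the code points of text after NFC and NFD normalization."""
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in ("NFC", "NFD")
    }


e_acute = "e\u0301"            # e + COMBINING ACUTE ACCENT
open_o_acute = "\u0254\u0301"  # open o + COMBINING ACUTE ACCENT

print(forms(e_acute))
# {'NFC': ['U+00E9'], 'NFD': ['U+0065', 'U+0301']}
print(forms(open_o_acute))
# {'NFC': ['U+0254', 'U+0301'], 'NFD': ['U+0254', 'U+0301']}
```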
Comment 20 Gerard Meijssen 2008-12-25 12:23:14 UTC
MediaWiki's experience with the change for Bug 1941 showed that a change FROM monospace removed the ability for Safari to work properly. It is likely that the change TO monospace for the edit screen in Firefox and IE will make these browsers work as well.
Thanks,
    GerardM
Comment 21 Andrew Cunningham 2008-12-27 12:38:27 UTC
(In reply to comment #20)
> MediaWiki experience with a change for Bug 1941 showed that a change FROM
> monospace removed the ability for Safari to work properly. It is likely that
> the change TO monospace for the edit screen for FireFox and IE will make these
> browsers work as well for these browsers.
> Thanks,


Although, Gerard, the languages you are interested in are unlikely to be well supported by monospaced fonts. Catch-22.
Comment 22 Gerard Meijssen 2008-12-27 13:41:31 UTC
Betawiki has a gadget that allows you to cycle from monospace, to sans, to serif. This shows that monospace breaks the usability of some of the languages we support. The only logical conclusion is to change from monospace to another style of font. No catch-22 for me. If something is not usable, we use something else.

This WILL work for Firefox, Opera and Safari. Internet Explorer and Chrome are both currently broken: IE shows the wrong character, and Chrome shows no character.
Comment 23 Andrew Cunningham 2008-12-29 12:58:28 UTC
(In reply to comment #22)
> Betawiki has a gadget that allows you to cycle from monospace, to sans, to
> serif. This shows that monospace breaks the usability of some of the languages
> we support. The only logical conclusion is to change from monospace to an other
> style of fonts. No catch-22 for me. If something is not usable, we use
> something else.
> 
> This WILL work for Firefox, Opera and Safari. Internet Explorer and Chrome are
> both currently broken; IE shows the wrong character Chrome shows no character.
> 

For lesser-used languages, this approach assumes:

1) End users have installed all language support available within the OS. The only OS I know of that currently installs all available language support by default is Windows Vista (maybe Mac OS too; I don't know enough about Mac OS to say). Windows XP and older versions of Windows, as well as most, if not all, Linux distros, install only a minimal set of language support. Full language support has to be selected during installation or installed afterwards. 

2) End users have downloaded and installed appropriate fonts to cover all the languages covered by Betawiki, since lesser-used languages may not be supported by the OS.

3) End users have changed the browser's default generic fonts for each writing script to appropriate fonts, which generic font families depend on.

4) Appropriate (monospace, serif, sans-serif) fonts are available for the language in question.
Comment 24 Gerard Meijssen 2008-12-29 13:45:39 UTC
That is all well and good. I do have all the appropriate fonts installed. It is the monospaced font in the edit window that fails me in most browsers. When problems are eliminated, there is at least a fighting chance of getting it right. 
Thanks, GerardM
Comment 25 Andrew Cunningham 2008-12-29 22:38:17 UTC
(In reply to comment #23)

> 
> 4) That appropriate (monospaces, serif, sans-serif) fonts are available for the
> language in question
> 

A clarification on my comment, with respect to monospaced fonts. It is important to note that there are few monospaced fonts (if any) that support various lesser used languages. And for many writing scripts monospaced fonts are inappropriate.

Looking at the Windows environment, for instance, you'll find that for most scripts there isn't a monospaced font available. 

For various dubious reasons certain browsers default to monospaced fonts for displaying text in certain html elements. This is poor internationalisation. It doesn't scale in a truly multilingual environment.

Using generic font families can be a problem when you are developing or maintaining a truly multilingual environment. They are useful in themes and skins, so that the themes or skins can be made language neutral, but the themes or skins then need to be overlaid with language specific styling to ensure that all text has a chance of displaying.
Comment 26 Niklas Laxström 2009-06-19 11:49:31 UTC
To have something concrete, I propose adding a CSS override for languages with no good monospace font(s). The CSS would use a serif or sans-serif font style for textareas. There should also be a way for users to switch back to monospace.
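The proposal amounts to a per-language default for the edit box plus a user override. A minimal sketch of that decision logic follows; the function name, the language set and the choice of sans-serif over serif are all hypothetical, and this is not the code committed in r53874.

```python
from typing import Optional

# Hypothetical set of language codes whose edit box should avoid monospace.
# "ln" (Lingala) is listed only because it is the case discussed in this bug.
NO_GOOD_MONOSPACE = {"ln"}

ALLOWED_STYLES = {"monospace", "serif", "sans-serif"}


def edit_font_family(lang: str, user_choice: Optional[str] = None) -> str:
    """Pick a CSS generic font family for the edit textarea.

    A valid user_choice always wins, so users can switch back to monospace;
    otherwise languages without a good monospace font get sans-serif.
    """
    if user_choice in ALLOWED_STYLES:
        return user_choice
    return "sans-serif" if lang in NO_GOOD_MONOSPACE else "monospace"


print(edit_font_family("ln"))               # sans-serif
print(edit_font_family("ln", "monospace"))  # monospace
```

Keeping the user preference as the first check is what satisfies the "switch back to monospaced" requirement without touching the per-language default.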
Comment 27 Niklas Laxström 2009-07-28 15:38:48 UTC
Implemented the solution I proposed above in r53874.
