Last modified: 2009-08-25 14:34:34 UTC
Hi, I am a user from the Chinese Wikipedia. I found that the element "<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="zh" lang="zh" dir="ltr">" appears at the top of every page. As you know, Chinese has two scripts, Traditional and Simplified, and their default fonts also differ. Because all pages are tagged "zh", and by default "zh" is treated as Simplified Chinese (zh-cn), users of Traditional Chinese (zh-tw and zh-hk) cannot display pages in their default font. Put simply: can the attributes "xml:lang="zh" lang="zh"" change according to the user's system default charset? For example, they would become "xml:lang="zh-hk" lang="zh-hk"" if the user is from Hong Kong or the system charset is zh-hk.
Created attachment 1650: Text rendered in zh Wikipedia in IE. This shows text rendered with different language tags, using IE for Windows.
Created attachment 1651: Text rendered in zh Wikipedia in Firefox. This shows text rendered with different language tags, using Firefox for Windows.
This problem was posted on wikitech-l in March 2006 (http://mail.wikipedia.org/pipermail/wikitech-l/2006-March/034397.html), but no one seems to have replied; it was suggested that the issue be resolved here.
We could probably have it change the code based on the selected variant conversion. Is this what you mean?
I think it can be done with a similar technique that varies the value xxx in [xml:lang="xxx" lang="xxx"] instead of using $wgContLangCode directly. This can be done by adding a piece of code to OutputPage.php, or by including another file that determines the displayed language code. I suggest that the displayed language code be based on both $wgContLangCode and a series of checks, in these steps:
1. For logged-in users, detect the interface language and return a value according to the table below;
2.1. For anonymous users, first check the HTTP_ACCEPT_LANGUAGE value and return a value according to the table below;
2.2. If step 2.1 fails, just return the $wgContLangCode value.
Note: the value returned by the function varies with _both_ $wgContLangCode and the user's interface language, for example:
*If (($wgContLangCode == en) && (user interface language == zh-tw)) => return en ($wgContLangCode)
*If (($wgContLangCode == zh) && (user interface language == zh-tw)) => return zh-tw
The table (or array) below lists the values to be returned based on _both_ $wgContLangCode and the interface language check (currently listed for zh only):
* $wgContLangCode == zh
  * if interface language == zh, return $wgContLangCode
  * if interface language == zh-cn, return zh-cn
  * if interface language == zh-tw, return zh-tw
  * if interface language == zh-hk, return zh-tw (for browser compatibility)
  * if interface language == zh-mo, return zh-tw (for browser compatibility)
  * if interface language == zh-sg, return zh-cn
* but when $wgContLangCode != zh
  * if interface language == en, return $wgContLangCode
  * if interface language == de, return $wgContLangCode
  * if interface language == fr, return $wgContLangCode
  * if interface language == ja, return $wgContLangCode
  * if interface language == ko, return $wgContLangCode
In other words, wikis whose content language is not zh are not affected by these two checks.
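A minimal Python sketch of the lookup table proposed above, assuming the function receives the content language code and the user's interface language as plain strings. The names display_lang_code and ZH_INTERFACE_MAP are hypothetical and are not taken from MediaWiki or the attached patch; the fallback for interface languages not listed in the zh table (for example en on the zh wiki) is an assumption, since the proposal does not specify it.

```python
# Hypothetical sketch of the zh lookup table proposed above; not MediaWiki code.
ZH_INTERFACE_MAP = {
    "zh-cn": "zh-cn",
    "zh-tw": "zh-tw",
    "zh-hk": "zh-tw",  # mapped to zh-tw for browser compatibility
    "zh-mo": "zh-tw",  # mapped to zh-tw for browser compatibility
    "zh-sg": "zh-cn",
}


def display_lang_code(content_lang: str, interface_lang: str) -> str:
    """Return the code to use for lang/xml:lang on the <html> element."""
    if content_lang != "zh":
        # Non-zh wikis keep their content language code unchanged.
        return content_lang
    # "zh" and, by assumption, any interface language not in the table
    # fall back to the content language code.
    return ZH_INTERFACE_MAP.get(interface_lang, content_lang)
```

For example, display_lang_code("zh", "zh-hk") gives "zh-tw", while display_lang_code("en", "zh-tw") gives "en", matching the two example rules in the note.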
*If (($wgContLangCode == en) && (user interface language == zh-tw)) => return en
This rule is intended _not_ to affect the displayed language code on other sites such as the en, de, fr, ja, ko, ... wikis. The same trick also applies when the $wgContLangCode value is not supported by the browser:
*If (($wgContLangCode == zh-min-nan) && (user interface language == zh-min-nan)) => return en (for compatibility, since browsers including IE6/7 and Firefox do not support the zh-min-nan tag).
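The two rules above for non-zh wikis can be sketched as follows. This is an illustration only: tag_for_other_wiki and BROWSER_UNSUPPORTED are hypothetical names, and the sketch assumes the "en" fallback applies to an unsupported content language whatever the interface language is, which the comment only states for the zh-min-nan interface case.

```python
# Hypothetical set of content language codes that, per the comment above,
# browsers such as IE6/7 and Firefox did not support as language tags.
BROWSER_UNSUPPORTED = {"zh-min-nan"}


def tag_for_other_wiki(content_lang: str, interface_lang: str) -> str:
    """Language tag for a wiki whose content language is not zh."""
    if content_lang in BROWSER_UNSUPPORTED:
        # The proposal returns "en" here for browser compatibility.
        return "en"
    # interface_lang is deliberately ignored so that other wikis are unaffected.
    return content_lang
```

Keeping interface_lang in the signature, even though it is unused, mirrors the proposal's framing that both values are checked but only the content language matters outside zh.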
The term [interface language] above means the language currently used by the logged-in user (i.e. the [user language]).
(In reply to comment #4)
> We could probably have it change the code based on the selected
> variant conversion. is this what you mean?
Nope, the interface language and the variant conversion are different things, and this issue is not related to variant conversion. It can be resolved based on the user interface language.
Created attachment 1688: Draft showing how to detect and change the value in the <html> tag. (Please note that some code cleanup is required before committing.) I've written a very rough draft of how to detect and change the <html> tag from various options; some cleanup is needed _before_ committing it to trunk, since the code has not been tested yet.
Created attachment 1691: Some cleanup of the prototype code
Created attachment 1695: Further cleanup of the prototype code
Created attachment 1696: A fine-tuned function prototype. This is the fine-tuned function prototype; it works when the function is called. However, it still needs to be integrated with the corresponding code in OutputPage.php.
Created attachment 1700: Patch that enables setting an assigned language code in the lang attributes. This patch corrects the language tags so that they are associated with the appropriate font; it requires a new file, "includes/LanguageTags.php", to work. :)
Created attachment 1701: The LanguageTags.php file used with this patch. This file is required for the patch to work.
Finally the patch is here. I hope it serves as a workaround for the language and font problem on various MediaWiki sites. It is not designed only for zh sites; other languages such as als, ang, ast, bat-smg, simple, sr, etc. can also use this solution to address the language and font problem.
As mentioned above, we can't use anything that relies on the Accept-Language header, as it would break our caching system. The patch cannot be accepted.
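The caching objection can be illustrated with a toy page cache keyed only by URL, a deliberate simplification of a shared front-end HTTP cache. The names render_page and PageCache are hypothetical. If the rendered HTML depends on Accept-Language but the cache key does not, whichever visitor first requests a page decides the lang attribute that every later visitor receives.

```python
# Toy model of a shared page cache keyed by URL only (hypothetical names).
def render_page(url: str, accept_language: str) -> str:
    # Output varies with a request header, as in the proposed patch.
    lang = "zh-tw" if accept_language.startswith("zh-tw") else "zh"
    return f'<html lang="{lang}">...{url}...</html>'


class PageCache:
    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def get(self, url: str, accept_language: str) -> str:
        # The key ignores Accept-Language, so the first response is reused.
        if url not in self.store:
            self.store[url] = render_page(url, accept_language)
        return self.store[url]


cache = PageCache()
first = cache.get("/wiki/Foo", "zh-tw,zh;q=0.8")   # Taiwanese visitor fills the cache
second = cache.get("/wiki/Foo", "zh-cn,zh;q=0.8")  # mainland visitor gets the zh-tw page
```

Adding the header to the cache key (or sending "Vary: Accept-Language") would avoid serving the wrong tag, but it fragments the cache, because browsers send many different Accept-Language strings for the same page.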
Created attachment 1704: A flow chart explaining how to determine the language code to be displayed. First, I think I should post a flow chart to check whether my concept is correct; then, based on the suggestions we get, I'll write code to resolve this problem. :)
Created attachment 1705: Modified patch based on the previous patch. I uploaded a patch on top of my previous one to fix the main problem that occurs in some cases (for example, using a zh-tw interface on a zh-yue site).
Created attachment 1706: An updated LanguageTags.php file. This updated file is needed for the new patch to work.
(In reply to comment #15)
> As mentioned above we can't use something that relies on the
> Accept-Language header as it would break our caching system.
> Patch cannot be accepted.
The Accept-Language header is used only when the browser supports it and has it enabled; if this method fails, the code takes $wgContLanguageCode directly. But I have no idea why this would break the caching system. Or is my patched code placed in an unsuitable location in those files? :)
The Accept-Language header check applies only to anonymous users; if it fails, the code takes $wgLanguageCode directly. For logged-in users, the interface language in the user preferences determines the language tag.
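The resolution order described in the two comments above (interface language for logged-in users, Accept-Language for anonymous users, otherwise the content language) might look like this as a minimal sketch. It is not the code in the attached LanguageTags.php: resolve_lang_code, parse_accept_language and SUPPORTED are hypothetical names, the header parsing is simplified (q-values are honoured, wildcards are not handled), and mapping the result through the zh table is omitted.

```python
from typing import Optional

# Hypothetical set of codes the function is willing to emit.
SUPPORTED = {"zh", "zh-cn", "zh-tw", "zh-hk", "zh-mo", "zh-sg"}


def parse_accept_language(header: str) -> list[str]:
    """Return the language ranges of an Accept-Language header, best first."""
    ranked = []
    for part in header.split(","):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            ranked.append((q, tag))
    # sorted() is stable, so equal q-values keep their header order.
    return [tag for _, tag in sorted(ranked, key=lambda item: -item[0])]


def resolve_lang_code(content_lang: str,
                      user_lang: Optional[str],
                      accept_language: Optional[str]) -> str:
    if user_lang is not None:      # logged-in user: use the preference
        return user_lang
    if accept_language:            # anonymous user: try the header
        for tag in parse_accept_language(accept_language):
            if tag in SUPPORTED:
                return tag
    return content_lang            # fallback: the wiki's content language
```

The anonymous branch is exactly the part the maintainers object to: its result depends on a request header that the page cache does not take into account.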
The patch seems to be trying to do something totally different from what's described in the summary, and by changing the output based on unsafe headers it would break caching. I'm marking this INVALID; please replace with a more directed issue.
I've brought this issue to the wikitech-l mailing list for further discussion until it is resolved. Direct link to the discussion on Gmane: http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/23533
As mentioned before, I've changed the summary title to suit the situation we're facing. Also, an unsuitable patch does not make the bug report invalid. Hence, I've REOPENED the bug so that this issue can be resolved. By the way, I've been conducting a survey on the local wiki (http://zh.wikipedia.org/wiki/User:Shinjiman/LanguageTags) asking users which interface language and which language variant they use. The results suggest that it is impossible to solve this issue based on the language variants. See also the mails on wikitech-l for more detailed information on this issue: http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/23542 and http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/23573
(In reply to comment #21)
> The patch seems to be trying to do something totally different from what's described in the summary, and by changing the output based on unsafe headers it would break caching. I'm marking this INVALID; please replace with a more directed issue.
Sorry, Brion, I don't understand why these headers are unsafe. Could you explain more clearly? And could you suggest a safe way to achieve our goal? Thank you.
Cache. Cache. Cache. And, cache. Shinjiman, your mail to wikitech-l makes even less sense. Please see my reply there.
As far as I have been able to tell: the lang attribute on <html> is set to the content language. Nothing in my testing indicates this isn't working 100% as intended. xml:lang attributes are unneeded in HTML5 anyway, which is what we're moving towards. Resolving INVALID.
What about the lang attribute in HTML5? I think the lang attribute is still needed in HTML5, according to http://dev.w3.org/html5/markup/common-attributes.html#common-attributes .
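For reference, here is a sketch of how the root element might be emitted with the content language. It keeps lang, which, as noted above, HTML5 still uses, and adds xmlns and xml:lang only for an XHTML serialization. The function html_start_tag and its xhtml flag are hypothetical, not MediaWiki's actual output code.

```python
from html import escape


def html_start_tag(lang: str, direction: str = "ltr", xhtml: bool = False) -> str:
    """Build the opening <html> tag carrying the page language."""
    attrs = []
    if xhtml:
        # XHTML output also carries the namespace and xml:lang.
        attrs.append('xmlns="http://www.w3.org/1999/xhtml"')
        attrs.append(f'xml:lang="{escape(lang)}"')
    attrs.append(f'lang="{escape(lang)}"')
    attrs.append(f'dir="{escape(direction)}"')
    return "<html " + " ".join(attrs) + ">"
```

With xhtml=True and lang "zh", this reproduces the element quoted in the original report; without it, the output is simply <html lang="zh" dir="ltr">.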
*** Bug 20387 has been marked as a duplicate of this bug. ***