Last modified: 2014-11-17 09:45:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T43807, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 41807 - Support for pseudo-language "mul" to indicate multilingual content
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Internationalization
Version: 1.21.x
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2012-11-06 08:08 UTC by Daniel Kinzler
Modified: 2014-11-17 09:45 UTC (History)
CC List: 16 users

See Also:

Description Daniel Kinzler 2012-11-06 08:08:10 UTC
MediaWiki should support the language code "mul" to indicate multilingual content. This could be used as the content language on multilingual wikis, to avoid using a misleading "true" language code (as meta.wikimedia.org and commons.wikimedia.org currently do: they pretend to be in English, while they are not). This could also be used to indicate the language of truly multilingual pages, like item pages on wikidata.org.

The Language object for "mul" could be derived from the Language object for English - so it would inherit English messages, English date formatting, etc. I.e. sites that start to use "mul" as their content language instead of "en" would keep working exactly as before, except that they no longer lie about their content language.
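
The derivation described above can be sketched roughly as follows. This is a minimal illustration in Python, not MediaWiki code: the `Language` class, its `parent` link and `get_message` are hypothetical stand-ins for whatever the real Language object would do.

```python
class Language:
    """Toy stand-in for a language object: a code plus a message table."""

    def __init__(self, code, messages, parent=None):
        self.code = code
        self.messages = messages
        self.parent = parent  # language to inherit from, if any

    def get_message(self, key):
        # Walk up the parent chain until some language defines the message.
        lang = self
        while lang is not None:
            if key in lang.messages:
                return lang.messages[key]
            lang = lang.parent
        raise KeyError(key)


english = Language("en", {"search": "Search", "edit": "Edit"})
# "mul" defines no messages of its own; everything is inherited from English.
multilingual = Language("mul", {}, parent=english)

print(multilingual.code)                 # mul
print(multilingual.get_message("edit"))  # Edit
```

A site switching from "en" to "mul" would then report a different code while rendering the same messages.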

Support for "mul" would also come in handy for the language fallback mechanism: "mul" could act as the default fallback for anything (instead of "en"). This is especially interesting for the fallback mechanism in Wikibase, which (unlike MediaWiki's i18n system) cannot assume that English language messages exist.
Comment 1 Daniel Kinzler 2012-11-06 08:33:00 UTC
Oh, I forgot to mention: "mul" is defined by ISO 639-2. It's in the standard, not an ad-hoc custom solution. "und" for "undetermined" is also defined and should perhaps be supported by MediaWiki.
Comment 2 Nemo 2012-11-06 08:46:52 UTC
(In reply to comment #1)
> Oh, I forgot to mention: "mul" is defined by ISO 639-2. It's in the standard,
> not an ad-hoc custom solution. "und" for "undetermined" is also defined and
> should perhaps be supported by MediaWiki.

Yes, this is already acknowledged in the bug I linked (which is different because it asks only for a [fake] interlanguage prefix); however, see bug 32189 comment 25 for some implementation problems.
Comment 3 Niklas Laxström 2012-11-06 09:09:30 UTC
Using mul is not a replacement for tagging the content correctly. If there are things in multiple languages, tag each of them with the correct language code. If the problem is only that we don't know the language right now, mul is not a correct code.

The templates and Commons and the page translation system are tagging the language correctly, and I think this covers a big portion of the non-English content in those wikis. Marking the rest with code mul would actually be a huge regression, since I suppose the majority of the remaining content is actually in English.
Comment 4 Daniel Kinzler 2012-11-06 09:21:44 UTC
(In reply to comment #3)
> Using mul is not a replacement for tagging the content correctly. 

Yes, and that's not what I'm suggesting.

Ok, here's the use case. Consider a Wikidata page with labels in 20 languages. Each label will be tagged in the HTML code with the correct language and directionality attributes. However:

* What do we put into the HTTP Content-Language header? I think "mul" would be correct there. Similarly, when supplying DC metadata about a page (as the OAI extension does), there is only one language code that can be provided, so "mul" would be the correct choice.

* More urgently, we need a code that we can use as a general fallback - Many things (especially people and places, i.e. over 50% of the content of Wikipedia and thus Wikidata) have "native" versions of their name that should act as a fallback for all other languages - and that name is indeed the correct one for *multiple* languages. I doubt that there are renderings for the town "Rackwitz" in languages other than German. Having to set this string redundantly for 300 languages makes no sense to me. So, setting "Rackwitz" as the value for the "mul" code, and falling back on this, makes sense. Using "en" for this purpose would be grossly misleading, especially in cases where the "native" form is not using Latin characters (say, Руза).

* Also, what language do we announce for the entire wiki? The API, site matrix, etc. can tell you which wiki has which content language. Using "en" for multilingual wikis is annoying. I see no reason not to get rid of that lie.
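
To make the fallback use case concrete, here is a rough sketch in Python. It is only an illustration under assumptions: `resolve_label`, the `fallbacks` table and the "de-at" chain are hypothetical names invented for this example, not Wikibase code. The label values are the ones mentioned in this discussion.

```python
def resolve_label(labels, requested, fallbacks=None):
    """Return (code, text) for the first label found along the chain.

    The chain is: the requested language, its configured fallbacks
    (hypothetical table), and finally "mul" as the last resort.
    """
    chain = [requested] + list((fallbacks or {}).get(requested, [])) + ["mul"]
    for code in chain:
        if code in labels:
            return code, labels[code]
    return None  # no label at all, not even a "mul" one


# One native form stored once under "mul" instead of 300 times.
rackwitz = {"mul": "Rackwitz", "sr": "Раквиц"}
ruza = {"mul": "Руза"}
fallbacks = {"de-at": ["de"]}  # assumed example chain

print(resolve_label(rackwitz, "sr"))            # ('sr', 'Раквиц')
print(resolve_label(rackwitz, "fr"))            # ('mul', 'Rackwitz')
print(resolve_label(ruza, "de-at", fallbacks))  # ('mul', 'Руза')
```

Returning the code together with the text lets the caller see that the label came from the "mul" fallback rather than from the requested language.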
Comment 5 Niklas Laxström 2012-11-06 09:50:53 UTC
1) Do we need to use that header at all? I've never seen it used. Could be the interface language code.

2) That seems like internal design leaking to the interface. Can't you just make it possible to designate one language to be the default in the interface? The way you store that information internally can be language code mul, but doesn't need to be.

3) That is a valid point, but that could be implemented via a new config option $wgMultilingualWiki, which as a first stage would change the language code in API and other places to mul.
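
A rough sketch of what point 3 might look like, written in Python purely for illustration. `SiteConfig`, `multilingual_wiki` and `reported_content_language` are hypothetical names standing in for the proposed $wgMultilingualWiki option; no such option exists in the discussion beyond its name.

```python
from dataclasses import dataclass


@dataclass
class SiteConfig:
    language_code: str = "en"        # language used for messages and formatting
    multilingual_wiki: bool = False  # hypothetical analogue of $wgMultilingualWiki


def reported_content_language(config):
    """Language code announced to the API, site matrix and similar consumers."""
    return "mul" if config.multilingual_wiki else config.language_code


commons = SiteConfig(language_code="en", multilingual_wiki=True)
dewiki = SiteConfig(language_code="de")

print(reported_content_language(commons))  # mul
print(commons.language_code)               # en (messages keep working as before)
print(reported_content_language(dewiki))   # de
```

In this variant only the announced code changes; message lookup and formatting still use the configured language.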
Comment 6 Nikola Smolenski 2012-11-07 07:32:16 UTC
Relevant Wikidata discussion: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#Special_language_code_.22mul.22.2FTranslingual

I believe this bug is highly related to bug 36430, if not a duplicate of it. mul is just the top of the language fallback pyramid.

(In reply to comment #4)

> Ok, here's the use case. Consider a wikidata page with labels in 20 languages.
> Each label will be tagged in the HTML code with the correct language and
> directionality attributes. However:
> 
> * What do we put into the HTTP Content-Language header? I think "mul" would be
> correct there. Similarly, when supplying DC meta data about a page (as the OAI
> extension does), there is only one language code that can be provided, so "mul"
> would be the correct choice.

Interface language, which is also the language of the title of the page. This is a list of words in many languages, but the list itself is in one language, however little linguistic content it has.

> * More urgently, we need a code that we can use as a general fallback - Many
> things (especially people and places, i.e. over 50% of the content of wikipedia
> and thus wikidata) have "native" versions of their name that should act as a
> fallback for all other languages - and that name is indeed the correct one for
> *multiple* languages. I doubt that there are renderings for the town "Rackwitz"
> in languages other than German. Having to set this string redundantly for 300

You lost your bet: the Serbian rendering is Раквиц in Cyrillic or Rakvic in Latin. Languages that use the Latin alphabet generally just reuse the name; languages that use other alphabets generally transliterate or transcribe the name; Chinese even has to translate it.

You owe me one beer at the Bavaria pub at the destroyed church; John will tell you why this is bad for you :)

> languages makes no sense to me. So, setting "Rackwitz" as the value for the
> "mul" code, and falling back on this, makes sense. Using "en" for this purpose
> would be grossly misleading, especially in cases where the "native" form is not
> using latin characters (say, Руза).

But I fully agree with this, and I would even go one step further: use the code mul-de (multilingual content of German origin). This would allow us to in some cases automatically convert and display the names in languages that use different alphabets.
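
As a rough illustration of how a code like mul-de could drive automatic conversion, here is a small Python sketch. Everything in it is an assumption made for the example: `origin_of`, `display_name`, the registry keyed by (origin, target) and `de_to_sr_stub` are hypothetical, and the stub is a lookup table holding the one rendering mentioned above rather than a real transliterator.

```python
def origin_of(code):
    """Return the origin language of a "mul-xx" code, or None for other codes."""
    base, _, origin = code.partition("-")
    return origin if base == "mul" and origin else None


# Stand-in for a real German-to-Serbian transliterator (not implemented here).
KNOWN_RENDERINGS = {"Rackwitz": "Раквиц"}


def de_to_sr_stub(name):
    return KNOWN_RENDERINGS.get(name, name)


def display_name(name, code, target, registry):
    """Convert the name if a converter exists for (origin, target), else reuse it."""
    origin = origin_of(code)
    convert = registry.get((origin, target)) if origin else None
    return convert(name) if convert else name


registry = {("de", "sr"): de_to_sr_stub}

print(display_name("Rackwitz", "mul-de", "sr", registry))  # Раквиц
print(display_name("Rackwitz", "mul-de", "fr", registry))  # Rackwitz
print(display_name("Rackwitz", "mul", "sr", registry))     # Rackwitz
```

With plain "mul" the origin is unknown, so no converter can be chosen and the name is shown unchanged.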
Comment 7 Daniel Kinzler 2012-11-07 09:12:09 UTC
(In reply to comment #5)
> 1) Do we need to use that header at all? I've never seen it used. Could be the
> interface language code.

It's good practice to send it. But I guess you are right - for a multilingual site, it could and perhaps should be the interface code. I think currently, MediaWiki always sends the content language.

> 2) That seems like internal design leaking to the interface. Can't you just
> make it possible to designate one language to be the default in the interface?
> The way you store that information internally can be language code mul, but
> doesn't need to be.

We can't specify that globally, and asking users to specify it for every item is likely to cause a mess. 

But I don't understand what you mean by "leaking into the interface." "mul" would be handled similarly to "qqx" - there's no leaking there, right?

But I do think we need to be able to create a Language object for mul. For instance, Title::getPageLanguage() and Content::getPageContentLanguage() return a Language object. I do want to return "mul" for that for Wikidata - because the page content *is* multilingual.

> 3) That is valid point, but that could be implemented via a new config option
> $wgMultilingualWiki, which at first stage would change the language code in API
> and other places to mul.

But to do that, I again need to be able to construct a Language object for mul, don't I?

(In reply to comment #6)
> > * What do we put into the HTTP Content-Language header? I think "mul" would
> 
> Interface language, which is also the language of the title of the page. 

Makes sense for Wikidata. Probably not for Wikipedia. If you browse the German language Wikipedia with an English interface, would you consider the content language to be English?

> > I doubt that there are renderings for the town "Rackwitz"
> > in languages other than German. Having to set this string redundantly for 300
> 
> You lost your bet: Serbian rendering is Раквиц in Cyrillic or Rakvic in Latin.

Duh! Wikipedia never ceases to amaze me. 19 languages! For a town that doesn't even have a gas station or a hairdresser!

Anyway. There are quite a few places that we don't have translations or transliterations for.

> You owe me one beer at the Bavaria pub at the destroyed church; John will tell
> you why is this bad for you :)

I owe you a beer, but let me pick the pub :)

> But I fully agree with this, and I would even go one step further: use the code
> mul-de (multilingual content of German origin). This would allow us to in some
> cases automatically convert and display the names in languages that use
> different alphabets.

Oh, nice idea!
Comment 8 Nikola Smolenski 2012-11-07 10:43:06 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > > * What do we put into the HTTP Content-Language header? I think "mul" would
> > 
> > Interface language, which is also the language of the title of the page. 
> 
> Makes sense for Wikidata. Probably not for Wikipedia. If you browse the German
> language Wikipedia with an English interface, would you consider the content
> language to be English?

Of course, German.

Offtopic, for http://commons.wikimedia.org/wiki/Hauptseite I would expect it to be German as well. Perhaps a parser function that can change that?

> > > I doubt that there are renderings for the town "Rackwitz"
> > > in languages other than German. Having to set this string redundantly for 300
> > 
> > You lost your bet: Serbian rendering is Раквиц in Cyrillic or Rakvic in Latin.
> 
> Duh! Wikipedia never ceases to amaze me. 19 Languages! For a town that doesn't
> even have a gas station or a hair dresser!
> 
> Anyway. There's quite a few places that we don't have translations or
> transliterations for.

But in quite a few cases we can automatically convert them. A German-Serbian transliterator would probably be able to cover 90% of German cities correctly, including Rackwitz. A Serbian-Macedonian or Japanese-Serbian transliterator could work perfectly.

> > You owe me one beer at the Bavaria pub at the destroyed church; John will tell
> > you why is this bad for you :)
> 
> I owe you a beer, but let me pick the pub :)

C-Base it is, then ;)

> > But I fully agree with this, and I would even go one step further: use the code
> > mul-de (multilingual content of German origin). This would allow us to in some
> > cases automatically convert and display the names in languages that use
> > different alphabets.
> 
> Oh, nice idea!

Yes, it is also required for the transliterator to work.
Comment 9 Nikola Smolenski 2012-11-07 10:43:48 UTC
(In reply to comment #8)
> But in quite a few cases we can automatically convert them. A German-Serbian
> transliterator would probably be able to cover 90% of German cities correctly,
> including Rackwitz. Serbian-Macedonian or Japanese-Serbian transliterator could
> work perfectly.

Offtopic: I would also like to do this for usernames.
Comment 10 Andre Klapper 2013-10-31 12:17:04 UTC
[replacing wikidata keyword by adding CC - see bug 56417]
Comment 11 Seb35 2013-12-31 17:30:47 UTC
I would love to see such a thing as a "multilingual wiki", like Meta or Commons or Wikidata are. But I guess there are so many things to do to specify correctly what a multilingual wiki really is that one or more RFCs should be drafted.

(The Translate extension helps a bit to improve the current situation on Meta and Commons by tagging correctly more pages.)
