Last modified: 2014-06-08 10:09:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T38430, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36430 - Specify language fallback
Status: ASSIGNED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: unspecified
Hardware: All All
Importance: Normal major with 5 votes
Target Milestone: ---
Assigned To: Wikidata bugs
Whiteboard: u=dev c=story p=0
Keywords: i18n
Duplicates: 37461 41495 43321 59151 60761 66333
Depends on:
Blocks:

Reported: 2012-05-02 12:29 UTC by denny vrandecic
Modified: 2014-06-08 10:09 UTC (History)
CC List: 22 users

Description denny vrandecic 2012-05-02 12:29:12 UTC
How will language fallback work, i.e. in which order should languages be used when a label does not exist in the requested language? How should a label in a different language be displayed so it is visually clear that this label is in a specific language? How does this tie in with Bug #36426?
Comment 1 abraham.taherivand 2012-06-28 09:37:01 UTC
How do we handle the fact that we know the user languages from the preferences? How do we tie this into the fallback chain?

Consider practical constraints, i.e. those the parser cache imposes on the display of items.
Comment 2 Nikola Smolenski 2012-06-28 11:28:36 UTC
Keep in mind bug #37461: in some cases conversion is needed; in some cases it is not necessary to specify that a label is in a specific language.

Think of handling cases of multilingual content. For example, the item about the question mark may well have "?" as the label. Even if some languages would want to use "question mark" or "Fragezeichen", in others "?" will probably be better than a foreign language. Another example is ".com". Perhaps the MUL code could be used for this.

Also, there is the more complex question of names. For example, "Berlin" means "Berlin" in a large number of languages, so it seems a waste not to use this fact. Perhaps there could be a way to specify which language is the "default" language of the item, so other languages could draw from it if possible. In some cases, additional conversion might be needed.
Comment 3 Nikola Smolenski 2012-06-28 11:31:47 UTC
Note also that some languages might have circular fallback, for example simple → en and en → simple.
Comment 4 jeblad 2012-07-07 22:47:34 UTC
For _global_ fallback chains, Language::getFallbacksFor( $langCode ) can be used, especially for the content. Likewise, the _user_ can have a defined fallback chain, and we can use this first and then build on it to create a complete fallback chain.
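
A minimal sketch of the chain construction described above, written in Python rather than PHP. The `GLOBAL_FALLBACKS` table and the `build_chain` function are hypothetical stand-ins and do not reproduce `Language::getFallbacksFor`. The set of already-seen codes also keeps circular definitions, such as simple → en → simple from comment 3, from looping forever.

```python
# Hypothetical global fallback table (illustration only).
GLOBAL_FALLBACKS = {
    "simple": ["en"],
    "en": ["simple"],  # circular on purpose
    "pt": ["pt-br"],
    "pt-br": ["pt"],
}


def build_chain(lang, user_chain=()):
    """Return an ordered, duplicate-free fallback chain for `lang`.

    The user's own chain is tried first; global fallbacks are then
    expanded in order. A code already in the chain is never added
    again, so circular definitions terminate.
    """
    chain = []
    seen = set()
    queue = [lang, *user_chain]
    while queue:
        code = queue.pop(0)
        if code in seen:
            continue
        seen.add(code)
        chain.append(code)
        queue.extend(GLOBAL_FALLBACKS.get(code, []))
    return chain
```

Under these assumptions, `build_chain("pt", ["es"])` puts the user's own choice `es` ahead of the global fallback `pt-br`.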
Comment 5 Nikola Smolenski 2012-07-09 07:09:55 UTC
I am not sure that for label fallback we want to use the fallback that is used by the interface localization. The interface forms a language hierarchy on the assumption that an interface message that is not defined in one language will always be defined in a parent language or, as a last resort, in English. Here we may well have the case that an item has a label in simple English but not in English.
Comment 6 jeblad 2012-07-11 08:59:00 UTC
There is a patchset that can be used for discussion at https://gerrit.wikimedia.org/r/#/c/15433/

This uses the current fallbacks from the Language class (see $fallback in message files) if all languages fail. Typically the current fallback, which goes right to English, will be extended somewhat. For example, Norwegian (bokmål) will use "nn,da,sv", Norwegian (nynorsk) will use "nb,sv,da", Swedish will use "nn,nb,da" and Danish will use "nb,nn,sv". All will have English added to the list by default.

If fallbacks are turned on, and if labels (or descriptions) are fallbacks, then they are flagged as such. That is, a structure like the following is returned (the query is for labels in the "nb" language, but the label is only found in "en"):

http://localhost/repo/api.php?action=wbgetitems&ids=4&languages=nb&format=jsonfm
{
	"items": {
		"4": {
			"id": 4,
			"labels": {
				"en": {
					"language": "en",
					"value": "Etnedal",
					"fallback": ""
				}
			}
		}
	}
}


All languages specified in the call must fail for the fallbacks to be used. If they are used, the languages list from the call is replaced one by one by the languages list from the fallbacks. If the whole fallback list fails, then no labels (or descriptions) are returned.

It is only implemented for labels and descriptions in wbgetitems; fallbacks make no sense for modules that set the labels and descriptions.
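
The behaviour described in this comment can be sketched roughly as follows. This is an illustration in Python, not the code from the patchset: `get_labels`, its parameters and the `FALLBACKS` table are hypothetical names, and returning only the first fallback hit is an assumed reading of "replaced one by one". The fallback lists and the empty `"fallback"` flag are taken from the comment itself.

```python
# Fallback lists quoted in this comment; English is appended by default.
FALLBACKS = {
    "nb": ["nn", "da", "sv"],
    "nn": ["nb", "sv", "da"],
    "sv": ["nn", "nb", "da"],
    "da": ["nb", "nn", "sv"],
}


def get_labels(labels, languages, uselang="en", fallback=False):
    """Return label entries for `languages`, or one flagged fallback.

    `labels` maps language code -> label text for a single item.
    """
    found = {
        code: {"language": code, "value": labels[code]}
        for code in languages
        if code in labels
    }
    # Fallbacks are only considered when every requested language failed.
    if found or not fallback:
        return found
    # Assumption: walk the list for `uselang` and stop at the first hit.
    for code in FALLBACKS.get(uselang, []) + ["en"]:
        if code in labels:
            return {
                code: {"language": code, "value": labels[code], "fallback": ""}
            }
    return {}
```

Called as `get_labels({"en": "Etnedal"}, ["nb"], uselang="nb", fallback=True)`, the sketch yields a single "en" entry carrying the empty `"fallback"` flag, mirroring the "labels" part of the JSON above.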
Comment 8 Nikola Smolenski 2012-07-17 14:45:31 UTC
It seems to me that there might be some confusion between two kinds of language fallback, so I'd like to clarify.

One kind is fallback between various variants of the same language like sr-el → sr or zh-hant → zh. These are the variants specified with 'variant' query parameter on Wikipedia, except that on Wikidata there may be additional fallbacks like simple → en. This should even be invisible to the user who doesn't have to know what variant is actually in the database.

It is entirely another kind of fallback when we simply don't have the content in user's language or any variant and are supplying another language that we assume the user knows. In this case, language of the fallback should be visibly displayed to the user.
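
To make the distinction concrete, here is a small hedged sketch in Python. The `VARIANT_OF` table and the `resolve` function are hypothetical, and any script conversion between variants is deliberately left out. The only point shown is that a variant hit carries no language marker, while a foreign-language hit does.

```python
# Hypothetical variant table: code -> base language holding equivalent text.
VARIANT_OF = {"sr-el": "sr", "zh-hant": "zh", "simple": "en"}


def resolve(labels, user_lang, other_langs=()):
    """Return (text, shown_language), or None if nothing is found.

    `shown_language` is None when the label comes from the user's
    language or its base variant, and a language code when the label
    is foreign and should be visibly marked as such.
    """
    # Kind 1: same language, different variant. Invisible to the user.
    for code in (user_lang, VARIANT_OF.get(user_lang)):
        if code is not None and code in labels:
            return labels[code], None
    # Kind 2: another language the user is assumed to understand.
    for code in other_langs:
        if code in labels:
            return labels[code], code
    return None
```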
Comment 9 jeblad 2012-07-17 19:48:10 UTC
What's implemented is global fallbacks initiated by the user language, because that can be cached, as it creates a unique page wherever it is used for that specific user language.

There are two types of fallbacks, like you said, one for similar languages and one for forms in other writing systems. The first one needs handling now; the latter form is not so urgent. The latter also builds on the first, as we need to get the correct Language object to be able to create the variants. The latter is also not prioritized for the moment.

In an ideal world with complete code we should try to find the global languages for a label, then limit them to the user's chosen languages, then figure out which language transform we should use. If that fails we use the user's language as a starting point for a fallback chain and use that to find a label, then limit that to the user's chosen languages, and then figure out which language transform to use. If everything fails we could then try all languages in the user's preference list, and if they all fail, we could fall back to using the item identifier itself.

The problem with that is basically workload: not only does the page have to be generated, it will also be a user-specific page. As we have now more or less decided to turn off caching, the last point is not really an issue.

For now the users own language is used as the starting point of the fallback chain in RecentChanges (and other places), but that could be changed to the users preferred languages. It is not clear how the preferred languages can be turned into a unique ordered list. In the wbgetitems API-call the supplied languages are tried first, and then the fallbacks are tried if the flag "fallback" is set. By setting the language with "uselang" other starting points than English can be used.
Comment 10 denny vrandecic 2012-08-14 14:27:43 UTC
See also http://meta.wikimedia.org/wiki/Wikidata/Notes/Language_fallback
Comment 11 Sam Reed (reedy) 2012-11-03 15:02:27 UTC
Note, lack of fallbacks also causes problems with main pages in en-gb, pt-br, etc.

See https://www.wikidata.org/w/index.php?title=User_talk:Reedy&oldid=310868h
Comment 12 jeblad 2012-11-03 15:37:45 UTC
We also need a way to set up project specific language fallbacks, as our fallback chains may not be what other projects would prefer.
Comment 14 Helder 2012-11-03 23:33:40 UTC
(In reply to comment #3)
> Note also that some languages might have circular fallback, for example simple
> → en and en → simple.

Also: pt → pt-br → pt

(In reply to comment #5)
> Here we may well
> have the case that an item has a label in simple English but not in English.

In the MW interface we can also have messages translated in pt but not in pt-br (and then pt should be used) or translated in pt-br but not in pt (and then pt-br should be used).

(In reply to comment #9)
> chain in RecentChanges (and other places), but that could be changed to the
> users preferred languages. It is not clear how the preferred languages can be
> turned into a unique ordered list. In the wbgetitems API-call the supplied

By "preferred languages" do you mean the ones defined in the "translate-pref-editassistlang" field
https://www.wikidata.org/wiki/Special:Preferences?uselang=qqx#mw-prefsection-editing
? (that comes from [[mw:Extension:Translate]] IIRC)
Comment 15 jeblad 2012-11-04 02:25:56 UTC
Yes we know about circular references.
There are a number of examples given, and they are for variants of English but could equally well be for a number of other languages.
There is a set of preferred languages that isn't included in Phase I. Basically it is a list of languages the user has flagged a special interest in, so they are made visible or used as labels and so forth.
Comment 16 Lydia Pintscher 2012-12-21 21:57:05 UTC
*** Bug 43321 has been marked as a duplicate of this bug. ***
Comment 17 Nikola Smolenski 2013-03-05 07:10:09 UTC
Bump.

When real Wikidata started, my default interface was Serbian but I had to switch it to English because Wikidata in Serbian was unreadable and unusable. Everywhere you go, you see Q1234567 labels that are meaningless to you. Worse: you could see an entire item filled with statements like "P21 of this item is Q44148" that are even more meaningless.

I suggest that language fallback be temporarily implemented in simplified form: display the label in the user interface language. If it doesn't exist, display the English label in angle brackets. If that doesn't exist, display nothing. This should work in 99.99% of cases, Wikidata will be more understandable to non-English readers, it will be easier for editors to enter the non-English labels and more people will use non-English interface languages. Full language fallback could be implemented later.
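
The simplified rule proposed here is small enough to sketch in a few lines of Python. The function name `display_label` and the use of `<` and `>` as the bracket characters are assumptions made for illustration.

```python
def display_label(labels, ui_lang):
    """Label in the UI language, else English in angle brackets, else ''."""
    if ui_lang in labels:
        return labels[ui_lang]
    if "en" in labels:
        # Mark the label as not being in the user's language.
        return "<" + labels["en"] + ">"
    return ""
```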
Comment 18 Quim Gil 2013-04-23 21:12:03 UTC
Just a note to say that Liangent has applied to GSoC with a proposal related to this report. Good luck!

https://www.mediawiki.org/wiki/User:Liangent/wb-lang
Comment 19 Matthew Flaschen 2013-05-29 04:54:09 UTC
Liangent was accepted.  Congratulations!

Also, there is an RFC relevant to this at https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Labels_and_descriptions_in_language_variants
Comment 21 Quim Gil 2013-09-17 16:16:33 UTC
GSoC "soft pencils down" date was yesterday and all coding must stop on 23 September. Has this project been completed?
Comment 22 Liangent 2013-09-17 16:38:28 UTC
(In reply to comment #21)
> GSoC "soft pencils down" date was yesterday and all coding must stop on 23
> September. Has this project been completed?

Server-side (PHP) changes are almost done now. I stopped creating new pieces a little earlier to focus on amending existing code and getting it merged these days. All client-side (JavaScript) work is still TODO.
Comment 23 Lydia Pintscher 2013-10-08 16:46:20 UTC
*** Bug 41495 has been marked as a duplicate of this bug. ***
Comment 24 Quim Gil 2013-10-22 19:36:33 UTC
If you have open tasks or bugs left, one possibility is to list them at https://www.mediawiki.org/wiki/Google_Code-In and volunteer yourself as mentor.

We have heard from Google and free software projects participating in Code-in that students participating in these programs have done great work finishing and polishing GSoC projects, often mentored by the former GSoC student. The key is to be able to split the pending work into small tasks.

More information in the wiki page. If you have questions you can ask there or you can contact me directly.
Comment 25 Lydia Pintscher 2014-01-09 16:53:53 UTC
*** Bug 59151 has been marked as a duplicate of this bug. ***
Comment 26 Lydia Pintscher 2014-02-03 14:25:07 UTC
*** Bug 60761 has been marked as a duplicate of this bug. ***
Comment 27 Lydia Pintscher 2014-03-17 11:16:43 UTC
*** Bug 37461 has been marked as a duplicate of this bug. ***
Comment 28 Lydia Pintscher 2014-06-08 10:09:49 UTC
*** Bug 66333 has been marked as a duplicate of this bug. ***
