Last modified: 2014-11-17 10:35:38 UTC
I propose to include a feature that auto-detects the interface language for anonymous users. This would be especially helpful for multilingual projects like Commons. The language can be set in the user interface, but one needs to understand the default language in order to even create an account or find the right setting.

The detection has three modes, controlled by $wgDetectLanguage:
* LANG_USE_CONTENT: use the content language for anonymous users, i.e. don't use auto-detection. This is the default, and behaves the same as without this patch.
* LANG_PREFER_CONTENT: use the content language if it is present in the Accept-Language list. Otherwise, behave like LANG_PREFER_CLIENT.
* LANG_PREFER_CLIENT: use the first language in the Accept-Language list that is supported by the wiki.

Caveats:
* The Accept-Language field is often not configured correctly in the browser.
* The Accept-Language field would affect caching. The appropriate changes to the Vary: header are made automatically, but this reduces cache efficiency.
* In order to decide which languages are supported, this relies on $wgContLang->getLanguageNames(). It does not actually check that the files exist, as this would be pretty slow, and the detection is performed for every page request.
* The languages in Accept-Language are handled as being given in order of preference. Any weight modifiers are ignored.

Patch to follow in a minute.
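The three modes can be illustrated with a small sketch. This is not the attached patch (which is PHP); it is a Python approximation under stated assumptions: the names `DetectMode`, `parse_accept_language` and `detect_language` are hypothetical, the fallback for LANG_PREFER_CONTENT is taken to be the LANG_PREFER_CLIENT behaviour, and the set of supported languages is passed in rather than read from $wgContLang->getLanguageNames().

```python
from enum import Enum


class DetectMode(Enum):
    # Hypothetical stand-ins for the three $wgDetectLanguage values.
    USE_CONTENT = "LANG_USE_CONTENT"
    PREFER_CONTENT = "LANG_PREFER_CONTENT"
    PREFER_CLIENT = "LANG_PREFER_CLIENT"


def parse_accept_language(header):
    """Return language codes in header order, dropping any q-weights."""
    codes = []
    for part in header.split(","):
        code = part.split(";", 1)[0].strip().lower()
        if code and code != "*":
            codes.append(code)
    return codes


def detect_language(mode, header, content_lang, supported):
    """Pick the interface language for an anonymous request."""
    if mode is DetectMode.USE_CONTENT:
        return content_lang
    accepted = parse_accept_language(header)
    if mode is DetectMode.PREFER_CONTENT and content_lang in accepted:
        return content_lang
    # PREFER_CLIENT, or PREFER_CONTENT when the content language is not listed:
    # take the first listed language that the wiki supports.
    for code in accepted:
        if code in supported:
            return code
    return content_lang
```

For example, `detect_language(DetectMode.PREFER_CLIENT, "de-DE,de;q=0.8,en;q=0.5", "en", {"en", "de", "fr"})` skips the unsupported `de-de` entry and settles on `de`.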
Created attachment 963: patch against HEAD for Defines.php, DefaultSettings.php, OutputPage.php, and User.php
I worry this is too hostile to caching; without aggressive caching of anon views our entire infrastructure will collapse into a little pile of rubble. Adding a Vary on accept-language will splinter things a lot -- even where the same language gets selected there can be big variance in what's in the header.
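To make the splintering concern concrete, here is a rough Python sketch. It assumes a cache that keys variants on the raw Accept-Language value without normalizing it; the function names and the sample header values are made up for illustration and are not measurements from Wikimedia traffic.

```python
def negotiated_language(header, supported, default="en"):
    """First supported language in header order (q-weights ignored)."""
    for part in header.split(","):
        code = part.split(";", 1)[0].strip().lower()
        if code in supported:
            return code
    return default


def cache_variants(url, headers):
    """With Vary: Accept-Language, one stored variant per distinct raw header."""
    return {(url, header) for header in headers}


# Illustrative header values that all resolve to German.
headers = [
    "de",
    "de,en;q=0.5",
    "de-DE,de;q=0.8,en;q=0.5",
    "de, en-US;q=0.7, en;q=0.3",
]
supported = {"de", "en", "fr"}

languages = {negotiated_language(h, supported) for h in headers}
variants = cache_variants("/wiki/Main_Page", headers)
print(len(languages), "language selected,", len(variants), "cache variants")
```

All four headers select the same language, yet the cache would hold four separate copies of the same page.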
I agree that this might be a problem. It might be quite useful as an option for smaller, non-Wikimedia projects, though. The only Wikimedia project I would suggest enabling this for is Commons. There, the main traffic is image data anyway, which can be cached independently of user language, even for logged-in users. May be worth a try. Also, it would be interesting to find out how many different Accept-Language headers we are actually seeing. Most people never change it by hand, and the default setup of the popular browsers doesn't vary too much, I guess.

All this being said: one important point would be to provide a localized interface for creating an account to people who do not speak English. Maybe it would be enough to have a language selection on the login page that would be used while creating an account. The default choice could be made based on Accept-Language. The language selected during account creation should then also become the language preset for the new account. But that would be a separate feature request, I guess.
I also think that most of what Wikimedia Commons caches is not the text (web page) data but the images. First, Wikimedia Commons has never been prominently cited by the media (AFAIK) despite its central role (and its potential to be a serious competitor to traditional stock image archives). Luckily all the media looked at Wikinews from day zero on. :p Second, Commons does not have an intuitive URL like Wikipedia: wikipedia.org vs. commons.wikimedia.org. Most "outsiders" (people not involved in any Wikimedia wiki) who come to Wikimedia Commons arrive via Wikipedia image descriptions or Google Images. Neither route is used by the masses.

So I am quite confident that this patch, enabled for Wikimedia Commons only, would help us a lot in several areas without harming our caching architecture:
* People coming from local wikis and creating an account would be able to get (the main page and) the account creation pages in their native language if special:userlogin is linked from the local project (as long as single site login is not possible). This would help us reduce prejudices against Wikimedia Commons a lot (sadly we currently have to live with these "English only" prejudices against Commons, although we are working hard to support many languages in a decent way).
* Outside people who want to reuse Wikimedia Commons would get information in their local language, which would help us a lot, as people often ask about Wikimedia Commons conditions (and are apparently a little confused by the English-only interface, on top of the problem that Commons help pages weren't the best ones until recently).
* We could reduce the English bias in Commons. Currently we have the problem that people are softly forced into English and thus do not realise that their local language is supported too (you know that changing preferences is not done by the masses even after login...).
This leads to the problem that these non-English languages get neglected somewhat: a self-reinforcing effect that mainly supports English only... Many problems in Commons are caused by the lack of local language support. I am personally doing some work to support help pages in several languages in a decent way, but the more people you get from the beginning, the better the result (and I only speak German, English and French)... So I think these patches would be a great thing for Wikimedia Commons.
The whole point is that if this great thing kills the site, it's not such a great thing, is it?
Well Rob, could you give us some serious figures? I outlined in quite some detail, and with a rational analysis from my perspective, why a negative impact on the servers is not going to happen if this patch is applied to the Wikimedia Commons wiki *only*. Of course I could be wrong, so in order to have a rational discussion of the issue we need the following figures:
* How much image traffic is caused by anonymous page visits to Wikimedia Commons per month?
* How much (page) text traffic is caused by anonymous page visits to Wikimedia Commons per month?
* How much page visit traffic is caused per month by logged-in users of Wikimedia Commons?

I'd appreciate it if you could find the time to extract these numbers, as I personally invest quite some time as well in providing you with decent bug reports in order to reduce the amount of work you need to resolve them.
I think the best thing for now would be to add a language selector to the account creation page, as described above. The selection could be preset based on browser preference, but that's not even necessary. The important thing is that people joining Commons should not have to know English to create an account. Whether Commons is internationalized enough yet to be useful to them without a basic level of English is another question... I at least hope it will soon be usable without knowing English. So, shall I open a separate feature request for that?
(In reply to comment #7)
> So, shall I open a separate feature request for that?

An optional language selector would be cool.
Side note: this could be combined with bug 5638 to make multilingual projects like Commons more useful to people who do not speak English, even if not logged in. Perhaps using the browser's language setting is not a good idea; maybe it would be better to offer a drop-down menu and a "set language" button that would set a cookie.
*** Bug 7761 has been marked as a duplicate of this bug. ***
*** Bug 26506 has been marked as a duplicate of this bug. ***
It's very disappointing that nothing was improved anent this bug during six years. Many people around me really hate English and would never contribute to a project that appears entirely in this language without an easy and fast way to switch it. Maybe the browser language settings are sometimes incorrect. Nevertheless, the present default is almost always incorrect, displaying everything in English to everyone. Most users also don't know how to change the language settings on Meta or Commons: it is much easier to get to know one's own browser once than the particular settings of every website visited. I suppose the browser setting is either left at a default of English (often incorrect) or set to the system or browser localisation language (probably nobody uses those in a language they cannot understand, so there's no problem). All of these possibilities are better than always defaulting to English.
(In reply to comment #12)
> It's very disappointing that nothing was improved anent this bug during six
> years.

Although I agree with your argument that always showing English is not necessarily nice, there's a technical reason we haven't done anything in six years, presented in comment #2: Squid caching would suffer severely. The "pile of rubble" part may not be as accurate today as it was in 2005 (we have gained some capacity since then), but please understand this is not an easy change at all. Back in 2005 our servers really did rely on every anonymous user seeing the same thing at the same URL in order not to melt down, and I'm not so sure Accept-Language detection for anonymous users would be feasible in 2010/2011 either.

> Many people in my surrounding really hate English and would never contribute to
> a project that appears totally in this language, without an easy and fast way
> to switch it.

"An easy and fast way to switch it" might just be what we *can* do. We could use JavaScript to obtain the user's Accept-Language preferences from the API or something (which wouldn't go through the Squid cache, but that's OK: it's just a language list, not an entire wiki page) and use that information to display a link with the native language name (i.e. 'Deutsch' for German, 'Français' for French, etc.) that would then lead to the account creation form in that language, or maybe trigger persistent uselang (language selection for anonymous users, basically) if and when we have that.

In fact, I once wrote some proof-of-concept code that obtained the user's Accept-Language settings from the API, stored them in a cookie (to avoid repeated API requests) and used them to reorder the "In other languages" links in the sidebar. We never ended up using it, but it's still lying around somewhere.
tl;dr: Automatically showing wiki pages in the browser language for anonymous users is probably not gonna happen, but a feature offering to switch languages based on the browser language isn't hard to do.
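The reordering idea from the proof of concept can be sketched roughly as follows. This is not the original script (which was JavaScript and is not shown in this thread); it is a Python illustration, and the function name and the sample link list are hypothetical.

```python
def reorder_language_links(links, accepted):
    """Move links whose language the user accepts to the front.

    links: list of (language_code, label) pairs in their current order.
    accepted: language codes in the user's order of preference.
    """
    rank = {}
    for position, code in enumerate(accepted):
        code = code.lower()
        # A regional variant such as "de-AT" also counts for its base "de".
        rank.setdefault(code, position)
        rank.setdefault(code.split("-", 1)[0], position)
    preferred = sorted(
        (link for link in links if link[0] in rank),
        key=lambda link: rank[link[0]],
    )
    rest = [link for link in links if link[0] not in rank]
    return preferred + rest


links = [("cs", "Česky"), ("de", "Deutsch"), ("fr", "Français"), ("sk", "Slovenčina")]
print(reorder_language_links(links, ["sk", "cs", "en"]))
```

Links for languages the user did not list keep their original relative order, so the sidebar stays predictable for everything else.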
If WMF cannot do it, it doesn't mean MediaWiki cannot do it. In fact the LanguageSelector extension does it already. In my opinion it would be nice to move the automatic language detection code from it into core (disablable for WMF and other cached sites, of course). Let's not mix two issues in this bug.
Well, should I create a new issue, requesting a language switcher for Commons (ideally accessible on every page, not only on the main one), that would trigger persistent uselang, so that the interface language could stay the same even after clicking links?
(In reply to comment #15)
> Well, should I create a new issue, requesting a language switcher for Commons
> (ideally accessible on every page, not only on the main one), that would
> trigger persistent uselang, so that the interface language could stay the same
> even after clicking links?

There already is one, setting “persistent” uselang, used when coming from another Wikimedia project. Try going from e.g. http://cs.wikipedia.org/wiki/File:Example.jpg (not logged in) to the image page on Commons; you should get uselang=cs automatically. See http://commons.wikimedia.org/wiki/MediaWiki:PersistentUselang.js
It's nice, but we need a language switcher for users who are not logged in, setting such a persistent uselang, on every page (or at least on the main one). Where should it be sorted out? Here or directly somewhere on Commons?
The switcher now exists but is not perfectly permanent: it always disappears after searching for a string in the search field.
(In reply to comment #18)
> The switcher now exists but is not perfectly permanent, disappears always after
> searching a string in the search field.

Please report any bugs at: http://commons.wikimedia.org/wiki/MediaWiki_talk:AnonymousI18N.js
The script can be seen at: http://commons.wikimedia.org/ (logged out)
The source is at: http://commons.wikimedia.org/wiki/MediaWiki:AnonymousI18N.js
This has been done both in JavaScript on the front end (see previous comment) and in PHP on the server side, in the following extension: http://www.mediawiki.org/wiki/Extension:LanguageSelector Knowing that this extension is in use on TranslateWiki and is doing pretty well, I'd recommend closing this bug and directing further questions to either that extension or to a new bug (e.g. "Fix bug X in Extension:LanguageSelector" or "Merge Extension:LanguageSelector in core (disableable)").
Well, I announced it as a new bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=26876.
The switcher exists and its uselang is permanent at Commons, which is nice.

Nonetheless I do not understand why Commons couldn't detect the browser's default language and set the interface accordingly for non-registered users as well, if the switcher hasn't been used. What's the problem?
* Many users have not set a language preference in their browser. As far as I know, the default value there is English, so they would receive the Commons interface in English, the same as now.
* For users who have set it, the interface would be in the preferred language.

It would be worse for nobody, just better for one part of the users. Why not?
(In reply to comment #22)
> The switcher exists and its uselang is permanent at Commons, it is nice.
>
> Nonetheless I do not understand why couldn't Commons detect the browser default
> language and set the interface according to that for non-registered users as
> well, if the switcher hasn't been used. What's the problem?
>
> * Many users don't have set in their preferences in the browser – as far as I
> know, the default value is English there, so they will receive the Commons
> interface in English, the same way as now.
>
> * For users having it set, the interface would be in the preferred language.
>
> For nobody it would be worse, just better for one part of the users. Why not?

This could be done for logged-in users, I guess, but it definitely can't be done for anonymous users due to Squid caching. The browser language headers can't be detected client-side, only server-side.
Also, many people have their browser languages misconfigured. Since those settings are (generally) hard to find, it's often very unclear to the user why they are getting language x vs. language y. Any use of browser headers should come with clear ways in the interface to change the auto-detected defaults.

As for detecting the language client-side: you can always make an AJAX request like http://en.wikipedia.org/w/api.php?action=query&meta=userinfo&uiprop=acceptlang
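As a rough illustration of consuming that API call, the sketch below builds the query string and extracts the language list from a JSON reply. The query parameters come from the URL above plus `format=json`; the response shape (a `query.userinfo.acceptlang` list whose entries carry a weight under `q` and the code under `*` or `code`) is an assumption, and the sample payload is invented. The function names are hypothetical, and no network request is made.

```python
import json
from urllib.parse import urlencode


def acceptlang_query_url(api_base):
    """Build the userinfo/acceptlang query URL for a given api.php endpoint."""
    params = {
        "action": "query",
        "meta": "userinfo",
        "uiprop": "acceptlang",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def accepted_languages(api_response_text):
    """Return language codes from the reply, highest weight first."""
    data = json.loads(api_response_text)
    entries = data.get("query", {}).get("userinfo", {}).get("acceptlang", [])
    # Stable sort: entries with equal weight keep the order the server sent.
    ranked = sorted(entries, key=lambda e: float(e.get("q", 1)), reverse=True)
    codes = []
    for entry in ranked:
        code = entry.get("code") or entry.get("*")  # assumed key names
        if code:
            codes.append(code)
    return codes


# Invented sample payload in the assumed shape.
sample = json.dumps({
    "query": {"userinfo": {"id": 0, "name": "192.0.2.1", "acceptlang": [
        {"q": 0.5, "*": "en"},
        {"q": 1, "*": "sk"},
        {"q": 0.8, "*": "cs"},
    ]}}
})
print(acceptlang_query_url("http://en.wikipedia.org/w/api.php"))
print(accepted_languages(sample))
```

Because the reply is small and specific to the requesting client, it can be fetched separately from the cached page, which is the point made in the comment above.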
Yes, they may have them misconfigured, but in fact that means "not configured", i.e. left at the default; in other words they have English in first place there, so nothing would change for them, as they get the interface in English now as well. For users who have configured their browser it would be better: why configure the browser language setting if websites ignore it?

There is another thing I really do not understand now. Not logged in, with the cache cleared, browsing anonymously with Mozilla, with Slovak in first place in the language setting, and located in Portugal, I see a Czech notification on Commons: "Wikimedia Commons is available in Czech". Where does the site take the language information from? I thought it was the language setting, but obviously not, as I now prefer Slovak there; nonetheless nothing changed on Commons, it still offers Czech (but unfortunately it only offers it, it doesn't display the interface in that language).
It is unclear whether this bug is about having this feature in MediaWiki (exists in an extension) or in the Wikimedia projects (not done). Assuming the former, since this bug is categorized as a MediaWiki bug.
In my understanding, the bug is about having this feature in Wikimedia Commons.
(In reply to comment #25)
> I really do not understand other thing now. Being not logged in, having the
> cache renewed and browsing anonymously with Mozilla, with Slovak in the
> language setting on the first place there, being in Portugal – in Commons there
> is a Czech notification "Wikimedia Commons is available in Czech". From where
> does the site take the language information? I thought it is the language
> setting, but obviously not, as now I prefer there Slovak, nonetheless nothing
> changed in Commons, it still offers Czech (but unfortunately just offers, it
> doesn't display the interface in that language).

Once again, see http://commons.wikimedia.org/wiki/MediaWiki:AnonymousI18N.js and its talk page, and discuss that script _there_. If you read that page, you would learn that the user language is selected using the following priorities:
1. Cookie (previous user preference)
2. The previous (referring) page (e.g. when you click on a Commons link on the Czech Wiktionary, you’ll get Commons in Czech)
3. Browser language
4. Fallback to the default language
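The priority order above can be sketched as a simple fallback chain. This is a Python illustration, not the AnonymousI18N.js source; the function names are hypothetical, and deriving the language from the first label of the referring hostname (e.g. `cs` in `cs.wiktionary.org`) is an assumption made for the example.

```python
from urllib.parse import urlparse


def language_from_referrer(referrer, supported):
    """Guess a language from the referring host, e.g. cs.wiktionary.org -> cs."""
    host = urlparse(referrer).hostname or ""
    prefix = host.split(".", 1)[0]
    return prefix if prefix in supported else None


def choose_language(cookie_lang, referrer, browser_langs, supported, default="en"):
    """Apply the priorities: cookie, referring page, browser language, default."""
    if cookie_lang in supported:
        return cookie_lang
    if referrer:
        from_referrer = language_from_referrer(referrer, supported)
        if from_referrer:
            return from_referrer
    for code in browser_langs:
        code = code.lower()
        # Try the full code first, then its base language ("sk-sk" -> "sk").
        for candidate in (code, code.split("-", 1)[0]):
            if candidate in supported:
                return candidate
    return default


supported = {"en", "cs", "sk", "de"}
print(choose_language(None, "http://cs.wiktionary.org/wiki/Foo", ["sk"], supported))
```

Under this ordering a visitor arriving from a Czech project is offered Czech even if the browser lists Slovak first, because the referring page outranks the browser language.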
The Indic language community is interested in this feature. I do not have time to summarize it and am not sure I would summarize adequately. The thread, for anyone who wants to read through it: http://lists.wikimedia.org/pipermail/wikimediaindia-l/2011-December/thread.html#5890 I've asked them to come here and detail what they want.
(In reply to comment #29)
> The Indic language community is interested in this feature. I do not have time
> to summarize it and am not sure I would summarize adequately. The thread, for
> anyone who wants to read through it:
>
> http://lists.wikimedia.org/pipermail/wikimediaindia-l/2011-December/thread.html#5890
>
> I've asked them to come here and detail what they want.

My impression of the thread is that they want a big site banner, "View Wikipedia in language X", with X being auto-detected via either geolocation or Accept-Language headers (i.e. your web browser's language preferences). That isn't really this bug. On the other hand, that is more likely to be implemented than this bug, since it can be done in pure JS with few caching issues, and most of the work is already done: we can already get Accept-Language headers from JS ( http://www.mediawiki.org/w/api.php?action=query&meta=userinfo&uiprop=acceptlang ), and geolocation is also already set up for JS as a side effect of geo-targeted central notices.