Last modified: 2014-03-09 18:02:49 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T44396, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 42396 - duplicate/invalid language codes
Status: VERIFIED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Wikidata
Version: wmf-deployment
Hardware: All All
Importance: Normal normal with 1 vote
Target Milestone: ---
Assigned To: Wikidata bugs
URL: https://meta.wikimedia.org/wiki/Speci...
Keywords: i18n
Depends on:
Blocks:

Reported: 2012-11-23 19:55 UTC by merl
Modified: 2014-03-09 18:02 UTC (History)
CC List: 6 users

Description merl 2012-11-23 19:55:00 UTC
Archived discussion at http://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Archive/2012/11#language_code_problems

als, be-x-old and zh-yue are not valid language codes and should not be used for labels, aliases and descriptions. The valid language codes are gsw, be-tarask and yue, respectively.
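
For illustration, the substitution described above could be sketched as a small lookup table. This is only a sketch: the name `LEGACY_TO_VALID` and the function `normalize_language_code` are hypothetical, and the table contains only the three codes named in this report, not a complete list.

```python
# Hypothetical mapping: only the three legacy codes named in this report.
LEGACY_TO_VALID = {
    "als": "gsw",
    "be-x-old": "be-tarask",
    "zh-yue": "yue",
}


def normalize_language_code(code: str) -> str:
    """Return the valid code for a known legacy code.

    Unknown codes are returned unchanged apart from trimming
    surrounding whitespace and lowercasing.
    """
    key = code.strip().lower()
    return LEGACY_TO_VALID.get(key, key)


if __name__ == "__main__":
    for legacy in ("als", "be-x-old", "zh-yue", "de"):
        print(legacy, "->", normalize_language_code(legacy))
```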

You should also check the lang setting of these wikis: http://als.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general
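
A rough sketch of how the lang setting could be read from such a siteinfo query follows. The helper names `siteinfo_url` and `extract_lang` are hypothetical, `format=json` is added to the query from the report, and the sample response is hand-written with a placeholder value ("xx"), not real output from any wiki. It assumes the JSON reply carries the setting under `query.general.lang`.

```python
import json
from urllib.parse import urlencode


def siteinfo_url(api_base: str) -> str:
    """Build the siteinfo query URL used in the report, asking for JSON."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def extract_lang(response_text: str) -> str:
    """Pull the 'lang' value out of a siteinfo JSON response.

    Assumes the response has the shape {"query": {"general": {"lang": ...}}}.
    """
    data = json.loads(response_text)
    return data["query"]["general"]["lang"]


if __name__ == "__main__":
    print(siteinfo_url("http://als.wikipedia.org/w/api.php"))
    # Hand-written sample with a placeholder value, NOT real API output.
    sample = '{"query": {"general": {"lang": "xx"}}}'
    print(extract_lang(sample))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so that the sketch runs without network access.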
Comment 1 Nemo 2013-08-12 05:06:43 UTC
This bug is probably too general to be useful (perhaps transform into a tracking bug?), but as we have another equally general report let me copy it here:

----

Small update: I went through the language list at

https://github.com/mkroetzsch/wda/blob/master/includes/epTurtleFileWriter.py#L472

and added a number of TODOs to the most obvious problematic cases. Typical problems are:

* Malformed language codes ('tokipona')
* Correctly formed language codes without any official meaning (e.g., 'cbk-zam')
* Correctly formed codes with the wrong meaning (e.g., 'sr-ec': Serbian from Ecuador?!)
* Language codes with redundant information (e.g., 'kk-cyrl' should be the same as 'kk' according to IANA, but we have both)
* Use of macrolanguages instead of languages (e.g., "zh" is not "Mandarin" but just "Chinese"; I guess we mean Mandarin; less sure about Kurdish ...)
* Language codes with incomplete information (e.g., "sr" should be "sr-Cyrl" or "sr-Latn", both of which already exist; same for "zh" and "zh-Hans"/"zh-Hant", but also for "zh-HK" [is this simplified or traditional?]). 
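
To make the categories above more concrete, here is a minimal sketch that splits a tag into subtags by shape, loosely following the BCP 47 layout (language, optional extended language, script, region, variants, private use). It is a simplification under stated assumptions: the primary subtag is required to be a 2- or 3-letter code, which is stricter than the full grammar; extensions and grandfathered tags are ignored; and nothing is checked against the IANA registry, so a tag can parse and still have no official meaning. The function name `parse_tag` is hypothetical.

```python
import re
from typing import Dict, Optional


def parse_tag(tag: str) -> Optional[Dict[str, str]]:
    """Split a language tag into subtags by shape only.

    Returns None if the tag does not fit the simplified layout.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    # Assumption: primary language subtag is a 2- or 3-letter code.
    if not re.fullmatch(r"[a-z]{2,3}", language):
        return None
    result = {"language": language}
    rest = parts[1:]
    i = 0
    # Optional extended language subtag: exactly 3 letters.
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{3}", rest[i]):
        result["extlang"] = rest[i].lower()
        i += 1
    # Optional script subtag: exactly 4 letters (e.g. Cyrl, Latn).
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{4}", rest[i]):
        result["script"] = rest[i].title()
        i += 1
    # Optional region subtag: 2 letters or 3 digits.
    if i < len(rest) and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[i]):
        result["region"] = rest[i].upper()
        i += 1
    # Variants: 5-8 alphanumerics, or a digit followed by 3 alphanumerics.
    variants = []
    while i < len(rest) and re.fullmatch(
        r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", rest[i]
    ):
        variants.append(rest[i].lower())
        i += 1
    if variants:
        result["variants"] = "-".join(variants)
    # Private use: 'x' followed by one or more 1-8 character subtags.
    if i < len(rest) and rest[i].lower() == "x" and i + 1 < len(rest):
        private = rest[i + 1:]
        if all(re.fullmatch(r"[A-Za-z0-9]{1,8}", p) for p in private):
            result["private"] = "-".join(p.lower() for p in private)
            i = len(rest)
    # Anything left over does not fit the simplified layout.
    if i != len(rest):
        return None
    return result


if __name__ == "__main__":
    for code in ("tokipona", "cbk-zam", "sr-ec", "kk-cyrl", "zh-HK", "be-x-old"):
        print(code, "->", parse_tag(code))
```

Under these assumptions 'tokipona' is rejected, 'sr-ec' parses as language 'sr' with region 'EC' (the "Serbian from Ecuador" reading), and 'kk-cyrl' parses as 'kk' with script 'Cyrl'.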

----

Comment 2 Nemo 2013-08-12 05:07:57 UTC
Copy and paste fail... URL http://thread.gmane.org/gmane.org.wikimedia.wikidata/2524
Comment 3 Lydia Pintscher 2014-02-26 11:13:45 UTC
It seems fixed to me. I just made this edit with uselang=be-x-old: https://www.wikidata.org/w/index.php?title=Q1&diff=112330190&oldid=112313552
