Last modified: 2013-07-25 09:00:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T34483, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 32483 - en.wp uses lang="simple" for simple: interlang links.


Summary:	en.wp uses lang="simple" for simple: interlang links.

Status:	PATCH_TO_REVIEW

Product:	MediaWiki
Classification:	Unclassified
Component:	Internationalization (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Normal normal (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	i18n

Depends on:
Blocks:	25591

Reported:	2011-11-18 23:57 UTC by Daniel Friesen
Modified:	2013-07-25 09:00 UTC (History)
CC List:	6 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Use $wgDummyLanguageCodes for getting the right language code (1.08 KB, patch) 2011-12-01 21:04 UTC, Robin Pepermans (SPQRobin)	Details
Language.php patch, including first go at a mapping table... (1.78 KB, patch) 2011-12-11 14:14 UTC, Derk-Jan Hartman	Details

Description Daniel Friesen 2011-11-18 23:57:43 UTC

'simple' isn't a valid language code, though we're outputting it for interlanguage links.

We 'could' add in a simple hack here that will make 'simple' output lang="en" instead.

Though I do have a more interesting idea. Instead of that, how about we swap simple for en-x-Simple and add code that lets us create aliases for language codes, so that simple: will still be equivalent to en-x-Simple?

Going by bcp47 (https://www.rfc-editor.org/rfc/bcp/bcp47.txt) the code en-x-Simple is valid. It's an 'en' lang code with a private subtag of 'Simple'. bcp47 reserves x-* for private use, i.e. for things that wouldn't be registered, which is essentially what we're talking about here.
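A rough sketch of the alias idea, in Python rather than MediaWiki's PHP. The names LANGUAGE_ALIASES, resolve_alias and has_private_use_subtag are hypothetical, and the pattern only covers the narrow "primary language plus private-use section" shape discussed here, not the full bcp47 grammar.

```python
import re

# Hypothetical alias table: interwiki prefix -> tag to emit in lang="".
LANGUAGE_ALIASES = {"simple": "en-x-simple"}

# Simplified shape only: a 2-3 letter primary language, then "x" and one
# or more private-use subtags of 1-8 alphanumerics. Not the full grammar.
_PRIVATE_USE_SHAPE = re.compile(r"[a-z]{2,3}-x(-[a-z0-9]{1,8})+", re.IGNORECASE)


def resolve_alias(code):
    """Return the aliased tag for a prefix, or the prefix unchanged."""
    return LANGUAGE_ALIASES.get(code.lower(), code)


def has_private_use_subtag(tag):
    """True if the tag is a primary language followed by an x-* section."""
    return _PRIVATE_USE_SHAPE.fullmatch(tag) is not None
```

With this, resolve_alias("simple") yields a tag that passes has_private_use_subtag, while the bare prefix "simple" does not.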

Comment 1 Derk-Jan Hartman 2011-11-19 12:06:01 UTC

This issue is much wider actually. It applies for all languages listed here: http://en.wikipedia.org/wiki/List_of_Wikipedias#Wikipedia_edition_codes

The solution: we need to add a language tag normalization step in core, through which we can pass the ll_lang value from the langlinks table of the database before we actually generate a 'real' lang tag.

Such a normalization table (language_mapping ?) would have

ll_lang: the wiki defined interlanguage code
wiki_variant: a wiki language variant code
iso 639-1
iso 639-2
bcp47 code: (includes private codes, variant names, sign, transliteration etc)

Could probably be built upon 'Extension:CLDR'
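To make the proposed table shape concrete, here is a minimal sketch in Python (not MediaWiki code). The class name, field names and the single example row are assumptions for illustration; the bcp47 value for 'simple' is just the one suggested in the description, not a decided mapping.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageMapping:
    """One row of the hypothetical language_mapping table."""
    ll_lang: str                 # wiki-defined interlanguage code
    wiki_variant: Optional[str]  # wiki language variant code, if any
    iso639_1: Optional[str]
    iso639_2: Optional[str]
    bcp47: str                   # tag to emit in lang=""


# Illustrative row only; a real table would cover every edition code.
LANGUAGE_MAPPING = {
    row.ll_lang: row
    for row in (
        LanguageMapping("simple", None, "en", "eng", "en-x-simple"),
    )
}


def bcp47_for(ll_lang: str) -> str:
    """Look up the bcp47 tag for an ll_lang, falling back to the input."""
    row = LANGUAGE_MAPPING.get(ll_lang)
    return row.bcp47 if row else ll_lang
```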

Comment 2 Derk-Jan Hartman 2011-11-19 14:45:35 UTC

Slightly related: r103640

Comment 3 Robin Pepermans (SPQRobin) 2011-12-01 21:04:35 UTC

Created attachment 9591 [details]
Use $wgDummyLanguageCodes for getting the right language code

I think it is sufficient to use $wgDummyLanguageCodes (per r103640) for this, since it will contain all code mappings relevant for MediaWiki/Wikimedia. A database like you propose seems overkill to me.

Comment 4 Robin Pepermans (SPQRobin) 2011-12-01 21:34:17 UTC

This is weird, those attributes were only added just today: r104778. So, what did this bug report refer to? The class="interwiki-simple" on the <li> element?

Comment 5 Brion Vibber 2011-12-02 01:44:08 UTC

Since 'simple' is in Language's list of language names, I think it'd be cleaner to have the logic for this living in Language.

Maybe Language::normalizeCode( $code ) ?

That could also normalize a number of fuzzy old things that we still have in our list for compatibility:
* simple -> en or en-x-simple
* bat-smg -> sgs
* roa-rup -> rup
* fiu-vro -> vro

etc
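As a sketch of what such a normalizing function might do (Python here, not the actual Language class; normalize_code and LEGACY_CODE_MAP are hypothetical names), using only the mappings listed above and arbitrarily picking 'en' for 'simple':

```python
# Legacy codes from the list above. For "simple" the list offers
# "en or en-x-simple"; "en" is chosen here purely for illustration.
LEGACY_CODE_MAP = {
    "simple": "en",
    "bat-smg": "sgs",
    "roa-rup": "rup",
    "fiu-vro": "vro",
}


def normalize_code(code: str) -> str:
    """Map a legacy code to its replacement; pass other codes through."""
    return LEGACY_CODE_MAP.get(code.lower(), code)
```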


Note that there are manual language links on [[en:Main_Page]] at the bottom (not in the sidebar) which have 'lang' attributes on spans surrounding the links. The one for 'Simple English' does use 'simple' as the value here, but this can be changed by editing the page or template.

Comment 6 Niklas Laxström 2011-12-02 10:03:36 UTC

'normalize' would be ambiguous as a name for this function. It should be something that refers to getting a standards-compatible language code.

Comment 7 Robin Pepermans (SPQRobin) 2011-12-02 12:46:18 UTC

I was thinking about a Language function as well. Maybe getCorrectCode() or getActualCode()?

We might also use it for other lang="" attributes, like on the html tag.
I see that wgLanguageCode has been changed for several wikis (like 'alswiki' => 'gsw') but not all of them (e.g. not for fiu-vro).

Comment 8 Daniel Friesen 2011-12-02 14:57:27 UTC

getBcp47Code()?

Comment 9 Derk-Jan Hartman 2011-12-11 14:14:11 UTC

Created attachment 9656 [details]
Language.php patch, including first go at a mapping table...

Comment 10 Derk-Jan Hartman 2011-12-12 08:44:29 UTC

some comments:

1: We should probably have  getBCP47LanguageTag( $code, [$variant] )
2: My patch maps getCode() to use getBCP47LanguageTag(), but that was just to get some quick testing done of course.
3: The table... I'm not entirely sure we want to use wgDummyLanguageCodes. Or alternatively, whether that table should contain qqq qqz in the way that it does now. Perhaps adapt wgDummyLanguageCodes into wgLanguageTagConversion()=wgDummyLanguageCodes ++ qqq+ qqz; or something similar

Comment 11 Derk-Jan Hartman 2011-12-12 08:51:00 UTC

other way around of course. wgDummyLanguageCodes=wgLanguageTagConversionTable ++ qqq+ qqz;
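The corrected relationship could look roughly like this in Python (a sketch, not MediaWiki configuration). The variable names mirror the ones above but are hypothetical here, and mapping qqq and qqz to themselves is an assumption, since the thread does not say how they are stored.

```python
# Real tag conversions only (entries taken from comment 5).
LANGUAGE_TAG_CONVERSION_TABLE = {
    "bat-smg": "sgs",
    "roa-rup": "rup",
    "fiu-vro": "vro",
}

# Pseudo codes that are not real languages; self-mapping is assumed.
PSEUDO_CODES = {"qqq": "qqq", "qqz": "qqz"}

# Dummy codes = conversion table plus the pseudo codes.
DUMMY_LANGUAGE_CODES = {**LANGUAGE_TAG_CONVERSION_TABLE, **PSEUDO_CODES}
```

Keeping the pseudo codes out of the conversion table means code that generates lang attributes never has to special-case them.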

Comment 12 Derk-Jan Hartman 2011-12-13 22:21:03 UTC

See also r105812 and friends.

Comment 13 Derk-Jan Hartman 2012-09-05 15:27:22 UTC

https://gerrit.wikimedia.org/r/22727

Comment 14 Derk-Jan Hartman 2013-03-20 18:37:22 UTC

Changeset dropped.
