Last modified: 2012-09-17 01:27:25 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T42251, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 40251 - PLURAL broken: always returns singular in some languages


Summary:	PLURAL broken: always returns singular in some languages

Status:	RESOLVED FIXED

Product:	MediaWiki
Classification:	Unclassified
Component:	Internationalization (Other open bugs)
Version:	1.20.x
Hardware:	All All

Importance:	High major with 1 vote (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	i18n

Duplicates:	40250 40252 (view as bug list)
Depends on:
Blocks:	plural

Reported:	2012-09-14 12:50 UTC by JulesWinnfield-hu
Modified:	2012-09-17 01:27 UTC (History)
CC List:	16 users (show)

See Also:	40250
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description JulesWinnfield-hu 2012-09-14 12:50:23 UTC

The PLURAL magic word has not been working on huwiki or in Hungarian messages on translatewiki since 12 September 2012. You can see this in the page-size figures in the history of a [[:hu:Special:Random random page]].

Comment 1 JulesWinnfield-hu 2012-09-14 12:52:13 UTC

[[:hu:Special:Random|random page]]

Comment 2 JulesWinnfield-hu 2012-09-14 12:52:42 UTC

[[:hu:Special:Random]]

Comment 3 Dereckson 2012-09-14 13:38:33 UTC

Adding bug 38781 as tracking bug.

Comment 4 Nemo 2012-09-14 13:51:13 UTC

So, this is about the history always showing "egy bájt" (a byte) for [[MediaWiki:Rc-change-size-new]]. We had similar reports about categories' summaries (number of pages) in ja and ko, so I'm changing the summary to see if it's actually related.

Comment 5 Minh Nguyễn 2012-09-14 19:12:25 UTC

Bug 40250 addressed this issue for Vietnamese, but Niklas thinks CLDR should be fixed. I disagree: MediaWiki’s use of the plural: magic word is incompatible with CLDR. According to the CLDR spec [1]:

“Note that these categories may be different from the forms used for pronouns or other parts of speech. In particular, they are solely concerned with changes that would need to be made if different numbers, expressed with decimal digits, are used with a sentence. If there is a dual form in the language, but it isn't used with decimal numbers, it should not be reflected in the categories.”

In the case of Vietnamese (and most likely other languages), there are plural forms, but not in the example they give (“Duration: 1 hour” → “Duration: 3.2 hours”). Instead, these plural forms occur where there is no decimal number (“the following user” → “the following 5 users”; “this page” → “these pages”). MediaWiki’s English localization has long used the plural: magic word for avoiding legalese like “page(s)”, and plenty of localizations at TranslateWiki have followed suit. In cases where a decimal number is displayed unconditionally, the Vietnamese localization simply omits the plural: magic word.

In short, I think overriding the plural rules is the right approach, and we should do the same for the other languages in the same boat. Just take a look at the translations of [[MediaWiki:Category-subcat-count]] for supposedly plural-less languages like Indonesian or Chinese.

Currently all of these languages’ localizations are severely broken: numbers are not showing up in many of the places they should, such as category counts and list totals in special pages.

[1] http://cldr.unicode.org/index/cldr-spec/plural-rules
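
A minimal sketch of the failure described above, assuming a simplified model in which a language's plural rules are an ordered list of predicates and the last form is the implicit "other" category. The names `pick_form`, `ENGLISH_RULES` and `NO_PLURAL_RULES` are hypothetical and are not MediaWiki's actual code; the message texts are placeholders.

```python
# Hypothetical model of rule-based form selection; not MediaWiki's implementation.

def pick_form(n, forms, rules):
    """Return the form for n: the index of the first matching rule,
    or the implicit "other" slot (len(rules)) when no rule matches."""
    index = len(rules)
    for i, rule in enumerate(rules):
        if rule(n):
            index = i
            break
    # Clamp so that messages with fewer forms than categories still work.
    return forms[min(index, len(forms) - 1)]

ENGLISH_RULES = [lambda n: n == 1]  # "one", then the implicit "other"
NO_PLURAL_RULES = []                # a language with only "other"

forms = ["a byte", "$1 bytes"]      # placeholder message texts

print(pick_form(1, forms, ENGLISH_RULES))    # a byte
print(pick_form(5, forms, ENGLISH_RULES))    # $1 bytes
print(pick_form(5, forms, NO_PLURAL_RULES))  # a byte
```

With an empty rule list every number falls into slot 0, so a message written with a singular first form never shows its second form, whatever the count.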

Comment 6 Niklas Laxström 2012-09-15 02:03:05 UTC

I'm not sold on the idea that adding local plural rule overrides is the solution.

In a translatewiki.net thread I proposed an alternative solution that allows an inline override in affected messages.

http://translatewiki.net/wiki/Thread:Support/PLURAL_keyword_for_languages_without_grammatical_plural_forms

Comment 7 Niklas Laxström 2012-09-15 04:49:18 UTC

*** Bug 40252 has been marked as a duplicate of this bug. ***

Comment 8 Niklas Laxström 2012-09-15 04:49:49 UTC

*** Bug 40250 has been marked as a duplicate of this bug. ***

Comment 9 Niklas Laxström 2012-09-15 05:00:34 UTC

https://gerrit.wikimedia.org/r/23852

Comment 10 Minh Nguyễn 2012-09-15 06:01:11 UTC

We might as well expand the override to include all of CLDR’s plural-less languages. [[id:Kategori:Wikipedia]] will still lack subcategory and page counts, after all.

Supporting explicit numeric arguments to plural: sounds like a good idea. That would add enough flexibility so that, for instance, the English localization could have a message with “{{PLURAL:$1|1=the user|2=both users|all $1 users}}”. A bot at Translatewiki could automatically add 1= to any invocations of plural: with more than one argument and factor out plural: where all the arguments are identical (due to inexperienced translators or translation memory).
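
A minimal sketch of how explicit numeric arguments could be resolved, assuming that forms of the shape `N=text` are checked first and that the remaining forms are chosen by the language's usual rule. The function `convert_plural` and the `english_index` helper are hypothetical; this is not the syntax or code that was eventually merged.

```python
# Hypothetical resolver for "{{PLURAL:$1|1=the user|2=both users|all $1 users}}".

def convert_plural(n, forms, rule_index):
    """forms: the arguments after the count, already split on "|".
    rule_index: callable mapping n to the index of the regular form."""
    regular = []
    for form in forms:
        key, sep, text = form.partition("=")
        if sep and key.isdigit():
            # Explicit form: wins only when the number matches exactly.
            if int(key) == n:
                return text
        else:
            regular.append(form)
    if not regular:
        return ""
    # Clamp in case fewer regular forms were given than the rule expects.
    return regular[min(rule_index(n), len(regular) - 1)]

def english_index(n):
    return 0 if n == 1 else 1

forms = "1=the user|2=both users|all $1 users".split("|")
for n in (1, 2, 5):
    print(n, convert_plural(n, forms, english_index))
```

In this sketch a form whose text before the first `=` is not a plain number is treated as a regular form, which is one possible way to limit the impact on messages containing a literal `=`.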

Comment 11 Purodha Blissenbach 2012-09-15 10:18:17 UTC

There are caveats:

Adding a |n=... syntax will likely break all existing messages that have literal = signs inside {{PLURAL: ...}}. Such cases are likely rare, though. Using <nowiki>=</nowiki> should solve the issue, but it costs performance.

PLURAL rules generally bind not to single numbers but to expressions describing sets of numbers, such as (n mod 10 == 1) and the like.
Unless we can be sure that we will never need them, we should provide a way to use expressions as well. While this is not hard programmatically, the need to have "="s inside those expressions increases the overall complexity of the PLURAL syntax. Having to surround expressions with brackets to make the distinction seems fair.
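
To illustrate why a rule of this kind cannot be written as a list of single numbers, here is a small sketch. It assumes a rule of the form "n mod 10 is 1 and n mod 100 is not 11", a common shape for the singular-like category in several Slavic languages; the function name `matches_mod_rule` is hypothetical.

```python
# Illustrative modulo-based plural rule; the exact rule is an assumption.

def matches_mod_rule(n: int) -> bool:
    """True when n ends in 1 but does not end in 11."""
    return n % 10 == 1 and n % 100 != 11

matching = [n for n in range(1, 35) if matches_mod_rule(n)]
print(matching)  # [1, 21, 31]
```

The matching set is unbounded (1, 21, 31, ..., 101, 121, ...), so a syntax limited to `1=...`-style entries could not express it.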

Comment 12 Huji 2012-09-15 14:02:51 UTC

I was redirected here from bug 40252 and after reading the above comments quickly, my understanding is that this bug arises only for languages which are "plural-less".

I don't completely agree with this concept. As an example, Persian is plural-less in the sense that nouns are not pluralized if preceded by numbers (e.g. "1 book", "2 book"), but nouns are pluralized if not preceded by numbers (e.g. "the book is there", "the bookS ARE there"), and the verb is always pluralized (last example).

Up until now, we have been able to use the PLURAL magic word to take care of the pluralization of verbs, etc. Now, this functionality is completely gone.

Respectfully, I suggest that the change to the functionality of the PLURAL magic word be reverted IMMEDIATELY (as it has affected many projects). Only THEN can we discuss the correct way to change the code again, and make sense of it.

Comment 13 Niklas Laxström 2012-09-15 17:26:46 UTC

(In reply to comment #11)
> Adding a |n=... syntax will likely break all existing messages having literal
> ='s inside {{PLURAL: ...}}. Such cases are likely rare, though. Using
> <nowiki>=</nowiki> should solve the issue but is a performance eater.

This is a drawback, but it is unavoidable. The CLDR expression syntax doesn't use any = signs, so we don't have problems with ambiguity.

(In reply to comment #12)
> I don't completely agree with this concept. As an example, Persian is
> plural-less in the sense that nouns are not pluralized if preceded by numbers
> (e.g. "1 book", "2 book") but nouns are pluralized if not preceded by numbers
> (e.g. "the book is there", "the bookS ARE there"), and also the verb is
> pluralized all the time (last example).

It's arguable whether these languages should by default have two plural forms or not.

> Respectfully, I suggest the change to the functionality of PLURAL magic word to
> be reverted IMMEDIATELY (as it has affected many projects). Only THEN, we can
> discuss what is the correct way to change the code again, and make sense of it.

Let's not throw the baby out with the bathwater. We can (and already did for some languages) apply an effective workaround while we sort out this problem.

Comment 14 Huji 2012-09-15 18:23:24 UTC

(In reply to comment #13)
> It's arguable whether these languages should by default have two plural forms
> or not.

It is arguable whether the CLDR representation of the various modes of handling plurals is fair and comprehensive or not. Based on [1], CLDR assumes there are only these modes: (a) to have two forms, like English; (b) to have one form only; (c) to have more than two forms.

The problem is that a language with two forms does not have to behave exactly like English (where determiners like "the" and counters like "two" both cause the subsequent noun to be pluralized). In other words, category (a) above is not comprehensive enough to support languages like Persian or Mazani (while it supports English, Spanish or Turkish).

You might argue this is a limitation of CLDR and should be reported there, not in this bug. I will counter-argue that the implication for MediaWiki is that we can't adopt a standard which is not comprehensive, hence the point about reverting the change.

[1] http://cldr.unicode.org/index/cldr-spec/plural-rules

Comment 15 Huji 2012-09-15 18:25:31 UTC

I am adding Roozbeh Pournader to this discussion; he is a native Persian speaker, and works for Unicode.org and may be able to shed some light here.

Comment 16 Santhosh Thottingal 2012-09-15 19:55:45 UTC

Alternate fix that tries to restore old MW behavior for languages without
defined plural rules - gerrit I345c3051
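
A rough sketch of what such a fallback might look like, assuming the "old behavior" means: first form when the count is exactly 1, second form otherwise. That reading, and the name `pick_form_with_fallback`, are assumptions made for illustration; the actual Gerrit change may differ.

```python
# Hypothetical fallback for languages without defined plural rules.

def pick_form_with_fallback(n, forms, rules):
    """rules: ordered list of predicates; an empty list means the
    language defines no plural rules of its own."""
    if not forms:
        return ""
    if rules:
        # First matching rule wins; no match means the "other" slot.
        index = next((i for i, rule in enumerate(rules) if rule(n)), len(rules))
    else:
        # Assumed legacy default: singular for exactly 1, plural otherwise.
        index = 0 if n == 1 else 1
    return forms[min(index, len(forms) - 1)]

print(pick_form_with_fallback(1, ["a message", "$1 messages"], []))
print(pick_form_with_fallback(3, ["a message", "$1 messages"], []))
print(pick_form_with_fallback(3, ["$1 messages"], []))
```

Messages that supply a single form are unaffected by the fallback, because the index is clamped to the last available form.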

Comment 17 Tisza Gergő 2012-09-16 23:00:22 UTC

It boggles the mind that someone just made a breaking change in one of the most-used magic functions without bothering to ask for the opinion of translators or even notifying them afterwards. Are there no communication protocols in place at WMF at all? This is not the first time that users have had to find out from obscure bug reports or commit summaries that they are supposed to use something differently.

Even languages which do not use plural with numbers might want to use PLURAL for aesthetic reasons (such as writing "one" or "a" instead of 1), or might want to phrase the message differently when it is about multiple things. (For example, "you have [a message/X messages]; you can read **[it/them]** by..." - the starred part might be different even in languages which do not use plural with numbers; such is the case with Hungarian.)

Comment 18 Siebrand Mazeland 2012-09-17 01:27:25 UTC

(In reply to comment #17)
> It boggles the mind that someone just made a breaking change in one of the
> most-used magic functions without bothering to ask for the opinion of
> translators or even notifying them afterwards. Are there no communication
> protocols in place at WMF at all? This is not the first time that users have to
> find out from obscure bug reports or commit summaries that they are supposed to
> use something differently.

Hold your horses[1], Tisza. We are in the business of improving software, and in that process an intended improvement (in this case using upstream standardised plural definitions instead of manually maintained ones for both PHP and JavaScript in MediaWiki only) had an unintended side effect. Tests were added[2] with the initial change, but as no test was present for this particular case, this particular breakage went unnoticed.

Luckily we have very well educated users who will report an issue -- this bug report is proof of that -- and a very responsive internationalisation development team that created a fix within 20 hours of the report being made. Gerrit change #23900 has now been merged, including a test, so that future breakage will be prevented.

A final note: Our software will keep breaking as we continue to improve it. It's not done on purpose, but it's part of the process.
