Last modified: 2013-04-22 16:14:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T44607, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 42607 - Tablesorter sorts all numbers as dates in Czech
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: JavaScript
Version: 1.20.x
Hardware: All All
Importance: High normal with 7 votes
Target Milestone: 1.22.0 release
Assigned To: Bartosz Dziewoński
URL: http://cs.wikipedia.org/wiki/?oldid=9...
Keywords: code-update-regression, i18n
Depends on:
Blocks: 31601
Reported: 2012-12-01 17:40 UTC by Mormegil
Modified: 2013-04-22 16:14 UTC (History)
federicoleva: Backport_to_Stable-



Description Mormegil 2012-12-01 17:40:12 UTC
The table sorting feature is currently quite broken in Czech, apparently because the tablesorter considers (almost) any number to be a calendar date. (See the linked URL and click e.g. the “Sčítání 2001” column.)

The sorter constructs the date-matching regex as

^\\s*\\d{1,2}[\\,\\.\\-\\/\'\\s]*(' + regex_for_months + ')' + '[\\,\\.\\-\\/\'\\s]*\\d{2,4}\\s*$

The problem in Czech is triggered by the fact that we do not use alphabetic abbreviations for short month names, but just month numbers, i.e. our final regex looks like

^\s*(\d{1,2})[\,\.\-\/'\s]*(leden|1|únor|2|březen|3|duben|4|květen|5|červen|6|červenec|7|srpen|8|září|9|říjen|10|listopad|11|prosinec|12)[\,\.\-\/'\s]*(\d{2,4})\s*$

And because the separators are qualified with *, i.e. optional, the regex considers e.g. “123456” to be, in fact, something like “12/3/456”, i.e. “March 12, 456”. Which is obviously silly.
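The false match can be reproduced outside MediaWiki. The following is a minimal sketch that rebuilds the regex quoted above from its parts; the variable names (`months`, `sep`, `looseDate`) are illustrative and are not taken from the tablesorter source.

```javascript
// Month alternation as quoted in the report: Czech full month names
// interleaved with the numeric "short names".
const months = 'leden|1|únor|2|březen|3|duben|4|květen|5|červen|6|červenec|7|' +
    'srpen|8|září|9|říjen|10|listopad|11|prosinec|12';

// Separator class from the report; it is quantified with * below, so it is optional.
const sep = "[\\,\\.\\-\\/'\\s]";

const looseDate = new RegExp(
    '^\\s*(\\d{1,2})' + sep + '*(' + months + ')' + sep + '*(\\d{2,4})\\s*$'
);

// A plain integer is split into day, month and year.
const m = looseDate.exec('123456');
console.log(m.slice(1).join(' / ')); // 12 / 3 / 456
```

The greedy `\d{1,2}` takes "12", the alternation accepts "3" as a month, and `\d{2,4}` consumes the remaining "456".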

Do we really want to write dates as e.g. “1Dec2012”, so that we would need the * there? Wouldn’t + be better? (Or, at the very least, check whether wgMonthNamesShort is not numeric, and do not add it into the regex if it is.)
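Both suggestions can be sketched as follows. This is an illustration under assumptions, not the actual patch: `buildDateRegex`, `fullNames` and `shortNames` are hypothetical names, and the two arrays stand in for the values MediaWiki exposes for Czech.

```javascript
const sep = "[\\,\\.\\-\\/'\\s]";

// Hypothetical helper: builds the date regex with a chosen separator quantifier.
function buildDateRegex(monthNames, quantifier) {
    return new RegExp(
        '^\\s*(\\d{1,2})' + sep + quantifier +
        '(' + monthNames.join('|') + ')' +
        sep + quantifier + '(\\d{2,4})\\s*$'
    );
}

const fullNames = ['leden', 'únor', 'březen', 'duben', 'květen', 'červen',
    'červenec', 'srpen', 'září', 'říjen', 'listopad', 'prosinec'];
const shortNames = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12'];
const allNames = fullNames.concat(shortNames);

// Option 1: require at least one separator on each side of the month.
const strict = buildDateRegex(allNames, '+');
console.log(strict.test('123456'));      // false
console.log(strict.test('1. 12. 2012')); // true

// Option 2: keep the optional separators, but drop purely numeric month names.
const alphaOnly = allNames.filter(function (name) {
    return !/^\d+$/.test(name);
});
const looseAlpha = buildDateRegex(alphaOnly, '*');
console.log(looseAlpha.test('123456'));        // false
console.log(looseAlpha.test('1prosinec2012')); // true
```

Option 1 keeps numeric dates such as "1. 12. 2012" recognizable, while option 2 gives them up entirely in exchange for still accepting the "1Dec2012" style.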

On the other hand, the date recognition does not really work for Czech, anyway. In dates, we use the genitive month names (e.g. “1. prosince 2012”, not “1. prosinec 2012”), see Language::mMonthGenMsgs.
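To illustrate the genitive problem, here is a minimal sketch that assumes the genitive forms are available to the client as a second array. The names `nominative`, `genitive`, `buildDateRegex` and `parseCzechDate` are hypothetical, and requiring separators is a choice made for this example only.

```javascript
const nominative = ['leden', 'únor', 'březen', 'duben', 'květen', 'červen',
    'červenec', 'srpen', 'září', 'říjen', 'listopad', 'prosinec'];
const genitive = ['ledna', 'února', 'března', 'dubna', 'května', 'června',
    'července', 'srpna', 'září', 'října', 'listopadu', 'prosince'];
const sep = "[\\,\\.\\-\\/'\\s]";

// Separators are required here (assumption made for this sketch).
function buildDateRegex(names) {
    return new RegExp(
        '^\\s*(\\d{1,2})' + sep + '+(' + names.join('|') + ')' + sep + '+(\\d{2,4})\\s*$'
    );
}

// Returns { day, month, year } or null when the text is not recognized as a date.
function parseCzechDate(text, regex) {
    const m = regex.exec(text);
    if (!m) {
        return null;
    }
    let month = nominative.indexOf(m[2]);
    if (month === -1) {
        month = genitive.indexOf(m[2]);
    }
    return { day: Number(m[1]), month: month + 1, year: Number(m[3]) };
}

const nominativeOnly = buildDateRegex(nominative);
const withGenitive = buildDateRegex(nominative.concat(genitive));

console.log(parseCzechDate('1. prosince 2012', nominativeOnly)); // null
console.log(parseCzechDate('1. prosince 2012', withGenitive).month); // 12
```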
Comment 1 Miraceti 2012-12-02 16:52:35 UTC
Table sorting is seriously broken. Dates are not sorted in Czech at all.
The following code should produce a sortable table:

{| class="wikitable sortable"
! "1. 1. 2000" !! "01.01.2000" !! "1.1.2000" !! "1. ledna 2000" !! "1. leden 2000"
|-
| 1. 1. 2000 || 01.01.2000 || 1.1.2000 || 1. ledna 2000 || 1. leden 2000
|-
| 1. 10. 1999 || 01.10.1999 || 1.10.1999 || 1. října 1999 || 1. říjen 2000
|-
| 1. 10. 2010 || 01.10.2010 || 1.10.2010 || 1. října 2010 || 1. říjen 2010
|}

The first two columns are short formats that follow the standard, the third one is a common mistake, the fourth one is the correct long format, and the last one uses a nominative month name.

Sorting does not work at all for the short formats. The only working column in my example is the last one, but as Mormegil has explained, this is not how dates are written. The long format in the fourth column is not sorted correctly (for the reason Mormegil also explained).

I propose defining the required regexp in the MediaWiki configuration rather than assembling it on the client side; that would also be faster.
Comment 2 Danny B. 2012-12-04 23:12:15 UTC
Just create your custom sorting parser in Common.js.
Comment 3 Andre Klapper 2012-12-04 23:19:11 UTC
Danny B: How is that a solution if the default does not work?
Comment 4 Bartosz Dziewoński 2012-12-04 23:37:09 UTC
Reopening; the code is clearly broken, not handling languages other than English correctly; worse, not just not detecting the dates, but detecting them wrong. There were similar (although even worse) issues with date handling in tablesorter before (see bug 42097).

If you ask me, everything in that code that attempts to detect and sort dates in formats other than YYYY-MM-DD (which should sort just fine anyway) should be taken behind the barn and shot.
Comment 5 Danny B. 2012-12-05 00:04:45 UTC
No, the code is not broken. It sorts properly by default. Any site that wants to sort differently from the predefined sorts for the predefined types and forms of data is supposed to provide its own custom sorting parser, and to mark the column headers so that they are sorted by that parser. That is the correct way to handle it.
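For reference, a custom parser of the kind described here might look roughly like the sketch below. It assumes that the tablesorter plugin exposes `addParser` on `jQuery.tablesorter` and accepts an object with `id`, `is`, `format` and `type`; the parser name `czechDate` and the supported date format are made up for this example. On a real wiki the module would also have to be loaded before the parser is registered.

```javascript
// Hypothetical parser for Czech numeric dates such as "1. 10. 1999" or "01.10.1999".
const czechDatePattern = /^\s*(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s*$/;

const czechDateParser = {
    id: 'czechDate',
    // Detection: does the cell text look like day. month. year?
    is: function (s) {
        return czechDatePattern.test(s);
    },
    // Sort key: year, month and day packed into one number (YYYYMMDD).
    format: function (s) {
        const m = czechDatePattern.exec(s);
        if (!m) {
            return 0;
        }
        return Number(m[3]) * 10000 + Number(m[2]) * 100 + Number(m[1]);
    },
    type: 'numeric'
};

// Register only when the plugin is present (assumed API, see the note above).
if (typeof jQuery !== 'undefined' && jQuery.tablesorter) {
    jQuery.tablesorter.addParser(czechDateParser);
}

console.log(czechDateParser.format('1. 10. 1999')); // 19991001
```

A parser like this handles only the site's own date format; it does nothing about plain integers being misdetected as dates by the default parsers.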
Comment 6 Mormegil 2012-12-05 09:21:02 UTC
Eh? Don’t be silly. Create a new vanilla MediaWiki installation with $wgLanguageCode = "en". Create a new page

{| class="wikitable sortable"
! Data
|-
| 123456
|-
| 98765
|-
| 333555
|-
| 2468
|}

Click on the header, it sorts correctly. Change $wgLanguageCode to "cs", try the same, it sorts incorrectly. MediaWiki does not “sort properly in default”. There are no custom data formats, no custom sorting, nothing nonstandard, just plain old integers here. The tablesorter is just plain broken.
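The effect on this table can be modelled in a few lines. This is a simplified sketch, not the tablesorter's actual code: the date regex is the one quoted in comment 0, while `sortKey` and its packing of year, month and day into one number are assumptions made only to show how the order changes.

```javascript
const months = 'leden|1|únor|2|březen|3|duben|4|květen|5|červen|6|červenec|7|' +
    'srpen|8|září|9|říjen|10|listopad|11|prosinec|12';
const sep = "[\\,\\.\\-\\/'\\s]";
const looseDate = new RegExp(
    '^\\s*(\\d{1,2})' + sep + '*(' + months + ')' + sep + '*(\\d{2,4})\\s*$'
);

// Simplified stand-in for a date sort key: year, month, day packed into one number.
function sortKey(cell) {
    const m = looseDate.exec(cell);
    if (m) {
        return Number(m[3]) * 10000 + Number(m[2]) * 100 + Number(m[1]);
    }
    return Number(cell);
}

const data = ['123456', '98765', '333555', '2468'];

// Order when every cell is taken for a date, as with the Czech month names.
const asDates = data.slice().sort(function (a, b) {
    return sortKey(a) - sortKey(b);
});

// Order when the cells are compared as plain numbers.
const asNumbers = data.slice().sort(function (a, b) {
    return Number(a) - Number(b);
});

console.log(asDates.join(', '));   // 98765, 2468, 123456, 333555
console.log(asNumbers.join(', ')); // 2468, 98765, 123456, 333555
```

All four cells match the date regex, so in this model the column is ordered by the "year" taken from the trailing digits rather than by numeric value.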
Comment 7 Matthias 2013-02-09 16:56:56 UTC
I'm sure that the problem is that in Czech Wikipedia the "mw.config" variable 'wgMonthNamesShort' contains digits. This variable should contain only names!
Comment 8 Mormegil 2013-02-11 10:33:48 UTC
(In reply to comment #7)
> I'm sure that the problem is that in Czech Wikipedia the "mw.config" variable
> 'wgMonthNamesShort' contains digits. This variable should contain only names!

Says who and why? As I said in the original report, “we do not use alphabetic abbreviations for short month names, but just month numbers”. As we say in Czech, “se stim smiř” [learn to live with it]. (You might also want to check Korean (ko) and also possibly bxr, Chinese, Japanese, and other languages.)

You cannot change the world to fit a broken regex. The variable never had any such documented restriction (and such a restriction would be silly, anyway), and the numeric values have been there since r4534 (!).
Comment 9 Bartosz Dziewoński 2013-02-19 19:28:54 UTC
This will fix itself if bug 45161 is solved.
Comment 10 Bartosz Dziewoński 2013-03-23 21:30:13 UTC
Another example: https://cs.wikipedia.org/wiki/Wikipedista:Gampe/Pískoviště
Comment 11 Bartosz Dziewoński 2013-03-23 21:33:03 UTC
I submitted a patch to hopefully fix the issue as I3a37acf1. As suggested in comment 0, it makes the separators required.

This won't fix the date parsing properly for Czech (no genitive month names support), but it should unbreak number parsing at least.
Comment 12 Bartosz Dziewoński 2013-03-23 22:17:20 UTC
(In reply to comment #11)
> This won't fix the date parsing properly for Czech (no genitive month names
> support)

I opened bug 46496 for that.
Comment 13 Ori Livneh 2013-04-04 22:01:47 UTC
(Disclaimer: I am a typical American chauvinist and don't know nearly as much as I should about date conventions in other languages. I'm sorry if I'm totally wrong.)

It looks to me like the Czech messages may be incorrect. The message descriptions specify that the short month name messages are abbreviations of the full month names. So why are the Czech messages numeric? Shouldn't they be led., ún., břez., dub., and so on?
Comment 14 Danny B. 2013-04-04 23:34:31 UTC
(In reply to comment #13)
> It looks to me like the Czech messages may be incorrect. The message
> descriptions specify that the short month name messages are abbreviations of
> the full month names. So why are the Czech messages numeric? Shouldn't they
> be led., ún., břez., dub., and so on?

See comment #0 and comment #8
Comment 15 Ori Livneh 2013-04-15 17:09:42 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > It looks to me like the Czech messages may be incorrect. The message
> > descriptions specify that the short month name messages are abbreviations of
> > the full month names. So why are the Czech messages numeric? Shouldn't they
> > be led., ún., břez., dub., and so on?
> 
> See comment #0 and comment #8

Right. This seems like a delicate mess. Bartosz's change I3a37acf1 seems like a simple and straightforward improvement over the status quo, though, so I merged it.
Comment 16 Gerrit Notification Bot 2013-04-15 17:09:45 UTC
https://gerrit.wikimedia.org/r/55494 (Gerrit Change I3a37acf1985eddf922e69e2c2a1cf541fc00e97e) | change APPROVED and MERGED [by jenkins-bot]
Comment 17 Bartosz Dziewoński 2013-04-15 17:16:15 UTC
Marking as RESOLVED, then. Needs backporting, I guess?
Comment 18 Nemo 2013-04-15 17:20:40 UTC
I've read all the comments here and this doesn't seem a regression, nor a catastrophic bug, so no – I wouldn't think it needs to be backported.
Comment 19 Bartosz Dziewoński 2013-04-15 17:25:00 UTC
[Adjusting milestone.]
Comment 20 Mormegil 2013-04-15 21:28:52 UTC
(In reply to comment #18)
> I've read all the comments here and this doesn't seem a regression, nor a
> catastrophic bug, so no – I wouldn't think it needs to be backported.

Well, this is definitely a regression, even though not in 1.21: I just tested the example from comment #6 above in MediaWiki 1.19.5 (previous/LTS), and it works fine, while in the current MediaWiki 1.20.4, it is broken. Even though this is probably not a “catastrophic bug”, it means “sortable” tables just do not sort numbers in Czech at all, which they used to do correctly in MW 1.19. Just saying.
