Last modified: 2014-11-17 10:36:02 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T8948, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 6948 - Natural number sorting in category listings
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: unspecified
Hardware: All All
Importance: Low enhancement with 5 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 164
Blocks:
Reported: 2006-08-08 15:38 UTC by Michael Zajac
Modified: 2014-11-17 10:36 UTC (History)

Description Michael Zajac 2006-08-08 15:38:51 UTC
Just like in a book index, category listings should sort numbers by their value, not just as dumb strings of characters.  

Example: [[http://en.wikipedia.org/wiki/Category:Antonov]], partial listing: 

...
Antonov An-2
Antonov An-218
Antonov An-22
Antonov An-225
Antonov An-24
Antonov An-26
Antonov An-28
Antonov An-3
...

Of course, it should be 

...
Antonov An-2
Antonov An-3
Antonov An-22
Antonov An-24
Antonov An-26
Antonov An-28
Antonov An-218
Antonov An-225
...
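
As a rough illustration of the requested ordering (a sketch in Python, not MediaWiki code), a natural sort can split each title into text and digit runs and compare the digit runs by value. The helper name `natural_key` is made up for this example.

```python
import re

def natural_key(title):
    """Split a title into text and integer chunks so digit runs compare by value."""
    parts = re.split(r"(\d+)", title)
    # Odd-indexed parts are the digit runs captured by the regex group.
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

titles = [
    "Antonov An-2", "Antonov An-218", "Antonov An-22", "Antonov An-225",
    "Antonov An-24", "Antonov An-26", "Antonov An-28", "Antonov An-3",
]

plain = sorted(titles)                     # character-by-character order
natural = sorted(titles, key=natural_key)  # digit runs compared as numbers

for title in natural:
    print(title)
```

Here `plain` reproduces the current listing and `natural` reproduces the desired one.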
Comment 1 Michael Zajac 2006-08-09 15:27:52 UTC
Does this really depend on bug 164?  That one concerns character-by-character sorting, or an implementation of the Unicode collation algorithm.

This one requires some more smarts to reduce a long string of characters, possibly including figures, decimal points, or commas, to a single entity for sorting purposes.  I could be wrong, but this sounds like it would be a parallel programming effort.
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-08-09 22:15:17 UTC
I'd say it probably does, because bug 164 is basically "add a sort key to the
database schema instead of using binary sort".  But then again, from that
perspective they could be viewed as mutually dependent, or independent. 
Certainly a) they're closely related, b) bug 164 will get fixed before this
does, and c) it would be a good idea to get whoever fixes that to notice this as
well, even if strictly speaking this could be resolved without resolving that.

I assume (not being especially familiar with sorting algorithms) that numerical
sorting could be worked into sort keys somehow.  If it can, then it's closely
related to 164 and would probably be solved together; if it can't, then that
would imply that to implement it you have to re-sort the entire table every time
an entry is added or removed, which is unacceptable and will never be implemented.

But hey, remove it if you don't think it fits.  I don't mind.
Comment 3 Michael Zajac 2006-08-10 05:42:59 UTC
They're definitely related, but I'm not sure which bug is dependent on the other.  Whoever implements 
either one should keep the other in mind.

I assumed that there would have to be a hook that sorts the results of a query.  Unicode sorting and 
numerical sorting would plug into the hook as two separate procedures, maybe, and there may be an 
advantage to one or the other going first.

But then, I have no clue of how the back end works.
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-08-10 05:56:38 UTC
(In reply to comment #3)
> I assumed that there would have to be a hook that sorts the results of a
> query.  Unicode sorting and numerical sorting would plug into the hook as
> two separate procedures, maybe, and there may be an advantage to one or the
> other going first.

No, the results are currently sorted via SQL "ORDER BY", not PHP.  Otherwise
you'd have to pull up the entire table of names, which is ridiculously wasteful
for larger categories/pages (e.g., 1,250,000+ article names to display 100). 
And sorting directly according to some complicated algorithm as you query the
rows is similarly infeasible, because the (comparatively expensive) collation
function would have to be executed on every possible pairing.

What this (and bug 164) would require is for an extra column to be added to
various tables, a sort key.  PHP would calculate the sort key only when a title
is created, would tell SQL to stick it in the column, and then the query would
just have "ORDER BY sortkey" or what have you, which would be a binary sort and
therefore very fast as sorts go.  (There's also some discussion about using
native collation algorithms packaged with newer versions of MySQL, but they
appear to have some serious limitations.)

So the major change these require is changing the database schema and working
out sorting functions.  Once that's implemented, tweaking the sorting function
would just mean a minor change to the PHP (well, and recalculating the sort keys
for every page in the wiki).  What you *do* with the sort key is pretty much
icing, so these two bugs are pretty much the same.

Of course, all the above should be taken with a slight grain of salt, because I
haven't actually looked at the code and am not an expert in the matter.  But
this is my impression from various sources.
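
The sort-key approach described above could look roughly like the following sketch. It uses Python with an in-memory SQLite database purely for illustration; the table `page`, the column `sortkey`, the helper `make_sortkey`, and the padding width are all hypothetical and are not MediaWiki's actual schema or code.

```python
import re
import sqlite3

PAD = 10  # assumed upper bound on the number of digits in any number in a title

def make_sortkey(title):
    """Zero-pad every digit run so that byte-wise order matches numeric order."""
    return re.sub(r"\d+", lambda m: m.group().zfill(PAD), title.upper())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (title TEXT, sortkey TEXT)")
conn.execute("CREATE INDEX page_sortkey ON page (sortkey)")

titles = ["Antonov An-22", "Antonov An-3", "Antonov An-218", "Antonov An-2"]
# The key is computed once, when the row is written, not at query time.
conn.executemany(
    "INSERT INTO page (title, sortkey) VALUES (?, ?)",
    [(t, make_sortkey(t)) for t in titles],
)

def fetch_page(after_key="", limit=2):
    """Return one page of (title, sortkey) rows, continuing after after_key."""
    return conn.execute(
        "SELECT title, sortkey FROM page WHERE sortkey > ? "
        "ORDER BY sortkey LIMIT ?",
        (after_key, limit),
    ).fetchall()

first = fetch_page()
second = fetch_page(after_key=first[-1][1])
```

Because the ordering lives in an indexed column, each page is a cheap range query, and changing the sort rule means changing `make_sortkey` and recomputing the stored keys.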
Comment 5 gpvos 2007-01-30 20:52:20 UTC
"Invisible" sort keys have already been implemented for a long time, see http://en.wikipedia.org/wiki/Wikipedia:Categories#Category_sorting . For an example, see how I've fixed the Antonov category with these edits: http://en.wikipedia.org/w/index.php?title=Special:Contributions&go=prev&offset=20070130204422&limit=50&target=Gpvos .

It may still be nice to have a way to have MediaWiki do this automatically in the future, but I would 
consider it extremely low priority.
Comment 6 flaxter 2008-07-09 15:04:06 UTC
Can the "Invisible" sort key be used to solve the problem on this page: http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita ?

Currently if you sort by Rank you see 1, 10, 100, 101 which isn't correct. Otherwise, if this is an unrelated problem should I file a separate bug report?
Comment 7 jon513 2008-09-09 10:49:14 UTC
I have fixed this on a MediaWiki installation I run by inserting the line

usort($this->articles, 'strnatcasecmp');

as the very first line of finaliseCategoryState(), in includes/CategoryPage.php
Comment 8 Ben 2011-03-16 17:29:35 UTC
(In reply to comment #6)
> Can the "Invisible" sort key be used to solve the problem on this page:
> http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita ?
> 
> Currently if you sort by Rank you see 1, 10, 100, 101 which isn't correct.
> Otherwise, if this is an unrelated problem should I file a separate bug report?

just to add a +1 to flaxter@gmail.com's comment, I noticed the same bug here:
http://en.wikipedia.org/wiki/Energy_density#Energy_densities_ignoring_external_components

sorting by Energy density by volume (MJ/L) gets you a couple of interesting sequences, including
43.5, 5.6, 6.02, 72.4
75.1, 8.8, 83.8, 9
38.2, 4.633016x10^104, 40.8

I can see mishandling the exponential value, but am confused as to why it didn't end up between 4 and 5, instead of 38 and 40. The ordering doesn't even make sense. At first I thought it was just ignoring the decimal place, but that doesn't even work for any but the first string I copied.

I'm no coder, so sorry, I can offer no suggestions as to fixes, but it does appear to be a common thing with the sorting.
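
For what it's worth, all three sequences quoted above are consistent with the values being compared as plain strings, character by character ("." sorts before "0", so "4.6..." lands between "38.2" and "40.8"). The Python sketch below only demonstrates that consistency; it does not claim to reproduce the actual table-sorting script, and the `x10^` parsing in `to_number` is an assumption based on the value quoted in this comment.

```python
values = [
    "43.5", "5.6", "6.02", "72.4", "75.1", "8.8",
    "83.8", "9", "38.2", "4.633016x10^104", "40.8",
]

def to_number(text):
    """Parse a plain decimal or an 'a x10^b' value (format assumed from the comment)."""
    mantissa, _, exponent = text.partition("x10^")
    return float(mantissa) * 10 ** int(exponent or 0)

as_text = sorted(values)                   # character-by-character comparison
as_number = sorted(values, key=to_number)  # comparison by numeric value

print(as_text)
print(as_number)
```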
Comment 9 Krinkle 2011-03-16 17:44:22 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > Can the "Invisible" sort key be used to solve the problem on this page:
> > http://en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)_per_capita ?
> > 
> > Currently if you sort by Rank you see 1, 10, 100, 101 which isn't correct.
> > Otherwise, if this is an unrelated problem should I file a separate bug report?
> 
> just to add a +1 to flaxter@gmail.com's comment, I noticed the same bug here:
> http://en.wikipedia.org/wiki/Energy_density#Energy_densities_ignoring_external_components
> 
> sorting by Energy density by volume (MJ/L) gets you a couple of interesting
> sequences, including
> 43.5, 5.6, 6.02, 72.4
> 75.1, 8.8, 83.8, 9
> 38.2, 4.633016x10^104, 40.8
> 
> I can see mishandling the exponential value, but am confused as to why it
> didn't end up between 4 and 5, instead of 38 and 40. The ordering doesn't even
> make sense. At first I thought it was just ignoring the decimal place, but that
> doesn't even work for any but the first string I copied.
> 
> I'm no coder, so sorry, I can offer no suggestions as to fixes, but it does
> appear to be a common thing with the sorting.


This bug is about category sorting, not about the table "sortable" script.
Comment 10 jon513 2011-03-16 18:29:57 UTC
It seems like this has been fixed (http://en.wikipedia.org/wiki/Category:Antonov_aircraft sorts correctly).  I would find it hard to believe that it hasn't been - I posted a fix for this two and a half years ago!
Comment 11 Bawolff (Brian Wolff) 2011-03-16 19:50:25 UTC
(In reply to comment #10)
> It seems like this has been fixed
> (http://en.wikipedia.org/wiki/Category:Antonov_aircraft sorts correctly).

That category is using custom sortkeys with 3 digit 0-padding (see http://en.wikipedia.org/w/api.php?action=query&list=categorymembers&cmprop=sortkey|title&cmtitle=Category:Antonov_aircraft&cmlimit=max ). Thus the behaviour there does not indicate the bug is fixed.
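
A small Python sketch of why fixed-width zero-padding makes a plain string sort look correct, and where it stops working. The helper `padded_key` and the "Model" titles are hypothetical; the real sort keys on that category were entered by hand and are only assumed here to follow the 3-digit pattern.

```python
import re

def padded_key(title, width=3):
    """Zero-pad each digit run to a fixed width, e.g. 'An-2' -> 'An-002'."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), title)

titles = ["Antonov An-225", "Antonov An-3", "Antonov An-22", "Antonov An-2"]
ordered = sorted(titles, key=padded_key)

# Numbers with more digits than the padding width are not helped:
# '1000' still sorts before '999' because '1' < '9'.
too_wide = sorted(["Model 1000", "Model 999"], key=padded_key)
```

The padding fixes the order only for numbers that fit within the chosen width, which is one reason it is a per-category workaround rather than a general fix.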

The bug is still present. Re-opening

>  I
> would find it hard to believe that it hasn't been - I posted a fix for this two
> and a half years ago!

Where?
Comment 12 jon513 2011-03-16 20:39:03 UTC
> >  I
> > would find it hard to believe that it hasn't been - I posted a fix for this two
> > and a half years ago!
> 
> Where?

comment 7 above.
Comment 13 Bawolff (Brian Wolff) 2011-03-16 21:06:51 UTC
That doesn't work fully. It does fix it for a single view of a category page, but the way it's broken up between next/prev boundaries doesn't change.

As a result you can have situations where you could have the last entry in one page of a category not be followed by the first entry in the next page, which would just be weird. Thus I don't think we should do that.
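
The page-boundary problem can be reproduced with a toy example. This is a Python sketch with made-up titles and a page size of 3, not the actual CategoryPage code: the pages are cut in binary order, as the database would return them, and only afterwards is each page re-sorted naturally.

```python
import re

def natural_key(title):
    """Split a title into text and integer chunks so digit runs compare by value."""
    parts = re.split(r"(\d+)", title)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

titles = [f"An-{n}" for n in (2, 3, 22, 24, 218, 225)]
PAGE_SIZE = 3

# Step 1: the database cuts pages using its own (binary) order.
binary_order = sorted(titles)
pages = [binary_order[i:i + PAGE_SIZE]
         for i in range(0, len(binary_order), PAGE_SIZE)]

# Step 2: each page is re-sorted naturally just before display.
displayed = [sorted(page, key=natural_key) for page in pages]

for number, page in enumerate(displayed, start=1):
    print(number, page)
```

The first page ends with An-218 while the second page starts with An-3, so each page looks sorted on its own but the sequence across pages is not.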
Comment 14 Bawolff (Brian Wolff) 2013-02-06 00:41:07 UTC
Hmm, the ICU library seems to support natural number sorting. I have not tested it, though. It may be possible to implement this as a custom collation.
