Last modified: 2011-03-09 11:12:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6912, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 4912 - "Next 200" link in category page broken
"Next 200" link in category page broken
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Categories
Version: 1.6.x
Hardware: All All
Importance: Normal normal with 2 votes
Assigned To: Nobody - You can work on this!
Duplicates: 5241 16021 23803
Depends on:
Blocks:
Reported: 2006-02-08 00:51 UTC by David Benbennick
Modified: 2011-03-09 11:12 UTC (History)

Description David Benbennick 2006-02-08 00:51:01 UTC
At [[Category:Images with unknown copyright status]], the "next 200" link is

http://en.wikipedia.org/w/index.php?title=Category:Images_with_unknown_copyright_status&from=0212006

If you click the link, all the images that were visible before are still visible.
Furthermore, the "next 200" and "previous 200" links both use "from=0212006".
Comment 1 lɛʁi לערי ריינהארט 2006-02-11 21:42:06 UTC
Hallo!

from=0212006 is the "category key".
http://en.wikipedia.org/w/index.php?title=template:no%20license&action=edit
http://en.wikipedia.org/w/index.php?title=template:no%20license&action=history
shows how this happened.
http://en.wikipedia.org/w/index.php?title=Image%3ATheUsed.JPG&diff=37823581&oldid=26403756
updated the sort keys and the links.

In the category, the first and the last image have the same sort key. This is a
deadlock.
Bug 2177: Expanding sort order BEHIND the sort key
would offer a workaround for such situations.

Users should specify 'SORT KEY'. MediaWiki should use
{{SORT KEY}}{{NAMESPACE}}:{{PAGENAME}} internally.

best regards reinhardt [[user:gangleri]]
Comment 2 lɛʁi לערי ריינהארט 2006-02-11 21:43:17 UTC
Oops! MediaWiki should use
{{SORT KEY}}{{PAGENAME}}:{{NAMESPACE}} internally.
Comment 3 Brion Vibber 2006-03-13 22:02:51 UTC
*** Bug 5241 has been marked as a duplicate of this bug. ***
Comment 4 William Allen Simpson 2006-03-14 02:57:30 UTC
(copied from Bug 5241)
I've also reported at
http://en.wikipedia.org/w/index.php?title=Wikipedia%3AVillage_pump_%28technical%29#Category_Sort_Blank_bug

When a large number of items use a blank category sort key
(a category link with a pipe and a blank, "[[...| ]]"), the next 200 do not appear.

Likewise, after jumping to 0-9 and then trying "previous 200", the list does
not page back to the beginning. So the problem occurs in both directions.

In [[:Category:Redirects with possibilities]], you can see the
problem. It's existed this way for several months, so please
don't fix this until the developers can look at it!

Yes, I know the problem is a bad category in [[Template:R to decade]].
But again, it's been that way for months, so leave it alone for
testing purposes.

I've already fixed (during the past two weekends) the problem in 4
templates that made this same massive 6,000+ entry bug at
[[:Category:Unprintworthy redirects]]. That's why I was looking for
more examples, to determine whether it was a one time thing.

So, how long before this known bug will be fixed?
Comment 5 lɛʁi לערי ריינהארט 2006-04-28 01:25:47 UTC
(In reply to comment #2)
> Oops! MediaWiki should use
> {{SORT KEY}}{{PAGENAME}}:{{NAMESPACE}} internally.

Some more details on this:

{{SORT KEY}}{{special character}}{{PAGENAME}}{{special character}}{{GENERICNAMESPACEORDER}}

a) {{special character}} should be a character with a value less than a space;
tab ('\t') could be used.

b) {{GENERICNAMESPACE}}, or better {{GENERICNAMESPACEORDER}}, would avoid
interference with namespace localization.
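
As a rough illustration of the composite key proposed in this comment, here is a minimal Python sketch. It is not MediaWiki code: the function name `internal_sort_key`, the tab separator, and the three-digit zero padding of the namespace order are all assumptions made for the example.

```python
SEPARATOR = "\t"  # assumption: tab, because it sorts below a space


def internal_sort_key(sort_key: str, page_name: str, namespace_order: int) -> str:
    """Build sort key + separator + page name + separator + namespace order.

    `namespace_order` is a hypothetical language-independent number; it is
    zero-padded to three digits here so that it compares correctly as text.
    """
    return f"{sort_key}{SEPARATOR}{page_name}{SEPARATOR}{namespace_order:03d}"


# Two pages sharing the blank sort key " " still get distinct, ordered keys.
# The namespace order 6 is an arbitrary value chosen for the example.
keys = sorted(internal_sort_key(" ", name, 6) for name in ("B.jpg", "A.jpg"))
```

Because the separator sorts below every printable character, a short user key such as "A" still sorts before a longer one such as "A!", whatever page names follow.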

Comment 6 Philippe Verdy 2008-06-23 06:09:28 UTC
Actually, the effective sort key should use at least 3 levels to match with the Unicode UCA algorithm:
# The provided key converted internally to all capitals, and with all non-letters converted to spaces, then all spaces compacted
# The provided key with its original case with just non-letters converted to spaces then all spaces compacted
# The provided key as is.

Note: the first character of part 1 must not be changed to a space; it must be kept as is if it is not a letter, and capitalized if it is a letter. The reason is that it will be used to generate distinctive subgroups in the displayed list. If it is a space, it must be preserved even though the spaces after it can be compressed.

To compress the resulting key, part 2 can be trimmed of the trailing characters that it shares with the end of part 1. The same can be done for part 3 (but it must still be compared with the original part 2).

Then, to form the effective sort key, the three parts should be concatenated with a separator lower than a space. If part 3 is empty, you don't need to concatenate it or its leading separator; if both parts 2 and 3 are empty, you don't need to concatenate either of them or their leading separators. If part 2 is empty but part 3 is not, the empty part 2 must still be generated (meaning that part 3 will be separated from part 1 by two separators).
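
The three levels and the concatenation rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `level_parts` and `join_parts` are made up, the tab separator is an assumed choice, and the trimming of shared trailing characters is left out.

```python
import re

SEPARATOR = "\t"  # assumption: any character that sorts below a space


def _letters_only(text: str) -> str:
    """Replace each non-letter with a space, then collapse runs of spaces."""
    spaced = "".join(ch if ch.isalpha() else " " for ch in text)
    return re.sub(r" +", " ", spaced)


def level_parts(key: str) -> list[str]:
    """Return the three parts for a provided sort key (no trimming applied)."""
    if not key:
        return ["", "", ""]
    first, rest = key[0], key[1:]
    # Part 1: the first character is kept (capitalized if it is a letter);
    # the rest is reduced to letters and spaces, then uppercased.
    part1 = first.upper() + _letters_only(rest).upper()
    # Part 2: original case, non-letters turned into compacted spaces.
    part2 = _letters_only(key)
    # Part 3: the provided key as is.
    return [part1, part2, key]


def join_parts(parts: list[str]) -> str:
    """Join parts, dropping trailing empty ones but keeping interior ones."""
    kept = list(parts)
    while len(kept) > 1 and kept[-1] == "":
        kept.pop()
    return SEPARATOR.join(kept)
```

For example, `join_parts(["A", "", "a!"])` keeps the empty middle part, so part 3 ends up two separators away from part 1, while `join_parts(["A", "", ""])` is just `"A"`.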

Finally, as the provided sort key is not necessarily unique (it may differ from the full page name used by default, which includes the namespace name and colon), an additional part should be appended to the previous key with a separator; it is not needed if the (non-compressed) part 3 (the original provided sort key) is identical to the full page name.

Note that Wikipedia still uses the full page name as its default sort key; it would probably be better if it used only the page name by default, then a separator, then the namespace, because that would avoid having to specify <nowiki>{{PAGENAME}}</nowiki> as the explicit sort key in many pages. Note that the namespace itself could be compressed by replacing it with just the namespace number, right-padded to 3 characters with zeroes.

This should work correctly in Wiktionary, where it could be tested on long lists of words in various languages. (This algorithm is already used on French Wiktionary with a template generating the sort key; however, it lacks the string reversal for part 3, needed for correct French sort order. This template-based implementation is quite tricky, and because a tab is not handled correctly within categories, the separator chosen there is " !", a space followed by an exclamation mark.)

Ideally, MediaWiki should implement the full UCA algorithm; however, the computed sort keys will not be displayable, even though they will probably be even more compact. With the full UCA, you will be faced with problems like per-language tailoring, needed so that multi-character graphemes recognized as one letter sort correctly, and so that the tailored capitalization rules for that language (using special case mappings like Turkish or Azeri for the conversion of dotted-i and undotted-I) work reliably. Tailoring may also be needed when non-letters are used as part of the language's alphabet (such as apostrophes in the middle of a grapheme cluster making a single letter).
Comment 7 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-10-20 14:50:16 UTC
*** Bug 16021 has been marked as a duplicate of this bug. ***
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-10-20 14:50:35 UTC
Note that the cl_sortkey field is varchar(70).  We simply don't have the space to concatenate much of anything to the end.  We could reserve the last several bytes for some encoding of the page_id, but that's really unnecessary: we already have an index on (cl_to, cl_sortkey, cl_from), so a sort on (cl_sortkey, cl_from) would provide unique results without having to modify existing sortkeys.  So the URL would look like

http://en.wikipedia.org/w/index.php?title=Category:Images_with_unknown_copyright_status&from=0212006&fromid=12345

Of course, the results would be sorted in a meaningless order when sort keys are the same, but the order would be consistent, and so this bug would not occur.  To make the sort order slightly more meaningful, we could append the full page name to custom sort keys, keeping in mind that it would quite likely get truncated for long page names.
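
To make the difference concrete, here is a small Python sketch, not MediaWiki code, that pages an in-memory list. The tuples stand in for (cl_sortkey, cl_from), and the function names are hypothetical. Paging on the key alone returns the same page again once more than a page of rows share one key; paging on the (key, id) pair always advances.

```python
from typing import Optional

Row = tuple[str, int]  # (sort key, page id): stand-ins for cl_sortkey and cl_from


def page_by_key(rows: list[Row], limit: int, from_key: Optional[str] = None):
    """Page on the sort key alone; the next link carries only a key."""
    ordered = [r for r in sorted(rows) if from_key is None or r[0] >= from_key]
    next_key = ordered[limit][0] if len(ordered) > limit else None
    return ordered[:limit], next_key


def page_by_key_and_id(rows: list[Row], limit: int, start: Optional[Row] = None):
    """Page on (sort key, id); the pair is unique, so the cursor always moves."""
    ordered = [r for r in sorted(rows) if start is None or r >= start]
    next_start = ordered[limit] if len(ordered) > limit else None
    return ordered[:limit], next_start


def list_all(rows: list[Row], limit: int) -> list[Row]:
    """Follow the (key, id) cursor until the listing is exhausted."""
    collected, cursor = [], None
    while True:
        page, cursor = page_by_key_and_id(rows, limit, cursor)
        collected.extend(page)
        if cursor is None:
            return collected


# Five pages share the blank sort key; the page size is 2.
rows = [(" ", page_id) for page_id in range(5)] + [("A", 9)]
first_page, next_key = page_by_key(rows, 2)
second_page, _ = page_by_key(rows, 2, next_key)  # identical to first_page
```

With the key-only cursor, `second_page` equals `first_page`, which is the behaviour reported in this bug; `list_all(rows, 2)` visits every row exactly once.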

Using the full UCA algorithm would be great, anyone up for implementing that?  :)  That would solve bug 164, but isn't needed here at all.  Given current space constraints of 70 characters (when page titles are up to 255 + namespace), we can't use anything as complicated as three sorting levels without a schema change.  We can't even use one sorting level and expect it to be unique.

Assigning to self, although I can't promise I'll get to it anytime soon.  This will need to modify IndexPager so it can sort on two columns at once.
Comment 9 Philippe Verdy 2008-10-20 17:16:25 UTC
See my comment in bug 164 for how the pseudo-UCA sort works in French Wiktionary (I designed it; it still has some known caveats and limitations, but it works remarkably well, and the way it is implemented allows automating the feeding of sort keys, with an algorithm that is quite easy to understand).
Look at the documentation page of "[[Modèle:clé de tri]]" (French for "Template:sort key"). It is written in French, but if you need help and can't read French, ask me or some admins on French Wiktionary.
Comment 10 Roan Kattouw 2008-10-21 15:49:55 UTC
(In reply to comment #8)
> Of course, the results would be sorted in a meaningless order when sort keys
> are the same

IMO, the sort order for duplicate sort keys doesn't *have* to be meaningful: users should know that the sorting order for duplicate sort keys is undefined, or at least that if they want control over the sort order, they should just use unique sort keys.
Comment 11 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-10-23 00:09:03 UTC
Agreed.  Also, it would mess up the URLs somewhat if we started appending the article title all over the place.  Maybe leave that for later; for now the id would be fine for disambiguation (when someone wants to do it).
Comment 12 Bryan Baron 2009-09-23 21:09:54 UTC
I can't reproduce this. Was this fixed?
Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-09-23 23:13:13 UTC
No, AFAIK, although the exact page linked to doesn't show the problem anymore.  To reproduce, add [[Category:Bug 4912 test case| ]] or something to more than 200 pages on some wiki or other, and then observe that the resulting category page doesn't paginate correctly.

Unassigning from self since I'm not likely to do anything about this in the foreseeable future.
Comment 14 Derk-Jan Hartman 2010-06-21 23:30:25 UTC
*** Bug 23803 has been marked as a duplicate of this bug. ***
Comment 15 Svick 2010-09-26 21:24:47 UTC
Note that this bug also occurs when using the API: I tried enumerating subcategories of [[Category:Gastropod genera without authority reference]] and because more than 200 pages had the sortkey of space, my program entered an infinite loop.
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-09-26 21:34:20 UTC
This is fixed for common cases in trunk with the categorylinks rewrite.  I don't know if the API has caught up yet.  It's still not totally bulletproof, if the custom sort key is really long (200+ bytes) -- we need to take the id into account for that.  So I'll leave the bug open.
Comment 17 Bawolff (Brian Wolff) 2010-09-26 21:36:50 UTC
(In reply to comment #15)
> Note that this bug also occurs when using the API: I tried enumerating
> subcategories of [[Category:Gastropod genera without authority reference]] and
> because more than 200 pages had the sortkey of space, my program entered
> an infinite loop.

Note that it's possible to work around this (in the API, that is, for whatever version Wikimedia wikis are using) by not using the cmnamespace parameter and filtering by namespace on the client side.
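
A client-side version of this workaround might look like the Python sketch below. It makes no network calls: `fetch_page` is a hypothetical callable supplied by the caller that returns one batch of category members plus a continuation token, and the member dictionaries are assumed to carry the `ns` field that the API returns for each page. The guard against a repeated token is an extra precaution added for the example, not something the API provides.

```python
CATEGORY_NS = 14  # namespace number of "Category:" pages in MediaWiki


def subcategories(members: list[dict]) -> list[dict]:
    """Keep only members in the category namespace (client-side filter)."""
    return [m for m in members if m.get("ns") == CATEGORY_NS]


def collect_all(fetch_page) -> list:
    """Gather every batch from `fetch_page(token) -> (members, next_token)`.

    `fetch_page` is a hypothetical wrapper around a request made without
    cmnamespace. A repeated token means the listing is not advancing, so
    the loop stops with an error instead of spinning forever.
    """
    seen_tokens = set()
    token = None
    collected = []
    while True:
        members, token = fetch_page(token)
        collected.extend(members)
        if token is None:
            return collected
        if token in seen_tokens:
            raise RuntimeError(f"continuation token repeated: {token!r}")
        seen_tokens.add(token)
```

Usage would be something like `subcategories(collect_all(my_fetch))`, where `my_fetch` is whatever function the program already uses to request one batch.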
Comment 18 Gustronico 2011-03-08 22:35:37 UTC
Right now, "Next 200" links are broken in *all* categories containing 200+ items. Tested in en:wiki and es:wiki.
Comment 19 Bawolff (Brian Wolff) 2011-03-08 23:54:57 UTC
(In reply to comment #18)
> Right now, "Next 200" links are broken in *all* categories containing 200+
> items. Tested in en:wiki and es:wiki.

That's a separate issue (so in general it's probably a separate bug). However, it's probably related to the fact that we're in the middle of changing the way categories work (a change to how articles are sorted, for multilingual goodness and whatnot), so it's probably a temporary issue while the categories get updated, and should go away on its own.
Comment 20 Jens K Andersen 2011-03-09 00:16:34 UTC
The "next" link in categories currently has a url saying &pagefrom=.
It partially works if the url is manually changed to &from= as it has said before.
I only say partially works because it apparently only considers the first character after &from=.
Comment 21 Roan Kattouw 2011-03-09 11:11:03 UTC
(In reply to comment #20)
> The "next" link in categories currently has a url saying &pagefrom=.
> It partially works if the url is manually changed to &from= as it has said
> before.
> I only say partially works because it apparently only considers the first
> character after &from=.
This was fixed at 00:43 UTC.
Comment 22 Roan Kattouw 2011-03-09 11:12:37 UTC
Closed as FIXED because the bug as filed is also fixed on WMF, per comment 19.
