Last modified: 2013-03-08 19:52:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T37632, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 35632 - Set $wgCategoryCollation to 'uca-default' and rebuild category sort keys on Portuguese Wikipedia
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Tim Starling
URL: https://pt.wikipedia.org/wiki/Wikip%C...
Keywords: shell
Depends on:
Blocks: 30996
Reported: 2012-03-31 19:44 UTC by Helder
Modified: 2013-03-08 19:52 UTC
CC List: 7 users



Attachments
Changes related to [[Categoria:Sociedade de Transportes Colectivos do Porto]] (60.68 KB, text/html)
2012-09-19 02:47 UTC, Helder

Description Helder 2012-03-31 19:44:11 UTC
On bug 30996, Tim recommended changing the collation method to "uca-default" on a language-by-language basis, after checking each language for correct collation and first-letter behaviour on a test wiki. 

According to the tests made by Bawolff, the "uca-default" collation method seems to work fine for Portuguese:
http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/59758/focus=59767
In the same thread, Tim said the results were good enough.

In
[[pt:Wikipédia:Esplanada/propostas/Melhorar a ordenação das páginas com títulos acentuados (20mar2012)]]
(a proposal to improve the sorting of pages with accented titles), editors from the Portuguese Wikipedia agreed to enable uca-default sorting on ptwiki.

So, I believe it is feasible to set $wgCategoryCollation to 'uca-default' on ptwiki (and make any necessary updates, or run the necessary maintenance scripts).
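The problem the proposal addresses (accented titles sorting in the wrong place) can be illustrated with a small sketch. It assumes ptwiki was on MediaWiki's default 'uppercase' collation, modelled here as comparing the UTF-8 bytes of the uppercased title, and it replaces real UCA sort keys with a crude accent-folding key (NFD decomposition with combining marks dropped). Both key functions are hypothetical illustrations, not MediaWiki code.

```python
import unicodedata


def uppercase_sort_key(title: str) -> bytes:
    # Assumed model of the 'uppercase' collation: uppercase the title and
    # compare raw UTF-8 bytes, as a binary database column would.
    return title.upper().encode("utf-8")


def accent_folded_sort_key(title: str) -> tuple:
    # Crude stand-in for UCA (NOT real UCA): decompose (NFD), drop combining
    # marks, casefold. The original title breaks ties deterministically.
    decomposed = unicodedata.normalize("NFD", title)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return (base.casefold(), title)


def first_letter(title: str) -> str:
    # Heading letter a category page could group this title under.
    return accent_folded_sort_key(title)[0][:1].upper()


titles = ["Zebra", "Água", "Évora", "Abacate", "Ema"]
print(sorted(titles, key=uppercase_sort_key))
print(sorted(titles, key=accent_folded_sort_key))
print([first_letter(t) for t in sorted(titles, key=accent_folded_sort_key)])
```

Under the byte-wise key, "Água" and "Évora" land after "Zebra" because their first UTF-8 byte (0xC3) is larger than any ASCII letter; the folded key files them under A and E instead.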
Comment 1 Dereckson 2012-06-23 21:16:29 UTC
More precisely, Tim's recommendation was:

"So I recommend doing this change on a language-by-language basis, after checking
each language for correct collation and first-letter behaviour on a test wiki. 

Also, it would be nice to know in advance what percentage of sort keys will be
larger than the 230 bytes allowed by the database field, and if that percentage
is significant, whether there are categories on the target wikis where the
order will be changed by truncation after 230 bytes."

For the first part: if I prepare a test wiki with this setting enabled, would you be willing to populate it with pages and categories for this test?
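The truncation concern in Tim's recommendation can be sketched as follows: keys that agree on their first 230 bytes become identical once cut, so their relative order is then decided by whatever tie-breaker the database applies. This sketch assumes that tie-breaker is the page ID; the function names are hypothetical.

```python
COLUMN_LIMIT = 230  # bytes allowed by the sort key column, per the quote above


def truncate_key(key: bytes, limit: int = COLUMN_LIMIT) -> bytes:
    # Bytes beyond the column limit are simply cut off.
    return key[:limit]


def order_changed_by_truncation(entries: list) -> bool:
    """entries: list of (page_id, full_sort_key) pairs.

    Compares the order by full key with the order by truncated key,
    breaking ties by page_id (an assumed secondary order).
    """
    by_full = [pid for pid, _ in sorted(entries, key=lambda e: (e[1], e[0]))]
    by_cut = [pid for pid, _ in sorted(entries, key=lambda e: (truncate_key(e[1]), e[0]))]
    return by_full != by_cut


shared = b"A" * 230  # two keys identical in their first 230 bytes
print(order_changed_by_truncation([(1, shared + b"Z"), (2, shared + b"B")]))  # True
print(order_changed_by_truncation([(1, b"Beta"), (2, b"Alpha")]))  # False
```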
Comment 2 Helder 2012-06-23 23:25:35 UTC
Sure. I could request some help on ptwiki's village pump.

Just to be sure: wouldn't it be easier to use Special:Import to import a set of (categorized) pages directly from ptwiki?
Comment 3 Dereckson 2012-06-24 06:39:20 UTC
Indeed, that could also be a way.

I will prepare it this Monday or Tuesday.
Comment 4 Helder 2012-07-05 13:05:05 UTC
So?
Any news?
Comment 5 Tim Starling 2012-07-11 07:45:05 UTC
Sort key size histogram for ptwiki with uca-default:

0-25:      1349546    |**************************************
26-51:     2124309    |************************************************************
52-76:     878662     |************************
77-102:    163018     |****
103-128:   42182      |*
129-154:   13498      |
155-180:   3402       |
181-205:   1679       |
206-231:   482        |
232-257:   214        |
258-283:   59         |
284-309:   42         |
310-334:   8          |
335-360:   2          |
361-386:   2          |
387-412:   0          |
413-438:   2          |
439-463:   0          |
464-489:   0          |
490-516:   3          |

99.993% of category entries have sort keys smaller than the limit of 230 bytes; 332 entries would have their sort keys truncated. It's unlikely that the order of any categories would be affected by truncation. The total index size would go up from about 116 MB to 172 MB.

I think we just need to schedule a deployment window now.
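The summary figures can be re-derived from the histogram counts. This sketch treats only the buckets starting above 230 bytes as truncated (the 206-231 bucket straddles the limit and is counted as fitting), which reproduces the 332 figure; the index growth is simply the ratio of the two quoted sizes.

```python
# (low, high, count) rows copied from the histogram above.
buckets = [
    (0, 25, 1349546), (26, 51, 2124309), (52, 76, 878662), (77, 102, 163018),
    (103, 128, 42182), (129, 154, 13498), (155, 180, 3402), (181, 205, 1679),
    (206, 231, 482), (232, 257, 214), (258, 283, 59), (284, 309, 42),
    (310, 334, 8), (335, 360, 2), (361, 386, 2), (387, 412, 0),
    (413, 438, 2), (439, 463, 0), (464, 489, 0), (490, 516, 3),
]
LIMIT = 230  # bytes

total = sum(count for _, _, count in buckets)
# Only buckets lying entirely above the limit are counted as truncated.
truncated = sum(count for low, _, count in buckets if low > LIMIT)
fitting_pct = 100 * (total - truncated) / total

print(total)                                  # 4577110
print(truncated)                              # 332
print(f"{fitting_pct:.3f}%")                  # 99.993%
print(f"{172 / 116 - 1:.0%} index growth")    # 48% index growth
```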
Comment 6 Rob Lanphier 2012-08-16 00:38:17 UTC
Scheduled for Tuesday, August 21, 23:30-01:30 UTC (next day) (4:30pm-6:30pm PDT). Tim will be doing this deploy.
Comment 7 Helder 2012-09-19 02:47:07 UTC
Created attachment 11125
Changes related to [[Categoria:Sociedade de Transportes Colectivos do Porto]]

For the record: a user noticed that two articles in one of our categories were out of the expected order for no apparent reason:
https://pt.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:Caf%C3%A9_dos_programadores&oldid=32271948#Ordena.C3.A7.C3.A3o_de_categorias

It seems the order was fixed by editing the sort key (the text after the pipe in the category link), as in
https://pt.wikipedia.org/w/index.php?title=Linha_602_da_STCP&diff=32266548&oldid=32216579

Here is a diff between the results of two API requests made before and after the changes:
https://pt.wikipedia.org/w/index.php?title=Wikip%C3%A9dia:P%C3%A1gina_de_testes/1&diff=32271872

I'm also attaching a copy of the list of recent changes to articles in that category, in case any of them are relevant.
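Editing the text after the pipe changes the order because that text is an explicit sort key for the page in that category. A minimal sketch, assuming MediaWiki joins an explicit key to the page title with a newline (so the title only breaks ties) and uses the title alone otherwise; {{DEFAULTSORT}} and the collation step itself are ignored, and the regular expression, helper name and the "602" key in the usage line are hypothetical.

```python
import re

# Simplified pattern for [[Categoria:Name]] or [[Categoria:Name|sort key]].
CATEGORY_LINK = re.compile(r"\[\[\s*Categoria\s*:\s*([^\]|]+?)\s*(?:\|([^\]]*))?\]\]")


def category_sort_strings(wikitext: str, page_title: str) -> dict:
    """Return, per category, the string that would be fed to the collation.

    Assumption: an explicit key is joined to the title with a newline, so
    the title only breaks ties between pages sharing the same key.
    """
    result = {}
    for match in CATEGORY_LINK.finditer(wikitext):
        category, key = match.group(1), match.group(2)
        result[category] = f"{key}\n{page_title}" if key else page_title
    return result


# Hypothetical wikitext; the "602" sort key is invented for illustration.
text = "[[Categoria:Sociedade de Transportes Colectivos do Porto|602]]\n[[Categoria:Transportes]]"
print(category_sort_strings(text, "Linha 602 da STCP"))
```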


