Last modified: 2014-11-17 09:57:14 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T32996, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 30996 - Change $wgCategoryCollation values to appropriate one for each Wikimedia wiki (tracking)
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: tracking
Depends on: 28397 44667 collations 29915 35632 43185 46081 48097
Blocks: tracking
Reported: 2011-09-19 15:54 UTC by Brion Vibber
Modified: 2014-11-17 09:57 UTC

Description Brion Vibber 2011-09-19 15:54:47 UTC
According to Roan in bug 30287 comment 9, actually enabling the uca-default collation stuff that was "fixed" for bug 164 is waiting on an Ubuntu upgrade on the apache cluster (bug 29915?).

There are a few bugs which look like they should be resolved (for categories at least) by enabling this, e.g. bug 30287 (Farsi sorting problems); others require further work (bug 29788 needs a Swedish-specific collation setting).
Comment 1 Sam Reed (reedy) 2011-09-23 00:08:23 UTC
Closing LATER until apaches are all upgraded
Comment 2 Brion Vibber 2011-09-23 00:37:16 UTC
Relevant dependencies as RT tickets:

http://rt.wikimedia.org/Ticket/Display.html?id=22 full update to Lucid (bug 29915)

http://rt.wikimedia.org/Ticket/Display.html?id=652 install icu & php5-intl (depends on the above)
Comment 3 Peter Youngmeisterarius 2011-10-14 23:03:15 UTC
RT #22 and 652 are done. This can probably be closed.
Comment 4 Bawolff (Brian Wolff) 2011-10-15 20:11:23 UTC
(In reply to comment #3)
> RT #22 and 652 are done. This can probably be closed.

Well, this still needs someone to make the changes to MediaWiki's config file and run the maintenance script.
Comment 5 Tim Starling 2011-11-16 01:21:17 UTC
The first letter identification code (maintenance/language/generateCollationData.php) won't work for all languages, so some wikis will have their category pages broken terribly by this change. Also, the default collation tables sort a lot of languages incorrectly, and the amount of breakage that causes will depend on the language in question. So I recommend doing this change on a language-by-language basis, after checking each language for correct collation and first-letter behaviour on a test wiki. 

Also, it would be nice to know in advance what percentage of sort keys will be larger than the 230 bytes allowed by the database field, and if that percentage is significant, whether there are categories on the target wikis where the order will be changed by truncation after 230 bytes.
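
The check asked for above could be sketched roughly as follows. This is only an illustration, assuming the sort keys have already been generated on a test wiki and are available as byte strings; the function name and the sample keys are made up. Because cutting a key to its first 230 bytes cannot invert a bytewise comparison, the only way truncation can change the order is by making distinct keys compare equal, so the sketch counts those collisions.

```python
from collections import defaultdict

def truncation_report(sort_keys, limit=230):
    """Summarise how a byte limit on the sort key field affects a set of keys."""
    total = len(sort_keys)
    # Keys that do not fit in the database field.
    too_long = sum(1 for key in sort_keys if len(key) > limit)
    # Group distinct keys by their truncated form; a group with more than
    # one member means those keys can no longer be told apart after the cut.
    groups = defaultdict(set)
    for key in sort_keys:
        groups[key[:limit]].add(key)
    ambiguous = sum(len(group) for group in groups.values() if len(group) > 1)
    return {
        "too_long_pct": 100.0 * too_long / total if total else 0.0,
        "ambiguous_keys": ambiguous,
    }

# Hypothetical keys: two long ones that differ only after byte 230.
keys = [b"a" * 240 + b"1", b"a" * 240 + b"2", b"b" * 10]
report = truncation_report(keys)
print(report)
```

Running this per category (rather than per wiki) would show whether any single category page is actually affected.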
Comment 6 Helder 2012-03-20 18:01:59 UTC
Any progress on this?

On Portuguese Wikipedia we still need to use
{{DEFAULTSORT: Page Name without accents }}
on any article whose title has an accent if we want it to be sorted appropriately in the categories. E.g.:
https://pt.wikipedia.org/w/index.php?title=%C3%81gua_Boa&oldid=28441112&action=edit

Maybe adding a note to [[mw:Roadmap]] would be appropriate?
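
A small sketch of why the DEFAULTSORT workaround is needed. It assumes the 'uppercase' collation behaves like "uppercase the title, then compare the UTF-8 bytes", which is a simplified model rather than MediaWiki's actual code; apart from "Água Boa", the titles are made up.

```python
import unicodedata

def uppercase_key(title):
    # Assumed model of the 'uppercase' collation: uppercase, then raw UTF-8 bytes.
    return title.upper().encode("utf-8")

def strip_accents(title):
    # Decompose accented letters and drop the combining marks,
    # which is roughly what editors do by hand in DEFAULTSORT.
    decomposed = unicodedata.normalize("NFD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

titles = ["Zebra", "Água Boa", "Abacate"]

# Byte order puts the accented title after everything starting with A-Z.
plain = sorted(titles, key=uppercase_key)
# Sorting on the accent-free form puts it among the other "A" titles.
fixed = sorted(titles, key=lambda t: uppercase_key(strip_accents(t)))

print(plain)  # ['Abacate', 'Zebra', 'Água Boa']
print(fixed)  # ['Abacate', 'Água Boa', 'Zebra']
```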
Comment 7 Liangent 2012-10-13 12:30:20 UTC
Some related info:

I created some collations for Chinese, which are expected to be used on zhwiki. This code requires ICU 4.8+ to run. The current php5-intl in WMF's APT repo uses libicu42, and existing wikis with uca-default (ptwiki) have sort keys generated with libicu42. Once libicu is updated, all existing uca-default sort keys will need to be rebuilt.
Comment 8 Bawolff (Brian Wolff) 2013-01-11 00:29:21 UTC
Btw, Meta and especially Commons may be good next targets for deploying uca-default. Both are multilingual, so using the root collation seems ideal.
Comment 9 Sam Reed (reedy) 2013-02-02 23:54:55 UTC
'wgCategoryCollation' => array(
	'default' => 'uppercase',
	'ptwiki' => 'uca-default', # bug 35632
	'iswiktionary' => 'identity', # bug 30722
),


I'm presuming this is fixed now...
Comment 10 Bawolff (Brian Wolff) 2013-02-03 00:15:16 UTC
(In reply to comment #9)
> 'wgCategoryCollation' => array(
>     'default' => 'uppercase',
>     'ptwiki' => 'uca-default', # bug 35632
>     'iswiktionary' => 'identity', # bug 30722
> ),
> 
> 
> I'm presuming this is fixed now...

Umm only for ptwiki.
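
To illustrate why the quoted setting covers only ptwiki, here is a minimal sketch. It assumes a per-wiki setting falls back to the 'default' entry whenever the database name has no entry of its own; the real configuration loader may support more kinds of keys, and `collation_for` is a made-up helper.

```python
# The values quoted above, as a plain mapping from database name to collation.
CATEGORY_COLLATION = {
    "default": "uppercase",
    "ptwiki": "uca-default",     # bug 35632
    "iswiktionary": "identity",  # bug 30722
}

def collation_for(dbname, settings=CATEGORY_COLLATION):
    """Return the collation for one wiki, falling back to 'default'."""
    return settings.get(dbname, settings["default"])

for db in ("ptwiki", "ptwikibooks", "enwiki", "commonswiki"):
    print(db, collation_for(db))
```

Under that assumption every wiki other than ptwiki and iswiktionary, including the other Portuguese projects, still gets 'uppercase'.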
Comment 11 Bawolff (Brian Wolff) 2013-02-03 00:55:48 UTC
Just to clarify this bug: we probably should *not* do this for all wikis. As Tim said above, more MW code is needed to make it work properly.

However, this can (and should, imo) be done on all English, Portuguese, and multilingual (Meta and Commons) wikis.
Comment 12 Sam Reed (reedy) 2013-02-03 17:12:44 UTC
I guess, a rough list for this would be:

reedy@fenari:/home/wikipedia/common$ grep enw all.dblist
arbcom_enwiki
enwiki
enwikibooks
enwikinews
enwikiquote
enwikisource
enwikiversity
enwikivoyage
enwiktionary
tenwiki
wg_enwiki
reedy@fenari:/home/wikipedia/common$ grep ptw all.dblist
ptwiki
ptwikibooks
ptwikinews
ptwikiquote
ptwikisource
ptwikiversity
ptwikivoyage
ptwiktionary

+brwikimedia

reedy@fenari:/home/wikipedia/common$ cat special.dblist
advisorywiki
arbcom_dewiki
arbcom_enwiki
arbcom_fiwiki
arbcom_nlwiki
auditcomwiki
boardgovcomwiki
boardwiki
chairwiki
chapcomwiki
checkuserwiki
collabwiki
commonswiki
donatewiki
execwiki
fdcwiki
foundationwiki
grantswiki
incubatorwiki
internalwiki
mediawikiwiki
metawiki
movementroleswiki
nostalgiawiki
officewiki
otrs_wikiwiki
outreachwiki
qualitywiki
searchcomwiki
sourceswiki
spcomwiki
specieswiki
stewardwiki
strategywiki
tenwiki
test2wiki
testwiki
usabilitywiki
wg_enwiki
wikimania2005wiki
wikimania2006wiki
wikimania2007wiki
wikimania2008wiki
wikimania2009wiki
wikimania2010wiki
wikimania2011wiki
wikimania2012wiki
wikimania2013wiki
wikimaniateamwiki
wikidatawiki
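
The rough list above could be assembled along these lines. This is only a sketch, assuming the dblist contents are already loaded as lists of database names; the helper name and the sample excerpts are made up. Note that the substring match behaves like the greps and also picks up names such as tenwiki and wg_enwiki, which appear again in special.dblist, so duplicates are dropped.

```python
def candidate_wikis(all_dbs, special_dbs, extra=("brwikimedia",)):
    """Merge the three sources into one ordered, duplicate-free list."""
    # Substring match, like `grep enw` and `grep ptw` on all.dblist.
    matched = [db for db in all_dbs if "enw" in db or "ptw" in db]
    result = []
    seen = set()
    for db in [*matched, *extra, *special_dbs]:
        if db not in seen:
            seen.add(db)
            result.append(db)
    return result

# Tiny made-up excerpts standing in for the real dblist files.
all_dbs = ["dewiki", "enwiki", "ptwiki", "tenwiki", "wg_enwiki"]
special_dbs = ["commonswiki", "metawiki", "tenwiki", "wg_enwiki"]
wikis = candidate_wikis(all_dbs, special_dbs)
print(wikis)
```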
Comment 13 Sam Reed (reedy) 2013-02-03 17:14:29 UTC
Do the rest of the is (Icelandic) projects want to become identity too?

reedy@fenari:/home/wikipedia/common$ grep isw all.dblist
iswiki
iswikibooks
iswikiquote
iswikisource
iswiktionary
Comment 14 Bawolff (Brian Wolff) 2013-02-03 18:11:14 UTC
(In reply to comment #13)
> Do the rest of the is projects want to become identity too?
> 
> reedy@fenari:/home/wikipedia/common$ grep isw all.dblist
> iswiki
> iswikibooks
> iswikiquote
> iswikisource
> iswiktionary

I would imagine so. The language is case sensitive from what I understand. I guess we should ask.

-----

Realistically, it doesn't matter that much for a wiki like wikimania2006, since nobody is using it. Although it certainly wouldn't hurt anything.

For larger wikis (where it would take more than a couple of hours to run the script), we would probably want to talk to the local community, as categories will behave somewhat weirdly while the script is running (pages will be out of order). It's too bad the script doesn't go in order of cl_to instead of cl_from, as that would minimize disruption somewhat.
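
The point about cl_to versus cl_from can be shown with a toy model. This is not the maintenance script: it assumes rows are re-keyed in fixed-size batches and counts, after each batch, how many categories are only partly updated (and so show a mix of old and new sort keys). The rows and the batch size are made up.

```python
from collections import Counter

def mixed_category_steps(rows, order_key, batch_size):
    """Sum, over batch boundaries, of categories that are only partly re-keyed."""
    totals = Counter(cl_to for _, cl_to in rows)
    done = Counter()
    ordered = sorted(rows, key=order_key)
    mixed_steps = 0
    for start in range(0, len(ordered), batch_size):
        for _, cl_to in ordered[start:start + batch_size]:
            done[cl_to] += 1
        # A category is "mixed" while some, but not all, of its rows are updated.
        mixed_steps += sum(1 for cat, n in done.items() if 0 < n < totals[cat])
    return mixed_steps

# Six pages, each in the same three categories: rows are (cl_from, cl_to).
rows = [(page, cat) for page in range(1, 7) for cat in ("A", "B", "C")]

by_from = mixed_category_steps(rows, lambda r: r[0], batch_size=6)
by_to = mixed_category_steps(rows, lambda r: (r[1], r[0]), batch_size=6)
print(by_from, by_to)  # 6 0
```

In this model, going page by page keeps every category half-updated until the very end, while going category by category finishes each one before starting the next.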
Comment 15 Bartosz Dziewoński 2013-02-26 20:42:47 UTC
Adjusting the summary: "Set $wgCategoryCollation to 'uca-default' and rebuild category sort keys on Wikimedia wikis deployment" -> "Change $wgCategoryCollation values to appropriate one for each Wikimedia wiki".

Per bug 45443, we don't really want uca-default anywhere anymore (apart from multi-language projects like Commons or Meta), but rather language-specific collations.
