Last modified: 2014-09-23 20:01:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T27619, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 25619 - Add more characters to ccnorm
Status: PATCH_TO_REVIEW
Product: MediaWiki extensions
Classification: Unclassified
Component: AntiSpoof
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Ryan Kaldari
Reported: 2010-10-22 20:48 UTC by Helder
Modified: 2014-09-23 20:01 UTC

Attachments
Proposed patch (295.30 KB, patch)
2010-11-07 16:30 UTC, EdoDodo

Description Helder 2010-10-22 20:48:56 UTC
Currently, only some characters are normalized to a "canonical" form. For example, although ccnorm("α") results in "A", ccnorm("ά") leaves the character unchanged.

The function should support the conversion of more characters.

The following list is based on what is currently available at [[MediaWiki:Titleblacklist]], but it may be better to have different sets of characters depending on the case of the letter. For example, ⅅ for D, but ⅆ for d.

----
a: aαąăãàāάạậảấầẩắằẵẳẫặḁǟǡȁᾳὰᾀἁᾁἄᾄἂᾂἆᾆἅᾅἃᾃἇᾇáâäæåǻ٩4
b: bßβбв฿
c: cċĉ¢сćĉçč
d: dďḍðⅆ
e: éèëeęěĕėẻẹếềễểȨȩḝēḗȅȇệḙḛ3عڠẽə
f: fғ₣
g: gĝģğġɠǥǧǵḡԌ
h: hήĥħȞʰʱḣḥḧḩḫнңӈӉηἠἡἢἣἤἥἦἧὴᾐћⱧԋњһ
i: iìíîïĩļǐīĭḷŀιїɨ!łľį
k: kķкќқҝҡҟӄ
l: l₤ĺľḷłŀλлљ
m: mɯḿṁṃмӍμ₥
n: n₦ńñņňṇν
o: oóòôöõǒōŏǫőœøəόοωὸὀὁὄὂὅὃоөӧӫδσʘǿọ
p: pƥṕṗǷ₧þρр
q: qɊʠ
r: rŕŗřȑȓƦʳʴʵʶṙṛṝṟя®
s: s$śŝşšṣσѕ
t: tţťṭτтŧ
u: uúùûüũůǔūǖǘǚǜŭųű
w: wŵẁẃẅẇẉ₩
x: xҳχ
y: yýÿŷƴȲʸẏỳỵỷỹʊύυϋὑὓὕὗὺῠῡуϓ
z: zźžż
----
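
To illustrate the kind of lookup requested above, here is a minimal sketch in Python. It is not the AntiSpoof or AbuseFilter implementation. The names `SAMPLE_SETS`, `build_table` and `normalize` are hypothetical, the two sample lines are abridged from the list above, and mapping to an upper-case canonical letter is an assumption based on the ccnorm("α") = "A" example.

```python
# Hypothetical sketch: NOT the AntiSpoof/AbuseFilter code.
# Two abridged lines in the "letter: characters" format used above.
SAMPLE_SETS = """\
a: aαά
d: dďⅆ
"""


def build_table(text):
    """Map every listed character to its upper-cased canonical letter."""
    table = {}
    for line in text.splitlines():
        canonical, sep, members = line.partition(": ")
        if not sep:
            continue  # skip lines that are not in "letter: characters" form
        for ch in members:
            # First assignment wins if a character is listed under two letters.
            table.setdefault(ch, canonical.upper())
    return table


def normalize(s, table):
    """Replace each character found in the table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in s)


TABLE = build_table(SAMPLE_SETS)
print(normalize("α", TABLE))  # A
print(normalize("ά", TABLE))  # A
print(normalize("ⅆ", TABLE))  # D
```

Supporting separate sets per letter case (ⅅ for D, ⅆ for d) would need one line per case instead of the single upper-casing step used here.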
Comment 1 EdoDodo 2010-11-07 16:30:17 UTC
Created attachment 7800
Proposed patch

I've attached a proposed patch that would add the characters to the AntiSpoof checks (which are also used by the AbuseFilter).
Comment 2 Gurch 2010-11-20 15:45:51 UTC
Changed extension to AntiSpoof, since that's where the change would have to be made (unless AbuseFilter was fixed by an independent re-implementation of the normalization, which seems pointless).
Comment 3 Platonides 2010-11-20 16:03:29 UTC
Are they the same ones you added [1] in [2]?
I synchronized the svn version with the list at mediawiki.org in r76484.

1- http://www.mediawiki.org/w/index.php?title=Extension%3AAntiSpoof%2FEquivalence_sets%2Fequivset_1&action=historysubmit&diff=361648&oldid=251667 
2- http://www.mediawiki.org/wiki/Extension:AntiSpoof/Equivalence_sets
Comment 4 EdoDodo 2010-11-24 17:49:12 UTC
Yes, they're the same ones that I added to mediawiki.org in the edits you linked.
Comment 5 Platonides 2010-11-24 17:52:30 UTC
They were committed in r76484, then.
Comment 6 EdoDodo 2010-11-30 18:32:32 UTC
Okay, thanks.
Comment 7 Helder 2011-03-12 18:31:12 UTC
The function still doesn't work with all of the characters mentioned in comment 0 above.

Using ccnorm on the string "ìíîïĩļǐīĭḷĿї!ľį₤ĺľḷĿΛЛљóòôöõǒōŏǫőόὸὀὁὄὂὅὃọ$śŝşšṣσ" doesn't change any of its characters.
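
A check like this can be scripted. The sketch below is only an illustration: `unchanged_chars` is a hypothetical helper, and `toy_normalize` is a stand-in that knows two mappings (! to I and $ to S, taken from the lists in comment 0 with an assumed upper-case result). It does not call the real ccnorm.

```python
def unchanged_chars(text, normalize):
    """Return the characters of `text` that `normalize` maps to themselves."""
    return [ch for ch in text if normalize(ch) == ch]


# Toy stand-in for the real normalizer (assumption): it knows two mappings only.
_TOY_TABLE = {"!": "I", "$": "S"}


def toy_normalize(text):
    return "".join(_TOY_TABLE.get(ch, ch) for ch in text)


print(unchanged_chars("!$σ", toy_normalize))  # ['σ']
```

Passing the real normalizer in place of `toy_normalize` would list exactly the characters that still need an equivalence.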
Comment 8 Sumana Harihareswara 2012-05-16 19:24:11 UTC
EdoDodo, does your patch still apply?

I recommend that you get a developer access account (https://www.mediawiki.org/wiki/Developer_access) so that you can commit your patches directly into the source control system in the future. In fact, you could update and submit this patch, and get it reviewed faster. I'm sorry for the delay.
Comment 9 Chad H. 2012-05-16 19:27:15 UTC
(In reply to comment #8)
> EdoDodo, does your patch still apply?
> 

This was already applied, see comment 5.
Comment 10 Helder 2013-10-07 12:41:02 UTC
(In reply to comment #7)
> The function still doesn't work with all of the characters mentioned in
> comment 0 above.
> 
> Using ccnorm on the string
> "ìíîïĩļǐīĭḷĿї!ľį₤ĺľḷĿΛЛљóòôöõǒōŏǫőόὸὀὁὄὂὅὃọ$śŝşšṣσ"
> doesn't change any of its characters.

Still reproducible.
Comment 11 Ryan Kaldari 2013-10-26 19:45:07 UTC
It looks like all of the equivalents were added except for the ones corresponding to the letters I, L, O, and S. Of course this makes sense since those 4 letters have never worked in AntiSpoof due to bug 27987.

I fixed bug 27987 in change I613f9917, so I'll do a follow-up commit to add the missing equivs.
Comment 12 Gerrit Notification Bot 2013-10-26 20:31:49 UTC
Change 92154 had a related patch set uploaded by Kaldari:
Adding missing equivalents for I, L, O, and S.

https://gerrit.wikimedia.org/r/92154
Comment 13 Ryan Kaldari 2013-10-26 20:35:26 UTC
I added all the missing equivalencies, except for 4 or 5 that either didn't make sense or would have conflicted with valid equivalencies for Greek. For example:
λ->L
л->L
љ->L
σ->S
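
One kind of conflict can be found mechanically: a character listed under more than one letter. In the lists from comment 0, σ appears under both o and s, and ḷ, ł, ŀ and ľ appear under both i and l. The sketch below is a hypothetical check over an abridged sample of those lists. It is not part of AntiSpoof, and it may not cover the conflicts with the existing Greek equivalencies mentioned above.

```python
from collections import defaultdict

# Abridged sample of the i, l, o and s lists from comment 0 (assumption:
# each letter simply owns a string of look-alike characters).
SAMPLE_SETS = {
    "i": "iìíḷŀłľ",
    "l": "lĺľḷłŀλ",
    "o": "oóòσ",
    "s": "sśσ",
}


def find_conflicts(sets):
    """Return {character: sorted list of letters} for characters in 2+ sets."""
    owners = defaultdict(set)
    for letter, members in sets.items():
        for ch in members:
            owners[ch].add(letter)
    return {ch: sorted(letters) for ch, letters in owners.items() if len(letters) > 1}


for ch, letters in sorted(find_conflicts(SAMPLE_SETS).items()):
    print(ch, "->", ", ".join(letters))
```

Each reported character needs a decision about which single letter it should normalize to.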
Comment 14 Gerrit Notification Bot 2013-11-23 15:31:14 UTC
Change 97304 had a related patch set uploaded by Kaldari:
Adding 2 new equivalencies (partial fix for bug 25619)

https://gerrit.wikimedia.org/r/97304
Comment 15 Ryan Kaldari 2013-12-05 01:05:34 UTC
Since I haven't had any luck getting code review on https://gerrit.wikimedia.org/r/92154 I submitted https://gerrit.wikimedia.org/r/97304 as a simpler version. It only adds ! and $ and nothing else.
Comment 16 Quim Gil 2014-07-25 11:55:15 UTC
Both patches are still open. The first one got some reviews and now looks like it is waiting for a new upload from Kaldari. The second one, the simpler version, got no reviews at all.
Comment 17 Quim Gil 2014-08-22 16:18:04 UTC
(In reply to Ryan Kaldari from comment #15)
> Since I haven't had any luck getting code review on
> https://gerrit.wikimedia.org/r/92154 I submitted
> https://gerrit.wikimedia.org/r/97304 as a simpler version. It only adds !
> and $ and nothing else.

I'm not sure whether sending a request to wikitech-l would help get reviews for these two patches, but pinging on the patches and here doesn't seem to be enough... Any ideas?
