Last modified: 2014-11-17 09:46:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T65217, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 63217 - Consider using Unicode/CLDR data instead of custom tables
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: AntiSpoof
Version: master
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on:
Blocks: 63242
Reported: 2014-03-28 08:56 UTC by Nemo
Modified: 2014-11-17 09:46 UTC (History)
CC List: 7 users

Description Nemo 2014-03-28 08:56:46 UTC
Our "list is based on one by Neil Harris, which was derived by unknown methods".
At some point it will get easier to rely on CLDR, probably via the cldr MediaWiki extension.

Documents:
* http://www.unicode.org/reports/tr36/#visual_spoofing
* http://www.unicode.org/reports/tr39/#Confusable_Detection

Toy:
* http://unicode.org/cldr/utility/confusables.jsp

Data:
* http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt
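
As a rough illustration of what TR39 confusable detection does with such data, here is a minimal Python sketch. It assumes the `source ; target ; type # comment` line layout of the Unicode `confusables.txt` file (not checked against the `confusablesSummary.txt` linked above). The three sample lines and the helper names `parse_confusables`, `skeleton`, and `confusable` are made up for this example and are not part of AntiSpoof or ICU.

```python
import unicodedata

# Hypothetical sample lines in the assumed "source ; target ; type # comment" layout.
SAMPLE = """\
# comment lines and blank lines are skipped
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
0440 ; 0070 ; MA # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
0443 ; 0079 ; MA # CYRILLIC SMALL LETTER U -> LATIN SMALL LETTER Y
"""


def parse_confusables(text):
    """Return a dict mapping a source character to its prototype string."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def skeleton(s, table):
    """NFD-normalize, replace each character by its prototype, then NFD again."""
    nfd = unicodedata.normalize("NFD", s)
    mapped = "".join(table.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b, table):
    """Two strings count as confusable when their skeletons are equal."""
    return skeleton(a, table) == skeleton(b, table)


TABLE = parse_confusables(SAMPLE)
# "p\u0430ypal" contains a Cyrillic "а" in place of the Latin "a".
print(confusable("p\u0430ypal", "paypal", TABLE))
```

With the real data file in place of `SAMPLE`, the same comparison would cover far more characters; a production check would more likely go through an existing library than reimplement the mapping.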
Comment 1 Nemo 2014-03-28 15:51:53 UTC
(In reply to Nemo from comment #0)
> * http://www.unicode.org/reports/tr39/#Confusable_Detection

The author of which mentioned to me a certain ICU API... https://ssl.icu-project.org/apiref/icu4c/uspoof_8h.html#details
Comment 2 Nemo 2014-03-28 22:26:17 UTC
I've added bug 63242 as dependency: it seems that the standard ICU API can easily solve a concrete problem (in AbuseFilter) that has been intractable for years.

Perhaps the old and new data sources can co-exist for a while, with the new ones being used first for user-invisible parts like username creation and for new functions/interfaces like the one proposed in bug 63242. When we're confident enough about the data quality (possibly after feeding CLDR with some of our own data), and/or the old interfaces are used less, we'll consider dropping the custom data sources.
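
One way to picture that transition period is to run both data sources side by side and record where they disagree. The Python sketch below is only an illustration: `LEGACY_EQUIV`, `CLDR_PROTOTYPE`, `normalize_with`, and `compare_sources` are hypothetical names, and the two tiny tables stand in for the AntiSpoof equivsets and the Unicode confusables data, neither of which is reproduced here.

```python
# Hypothetical stand-ins: each maps a character to a canonical representative.
LEGACY_EQUIV = {"\u0430": "a", "0": "o"}    # pretend custom table
CLDR_PROTOTYPE = {"\u0430": "a", "1": "l"}  # pretend Unicode-derived table


def normalize_with(table, name):
    """Lowercase a name and map each character through the given table."""
    return "".join(table.get(ch, ch) for ch in name.lower())


def compare_sources(new_name, existing_name):
    """Return (new_verdict, legacy_verdict) for a pair of names.

    Each verdict is True when the two names collide under that table.
    Keeping both makes it possible to log and review disagreements.
    """
    new = (normalize_with(CLDR_PROTOTYPE, new_name)
           == normalize_with(CLDR_PROTOTYPE, existing_name))
    old = (normalize_with(LEGACY_EQUIV, new_name)
           == normalize_with(LEGACY_EQUIV, existing_name))
    return new, old


for pair in [("\u0430dmin", "admin"), ("adm1n", "admln"), ("r00t", "root")]:
    new, old = compare_sources(*pair)
    if new != old:
        print("sources disagree on", pair, "new:", new, "legacy:", old)
```

The disagreement log is the interesting output: it shows which cases would change behaviour if the custom tables were dropped.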
Comment 3 Nemo 2014-04-01 16:17:47 UTC
(In reply to Nemo from comment #1)
> (In reply to Nemo from comment #0)
> > * http://www.unicode.org/reports/tr39/#Confusable_Detection
> 
> The author of which mentioned to me a certain ICU API...
> https://ssl.icu-project.org/apiref/icu4c/uspoof_8h.html#details

That is, http://www.php.net/manual/en/class.spoofchecker.php
Comment 4 Liangent 2014-04-01 17:15:02 UTC
(In reply to Nemo from comment #0)
> Our "list is based on one by Neil Harris, which was derived by unknown
> methods".
> At some point it will get easier to rely on CLDR, probably via the cldr
> MediaWiki extension.
> 
> Documents:
> * http://www.unicode.org/reports/tr36/#visual_spoofing
> * http://www.unicode.org/reports/tr39/#Confusable_Detection
> 
> Toy:
> * http://unicode.org/cldr/utility/confusables.jsp
> 
> Data:
> * http://www.unicode.org/Public/security/revision-03/confusablesSummary.txt

It doesn't contain the zh-hans / zh-hant pairs that are in the current AntiSpoof equivsets.
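
A gap like this can be checked mechanically by listing every pair the custom table treats as equivalent that the Unicode data does not map to a common prototype. The Python sketch below uses made-up inputs: `EQUIV_PAIRS` and `PROTOTYPES` are tiny hypothetical excerpts, not the real AntiSpoof equivsets or confusables data, and `missing_pairs` is a name invented here.

```python
# Hypothetical excerpt of custom equivalence pairs (not the real equivset file).
EQUIV_PAIRS = [
    ("\u56fd", "\u570b"),  # 国 / 國: simplified vs traditional form
    ("\u0430", "a"),       # Cyrillic а / Latin a
]

# Hypothetical excerpt of Unicode confusable prototypes.
PROTOTYPES = {"\u0430": "a"}


def prototype(ch):
    """Return the prototype of a character, or the character itself."""
    return PROTOTYPES.get(ch, ch)


def missing_pairs(pairs):
    """Pairs whose two members do not share a prototype in PROTOTYPES."""
    return [(x, y) for x, y in pairs if prototype(x) != prototype(y)]


for x, y in missing_pairs(EQUIV_PAIRS):
    print("not covered: U+%04X / U+%04X" % (ord(x), ord(y)))
```

The resulting list is roughly what an upstream report about missing confusable data would need to contain.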
Comment 5 Nemo 2014-04-01 17:22:26 UTC
(In reply to Liangent from comment #4)
> It doesn't contain the zh-hans / zh-hant pairs that are in the current
> AntiSpoof equivsets.

Can you file a CLDR bug then please?
Comment 6 Liangent 2014-04-01 17:41:52 UTC
(In reply to Nemo from comment #5)
> Can you file a CLDR bug then please?

I can't find the CLDR bug tracker for confusable data...?

Forgot to say -- there are also [[Variant Chinese character]]s, which create more confusion than simple traditional / simplified Chinese differences.
Comment 7 Liangent 2014-04-01 17:53:43 UTC
http://unicode.org/cldr/trac/ticket/7189
