Last modified: 2014-04-03 10:10:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T65216, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 63216 - Only accept CAPTCHA responses with diacritics removed
Status: NEW
Product: MediaWiki extensions
Classification: Unclassified
Component: ConfirmEdit (CAPTCHA extension)
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: i18n
Depends on: 5309
Blocks:
Reported: 2014-03-28 08:36 UTC by Minh Nguyễn
Modified: 2014-04-03 10:10 UTC (History)

Description Minh Nguyễn 2014-03-28 08:36:54 UTC
With a fix for bug 5309, such as the one discussed at <https://gerrit.wikimedia.org/r/121255/>, it’s entirely possible that a user might get a CAPTCHA with illegible diacritics. Diacritics in Latin alphabets can look identical to one another when distorted, for example i í ì ỉ, or ó ơ.

For better usability, ConfirmEdit should display a CAPTCHA containing diacritics but require the user to enter the characters without diacritics. There’s a third-party module called Unidecode that does a decent job of accent folding.

One tradeoff would be that such CAPTCHAs might be easier for a bot to crack. There’s also the issue that a character like Ê might be considered a base letter in one language (as in Vietnamese) but a letter with a diacritic in another (Portuguese).
Comment 1 Nemo 2014-03-28 08:46:19 UTC
I'm not sure about the "only" part: for usability it's better if the system is completely agnostic to such details; otherwise I may correctly enter all diacritics and have my solution rejected for no reason.

When implementing this we're probably going to use some standard Unicode solution for case folding and diacritics/accent folding.
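
A minimal sketch of what such folding could look like, assuming Python's standard `unicodedata` module (not Unidecode, and not anything ConfirmEdit actually ships). The names `fold` and `captcha_matches` are hypothetical. Both the expected text and the response are folded, so an answer typed with or without diacritics is accepted:

```python
import unicodedata


def fold(text: str) -> str:
    """Build a comparison key that ignores case and combining diacritics."""
    # NFD splits precomposed letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every character with a non-zero canonical combining class.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is the Unicode-aware form of lowercasing for comparisons.
    return stripped.casefold()


def captcha_matches(expected: str, response: str) -> bool:
    """Hypothetical check: compare folded forms instead of raw strings."""
    return fold(expected) == fold(response.strip())
```

Letters that have no canonical decomposition, such as đ, pass through this kind of folding unchanged, so they would still need separate handling.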
Comment 2 Nikola Smolenski 2014-03-29 04:27:51 UTC
Yes, this is absolutely necessary. Not only might the diacritics not be visible, but some users may also not have a keyboard that can enter them.

I am not sure how to implement the folding, and it may even be language-dependent. For example, users may enter 'ö' as 'o' or as 'oe', or 'đ' as 'đ', 'ð', 'd' or 'dj'. A possibility is to simply avoid words with diacritics, which should be possible for most languages.
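
One way to sketch the multiple-spellings idea, assuming a hand-written table of alternatives per letter. The table below only contains the two examples above, and the names `ALTERNATIVES`, `accepted_answers` and `captcha_matches_variants` are hypothetical; a real table would have to be maintained per language:

```python
import unicodedata
from itertools import product

# Hypothetical table: each letter maps to every spelling a user might type.
ALTERNATIVES = {
    "ö": ("ö", "o", "oe"),
    "đ": ("đ", "ð", "d", "dj"),
}


def accepted_answers(word: str) -> set:
    """Expand a CAPTCHA word into the set of spellings that would be accepted."""
    # NFC makes sure letters like ö arrive as a single precomposed code point.
    normalized = unicodedata.normalize("NFC", word).casefold()
    choices = [ALTERNATIVES.get(ch, (ch,)) for ch in normalized]
    # The Cartesian product yields every combination of per-letter spellings.
    return {"".join(parts) for parts in product(*choices)}


def captcha_matches_variants(expected: str, response: str) -> bool:
    """Accept the response if it equals any expanded spelling."""
    typed = unicodedata.normalize("NFC", response.strip()).casefold()
    return typed in accepted_answers(expected)
```

The number of accepted spellings grows multiplicatively with each letter that has alternatives, which is one more reason the simpler option of avoiding such words may be attractive.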

In the future, when non-Latin captchas are implemented, the same should apply to alphabets (e.g. it should be possible to enter a Cyrillic captcha in the Latin alphabet).
