Last modified: 2014-04-03 10:10:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T65216, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 63216 - Only accept CAPTCHA responses with diacritics removed


Summary:	Only accept CAPTCHA responses with diacritics removed

Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	ConfirmEdit (CAPTCHA extension)
Version:	unspecified
Hardware:	All All

Importance:	Normal enhancement
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	i18n

Depends on:	5309
Blocks:

Reported:	2014-03-28 08:36 UTC by Minh Nguyễn
Modified:	2014-04-03 10:10 UTC
CC List:	2 users

See Also:	63217 https://github.com/mitsuhiko/babel/issues/89
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Minh Nguyễn 2014-03-28 08:36:54 UTC

With a fix for bug 5309, such as the one discussed at <https://gerrit.wikimedia.org/r/121255/>, it’s entirely possible that a user might get a CAPTCHA with illegible diacritics. Diacritics in Latin alphabets can look identical to one another when distorted, for example i í ì ỉ, or ó ơ.

For better usability, ConfirmEdit should display a CAPTCHA containing diacritics but require the user to enter the characters without diacritics. There’s a third-party module called Unidecode that does a decent job of accent folding.
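As a rough illustration of the accent folding described above, here is a minimal sketch that uses only Python's standard `unicodedata` module rather than Unidecode. The function name `strip_diacritics` is hypothetical, and this is not how ConfirmEdit (a PHP extension) does or would implement it; it only shows the general technique of decomposing characters and dropping the combining marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Fold accents by decomposing to NFD and dropping combining marks.

    Hypothetical helper for illustration only.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("i í ì ỉ"))  # i i i i
print(strip_diacritics("ó ơ"))      # o o
```

Letters that have no canonical decomposition, such as đ, pass through this function unchanged, so a real implementation would need an extra mapping (or a library like Unidecode) for them.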

One tradeoff would be that such CAPTCHAs might be easier for a bot to crack. There’s also the issue that a character like Ê might be considered a base letter in one language (as in Vietnamese) but a letter with a diacritic in another (Portuguese).

Comment 1 Nemo 2014-03-28 08:46:19 UTC

I'm not sure about the "only" part: for usability it's better if the system is completely agnostic to such details; otherwise I may correctly enter all the diacritics and have my solution rejected for no reason.

When implementing this we're probably going to use some standard Unicode solution for case folding and diacritics/accent folding.
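A check that is agnostic to case and diacritics could fold both the expected answer and the user's response before comparing them, so that a response is accepted with or without the diacritics. The sketch below assumes Python's built-in `str.casefold` and `unicodedata` as the "standard Unicode solution"; the names `fold` and `captcha_matches` are hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Case-fold, decompose (NFKD), and drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def captcha_matches(expected: str, response: str) -> bool:
    """Accept the response whether or not it includes the diacritics."""
    return fold(expected) == fold(response.strip())


print(captcha_matches("Tiếng", "tieng"))  # True
print(captcha_matches("Tiếng", "tiếng"))  # True
print(captcha_matches("Tiếng", "tiang"))  # False
```

Because both sides are folded the same way, a user who types every diacritic correctly is not rejected, which addresses the concern about the "only" wording.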

Comment 2 Nikola Smolenski 2014-03-29 04:27:51 UTC

Yes, this is absolutely necessary. Not only might the diacritics not be visible, but some users may also lack a keyboard that can enter them.

I am not sure how to implement the folding, and it may even be language-dependent. For example, users may enter 'ö' as 'o' or as 'oe', or 'đ' as 'đ', 'ð', 'd' or 'dj'. A possibility is to simply avoid words with diacritics, which should be possible for most languages.
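One way to picture the language-dependent case is a table of acceptable substitutions per letter, expanded into the set of all responses that would be accepted. This is only a sketch: the `ALTERNATIVES` table contains just the two examples from this comment, the function names are hypothetical, and the input is assumed to use precomposed (NFC) characters.

```python
from itertools import product

# Hypothetical substitution table, limited to the examples in this comment.
ALTERNATIVES = {
    "ö": ("ö", "o", "oe"),
    "đ": ("đ", "ð", "d", "dj"),
}


def accepted_forms(word: str) -> set[str]:
    """Return every spelling obtainable by substituting listed alternatives."""
    choices = [ALTERNATIVES.get(ch, (ch,)) for ch in word.lower()]
    return {"".join(combo) for combo in product(*choices)}


def is_accepted(word: str, response: str) -> bool:
    return response.strip().lower() in accepted_forms(word)


print(sorted(accepted_forms("schön")))   # ['schoen', 'schon', 'schön']
print(is_accepted("Đorđe", "Djordje"))   # True
```

The number of accepted forms grows multiplicatively with each substitutable letter, which loosens the CAPTCHA further and is one more argument for simply avoiding words with diacritics.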

In the future, when non-Latin captchas are implemented, the same should apply to alphabets (e.g., it should be possible to enter a Cyrillic captcha in the Latin alphabet).
