Last modified: 2011-12-12 22:46:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20136, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 18136 - Allow AntiSpoof to ignore several characters in script mixture check


Summary:	Allow AntiSpoof to ignore several characters in script mixture check

Status:	NEW

Product:	MediaWiki extensions
Classification:	Unclassified
Component:	AntiSpoof (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low enhancement with 1 vote (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2009-03-24 20:48 UTC by Victor Vasiliev
Modified:	2011-12-12 22:46 UTC (History)
CC List:	2 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Victor Vasiliev 2009-03-24 20:48:47 UTC

It would be useful to add a configuration variable, or a per-language variable, that makes AntiSpoof ignore certain characters when doing the script mixture check.

That would be very useful for Ossetian projects. Although Ossetian is written in the Cyrillic script, it uses the letters "Æ" and "æ". These letters have both Cyrillic and Latin versions, but the Latin one is supported by more software, so most Ossetian websites use it rather than the Cyrillic version.
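
A minimal sketch of what such a setting could look like, written in Python for illustration rather than in the extension's own PHP. Everything here is an assumption: the names `SCRIPT_RANGES`, `scripts_used`, `is_mixed` and `OSSETIAN_IGNORED` are hypothetical, and the Latin ranges are rough guesses. Only the Cyrillic range 0x0400 to 0x052F is taken from the source code quoted later in this report.

```python
# Hypothetical sketch of a script mixture check with an ignore list.
# None of these names come from AntiSpoof itself.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "SCRIPT_LATIN"),     # basic Latin capitals (assumed range)
    (0x0061, 0x007A, "SCRIPT_LATIN"),     # basic Latin small letters (assumed range)
    (0x00C0, 0x024F, "SCRIPT_LATIN"),     # Latin-1 letters, Latin Extended (rough, assumed)
    (0x0400, 0x052F, "SCRIPT_CYRILLIC"),  # Cyrillic, Cyrillic Supplement
]


def scripts_used(name, ignored=frozenset()):
    """Return the set of scripts found in name, skipping ignored characters."""
    found = set()
    for ch in name:
        if ch in ignored:
            continue  # the proposed setting: these characters never count
        cp = ord(ch)
        for lo, hi, script in SCRIPT_RANGES:
            if lo <= cp <= hi:
                found.add(script)
                break
    return found


def is_mixed(name, ignored=frozenset()):
    """True when the name contains characters from more than one script."""
    return len(scripts_used(name, ignored)) > 1


# Hypothetical per-language value: Latin "Æ" and "æ" tolerated in Ossetian names.
OSSETIAN_IGNORED = frozenset("\u00C6\u00E6")

name = "\u0425\u00E6\u0440"  # Cyrillic "Х", Latin "æ", Cyrillic "р"
print(is_mixed(name))                    # True: Latin and Cyrillic both present
print(is_mixed(name, OSSETIAN_IGNORED))  # False: the Latin "æ" is skipped
```

The ignore set is passed in per call so that each wiki or language could supply its own list without changing the check itself.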

Comment 1 Slavik IVANOV 2009-03-24 21:01:44 UTC

Among other possible cases: Roman numerals in non-Latin (Cyrillic) text, as in a possible name «Юзер VI» and similar.

A case very similar to the Ossetic one occurs in many other Cyrillic-script languages of Russia, such as the use of I (Latin capital i) for the graphically identical "palochka" character in many languages of Daghestan.

Comment 2 Van de Bugger 2011-12-12 22:29:42 UTC

> In spite of using a cyrillic script, Osetian language has letters "Æ" and "æ"
> used in it.

Hmm... As I see in the source code:

> array( 0x0400, 0x052F, "SCRIPT_CYRILLIC" ), # Cyrillic, Cyrillic Supplement

This range includes "U+04D4 CYRILLIC CAPITAL LIGATURE A IE" and "U+04D5 CYRILLIC SMALL LIGATURE A IE", respectively "Ӕ" and "ӕ", so `AntiSpoof' should not be a problem for Ossetian users. Is it still a problem? Please confirm.

Уастырджыйи хорзӕх уӕ уӕд!
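
The distinction between the Latin and Cyrillic ligatures can be checked directly. The following is a small Python sketch, not AntiSpoof code: it uses the standard `unicodedata` module to print each character's Unicode name and tests it against the Cyrillic range quoted above. The helper name `in_cyrillic_range` is made up for this example.

```python
import unicodedata

CYRILLIC_RANGE = (0x0400, 0x052F)  # range quoted above from the source code


def in_cyrillic_range(ch):
    """True when the character's code point lies inside the quoted range."""
    lo, hi = CYRILLIC_RANGE
    return lo <= ord(ch) <= hi


# Latin capital and small ligature, then their Cyrillic counterparts.
for ch in ("\u00C6", "\u00E6", "\u04D4", "\u04D5"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_cyrillic_range(ch))
```

Only the last two characters fall inside the range, so a name typed with the Latin ligatures would still count as mixed-script.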

Comment 3 Van de Bugger 2011-12-12 22:46:51 UTC

> …like I (Latin capital i) for graphically identical
> "palochka" character in many languages of Daghestan.

Cyrillic range:

> array( 0x0400, 0x052F, "SCRIPT_CYRILLIC" ), # Cyrillic, Cyrillic Supplement

includes "U+0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I", "U+0407 CYRILLIC CAPITAL LETTER YI", "U+04CF CYRILLIC SMALL LETTER PALOCHKA" and many other characters…

So I would recommend pushing the people who work on keyboard layouts to give end users the ability to enter all required symbols while staying within the Cyrillic Unicode block. That would be more correct in the long term.

> Among other possible cases, Roman numbers in non-Latin (Cyrillic) texts. Like
> in a possible name «Юзер VI» and similar.

I also faced this. However, it should be a problem only for kings and popes… The Cyrillic range contains many funny symbols that smart users can utilize, e.g. "Юзер ѴІ" or even "Юзер ХХХІІІ". Less smart people can write "Юзер Шестой" and "Юзер Тридцать третий".
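
The lookalike trick can be sketched as a simple character substitution. This is a hypothetical Python helper, not part of AntiSpoof: the mapping table and the function names `cyrillicize_numeral` and `all_in_cyrillic_range` are assumptions made for illustration, and the table covers only I, V and X.

```python
# Hypothetical mapping from Latin Roman-numeral letters to Cyrillic lookalikes.
LATIN_TO_CYRILLIC_LOOKALIKES = str.maketrans({
    "I": "\u0406",  # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
    "V": "\u0474",  # CYRILLIC CAPITAL LETTER IZHITSA
    "X": "\u0425",  # CYRILLIC CAPITAL LETTER HA
})


def cyrillicize_numeral(numeral):
    """Replace Latin I, V and X with visually similar Cyrillic letters."""
    return numeral.translate(LATIN_TO_CYRILLIC_LOOKALIKES)


def all_in_cyrillic_range(text):
    """True when every character lies in the quoted range 0x0400 to 0x052F."""
    return all(0x0400 <= ord(ch) <= 0x052F for ch in text)


converted = cyrillicize_numeral("XXXIII")
print(all_in_cyrillic_range("XXXIII"))   # False: Latin letters
print(all_in_cyrillic_range(converted))  # True: Cyrillic lookalikes only
```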

So I would recommend closing the bug.

BTW: This is my personal opinion; I am not an `AntiSpoof' maintainer or developer.
