Last modified: 2011-12-12 22:46:51 UTC
It would be useful to add a configuration variable, or a per-language variable, that makes AntiSpoof ignore certain characters when it checks for script mixing. That would be very useful for Ossetian projects. Although Ossetian uses the Cyrillic script, it has the letters "Æ" and "æ". These letters exist in both Cyrillic and Latin versions, but the Latin one is supported by more software, so most Ossetian web sites use it rather than the Cyrillic one.
Another possible case is Roman numerals in non-Latin (Cyrillic) text, as in a possible name such as «Юзер VI». A case very similar to the Ossetic one occurs in many other Cyrillic-script languages of Russia: for example, I (Latin capital i) is used in place of the graphically identical "palochka" character in many languages of Daghestan.
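The request above could look roughly like the following Python sketch. It is not AntiSpoof's actual code or configuration: the names `script_of`, `is_mixed` and `OSSETIAN_IGNORED` are hypothetical, and the Latin detection is a rough approximation that only covers letters up to U+024F.

```python
# Hypothetical sketch of a script-mixing check with a per-language ignore set.
# None of these names come from AntiSpoof itself.

# Latin letters that an Ossetian wiki might want the check to skip.
OSSETIAN_IGNORED = frozenset("\u00c6\u00e6")  # Latin "Æ" and "æ"


def script_of(ch):
    """Return "CYRILLIC", "LATIN", or None for characters not classified here."""
    cp = ord(ch)
    if 0x0400 <= cp <= 0x052F:  # Cyrillic, Cyrillic Supplement
        return "CYRILLIC"
    if "A" <= ch <= "Z" or "a" <= ch <= "z":
        return "LATIN"
    if 0x00C0 <= cp <= 0x024F and ch.isalpha():  # accented and extended Latin letters
        return "LATIN"
    return None  # digits, spaces, punctuation, other scripts


def is_mixed(name, ignored=frozenset()):
    """True if the name uses more than one script, skipping ignored characters."""
    scripts = set()
    for ch in name:
        if ch in ignored:
            continue
        script = script_of(ch)
        if script is not None:
            scripts.add(script)
    return len(scripts) > 1


name = "\u00c6\u0445\u0441\u0430\u0440"  # Latin "Æ" followed by Cyrillic letters
print(is_mixed(name))                             # flagged as mixed
print(is_mixed(name, ignored=OSSETIAN_IGNORED))   # accepted with the ignore set
```

An ignore set is deliberately narrow: exempting "Æ" and "æ" does not open the door to other Latin letters, whereas exempting "I", "V" and "X" for Roman numerals would weaken the check much more.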
> In spite of using a cyrillic script, Osetian language has letters "Æ" and "æ"
> used in it.

Hmm... As I see in the source code:

> array( 0x0400, 0x052F, "SCRIPT_CYRILLIC" ), # Cyrillic, Cyrillic Supplement

This range includes "U+04D4 CYRILLIC CAPITAL LIGATURE A IE" and "U+04D5 CYRILLIC SMALL LIGATURE A IE", that is, "Ӕ" and "ӕ" respectively, so `AntiSpoof' should not be a problem for Ossetian users. Is it still a problem? Please confirm.

Уастырджыйи хорзӕх уӕ уӕд! (Ossetian: "May the grace of Uastyrdzhi be with you!")
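To confirm which variant a given name actually contains, a small check like the one below may help. It is only a sketch: the range is copied from the source line quoted above, while the helper names `in_cyrillic_range` and `describe` are made up for this example.

```python
import unicodedata

# Range copied from the quoted source line (Cyrillic, Cyrillic Supplement).
CYRILLIC_RANGE = (0x0400, 0x052F)


def in_cyrillic_range(ch):
    """True if the character's code point falls inside the quoted range."""
    low, high = CYRILLIC_RANGE
    return low <= ord(ch) <= high


def describe(text):
    """List (code point, Unicode name, in Cyrillic range) for each non-ASCII character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), in_cyrillic_range(ch))
        for ch in text
        if ord(ch) > 0x7F
    ]


# Cyrillic ligatures first, then their Latin look-alikes.
for row in describe("\u04d4\u04d5\u00c6\u00e6"):
    print(row)
```

Running `describe` on a rejected user name shows at a glance whether it contains the Latin letters (outside the range) or the Cyrillic ligatures (inside it).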
> …like I (Latin capital i) for graphically identical
> "palochka" character in many languages of Daghestan.

The Cyrillic range:

> array( 0x0400, 0x052F, "SCRIPT_CYRILLIC" ), # Cyrillic, Cyrillic Supplement

includes "U+0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I", "U+0407 CYRILLIC CAPITAL LETTER YI", "U+04CF CYRILLIC SMALL LETTER PALOCHKA" and many other characters… So I would recommend pushing the people who work on keyboard layouts to give end users a way to enter all required symbols while staying within the Cyrillic Unicode block. That would be more correct in the long term.

> Among other possible cases, Roman numbers in non-Latin (Cyrillic) texts. Like
> in a possible name «Юзер VI» and similar.

I have also faced this. However, it should be a problem only for kings and popes… The Cyrillic range contains many funny symbols, and smart users can make use of them, e.g. "Юзер ѴІ" or even "Юзер ХХХІІІ". Less smart people can write "Юзер Шестой" and "Юзер Тридцать третий". So I would recommend closing the bug.

BTW: This is my personal opinion; I am not an `AntiSpoof' maintainer or developer.
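As an illustration of staying within the Cyrillic range, the Python sketch below rewrites the Latin look-alikes mentioned in this thread to their Cyrillic counterparts. It is an assumption-laden example, not part of AntiSpoof or of any keyboard layout: the function name `to_cyrillic_block` is hypothetical, and the rule "replace Latin I only when it touches a Cyrillic letter" is a simplification chosen so that Roman numerals such as "VI" are left alone.

```python
# Hypothetical helper: map Latin look-alikes to characters in U+0400..U+052F.
LIGATURES = {
    0x00C6: 0x04D4,  # Latin "Æ" -> CYRILLIC CAPITAL LIGATURE A IE
    0x00E6: 0x04D5,  # Latin "æ" -> CYRILLIC SMALL LIGATURE A IE
}
PALOCHKA = "\u04c0"  # CYRILLIC LETTER PALOCHKA


def is_cyrillic(ch):
    """True for a single character in the Cyrillic / Cyrillic Supplement range."""
    return ch != "" and 0x0400 <= ord(ch) <= 0x052F


def to_cyrillic_block(text):
    """Replace Latin Æ/æ, and Latin I next to a Cyrillic letter, with Cyrillic characters."""
    text = text.translate(LIGATURES)
    out = []
    for i, ch in enumerate(text):
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if ch == "I" and (is_cyrillic(prev_ch) or is_cyrillic(next_ch)):
            out.append(PALOCHKA)  # Latin capital i used as a palochka
        else:
            out.append(ch)
    return "".join(out)


# Latin "I" between Cyrillic letters becomes a palochka; "VI" stays Latin.
print(to_cyrillic_block("\u043aI\u0430"))
print(to_cyrillic_block("\u042e\u0437\u0435\u0440 VI"))
```

Whether such a rewrite is acceptable is a policy question for each project; the sketch only shows that the mapping itself is mechanical.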