Last modified: 2012-12-10 20:52:21 UTC
Take a good look at this diff from the Russian Wikipedia: http://ru.wikipedia.org/w/index.php?title=%D0%91%D0%B8%D1%82%D0%B2%D0%B0_%D0%B7%D0%B0_%D0%9A%D0%B0%D0%B2%D0%BA%D0%B0%D0%B7_(1942%E2%80%941943)&diff=9212389&oldid=9112075
The edit removed four instances of the Unicode character U+FDD3. I stumbled upon it while running a Perl script that analyzed a dump of the Russian Wikipedia. The script ran several pattern matches on every page, and on this page the Perl regular expression engine issued the warning "Unicode character is illegal" (see http://perldoc.perl.org/perldiag.html ).
The code chart in which this character appears indeed says: "These codes are intended for process-internal uses, but are not permitted for interchange." (Search for FDD3 here: http://www.unicode.org/charts/About.html )
My Unicode expertise ends here. I don't know exactly which characters are illegal. My guess is that characters with the Noncharacter_Code_Point property are illegal, and there may be others. I also don't know exactly what damage these characters cause once they are saved in the MediaWiki database, but I suspect they can cause interoperability problems with external tools such as browsers, bots, search engines and future versions of the database engine. They might even open security holes. So I think this is a warning sign, and it most probably shouldn't be possible to save pages that contain such characters.
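For illustration, here is a minimal sketch of how page text could be checked for noncharacters before saving. It assumes the Unicode Standard's definition of noncharacters (the code points with the Noncharacter_Code_Point property): the 32 code points U+FDD0..U+FDEF plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF), 66 in total. The function names are hypothetical; this is not MediaWiki code and does not reproduce Perl's warning logic.

```python
from typing import List, Tuple


def is_noncharacter(cp: int) -> bool:
    """Return True if code point cp is one of the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF: a contiguous block of 32 noncharacters (includes U+FDD3).
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+xxFFFE and U+xxFFFF, planes 0-16.
    return (cp & 0xFFFE) == 0xFFFE and cp <= 0x10FFFF


def find_noncharacters(text: str) -> List[Tuple[int, str]]:
    """List (index, 'U+XXXX') for every noncharacter found in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text)
            if is_noncharacter(ord(ch))]


def strip_noncharacters(text: str) -> str:
    """Return text with all noncharacters removed."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


if __name__ == "__main__":
    sample = "Битва за\uFDD3 Кавказ"
    print(find_noncharacters(sample))   # [(8, 'U+FDD3')]
    print(strip_noncharacters(sample))  # Битва за Кавказ
```

Whether a save should be rejected (using `find_noncharacters` to report the offending positions) or silently cleaned (using `strip_noncharacters`) is a policy decision; rejecting is more conservative because it never changes what the editor submitted.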
They're not technically illegal, but perhaps they should be excluded, since they wouldn't be useful in page text.