Last modified: 2012-01-22 18:57:04 UTC
Hello! To my knowledge, the built-in editor can handle a "magic character conversion", which is activated for wikis whose content language is Esperanto. The conversion works as follows: the characters Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ are stored in the database but are displayed as Cx, Gx, Hx, Jx, Sx, Ux, cx, gx, hx, jx, sx, ux in the browser. See [[eo:Vikipedio:Bugzilla_1512#Notes]].

*note* :eo: also has an "escape notation", such as displaying Cxx for a stored Cxx, CxX for a stored CxX etc.

*Requirement of this bug report:*
It should be possible to set up the character conversion
- character by character and / or
- by *include* range and / or
- by *exclude* range

Examples:
If you want to distinguish between "minus" = "-" and &ndash; &#8211; &#x2013; = "–", see: Unicode Character EN DASH - U+2013 - http://www.fileformat.info/info/unicode/char/2013/index.htm
There should be a syntax so that "–" is shown as &ndash; in the editor but saved as "-".
There should be a syntax to match the "existing" magic character conversion in Esperanto.

If such a syntax and configuration were available, users could activate it as they choose in their monobook definition.

This feature would allow detecting BiDi punctuation characters, as mentioned in
bug 3819: strip phantom general punctuation characters from page titles
This feature would help to distinguish different kinds of whitespace, such as "space" and "tab", as mentioned in
bug 3894: white space characters, BiDi control characters should show up in diff
This feature would allow editing InterLangua links using the magic character conversion, as desired by Scot in bug 3615 comment 1:
bug 3615: blocks of code not handling magic character conversions in Esperanto correcty - reason for page deletion

*notes*
a) The best way to introduce a feature is to offer it as an option. This will not break code or irritate users.
b) Because of the "escape syntax", at some point it should be decided whether this feature would break documentation pages with examples. If all the versions &ndash; &#8211; &#x2013; = "–" are used there, they should not be changed when the page is edited and saved again.
c) No details about the required syntax are specified here, in order to avoid limitations.

The requested flexible magic character conversion built into the editor would offer solutions / workarounds and would also be helpful for other reported bugs:
- One could see in the source of a page whether Unicode whitespace is used in an article title
  bug 1414: Unicode whitespaces allowed in article title
- One could distinguish characters which look the same in different alphabets but are coded differently
  bug 1524: usernames should use unicode whitelist
  bug 2290: user impersonation using homographs
  bug 3885: title normalisation

This feature would "normalise" the way characters are saved. This would / should make search more efficient.

"Copy and paste" can be platform and program dependent. I have seen many broken pages where Cyrillic characters in InterLanguage links were saved as ????. This could be avoided if a whole range were displayed only in &#nnnn; notation in the editor and saved back as real Unicode.

The same applies to detecting homoglyphs / homographs. The reports are mentioned above.

regards reinhardt [[user:gangleri]]

P.S. This request is only about implementing the basic feature. Please open individual bugs for subfeatures wherever necessary.
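For readers unfamiliar with the Esperanto conversion described in this report, here is a minimal Python sketch of a round trip between the stored text and the edit box. The function names and the exact escape rule (every literal x that follows one of the twelve base letters is doubled in the edit box) are assumptions made for illustration; this is not MediaWiki's actual code.

```python
import re

# Accented letter -> base letter, and the reverse mapping.
TO_BASE = dict(zip("ĈĜĤĴŜŬĉĝĥĵŝŭ", "CGHJSUcghjsu"))
TO_ACCENT = {base: accented for accented, base in TO_BASE.items()}

# A convertible letter (accented or base) followed by any run of x/X.
_STORED = re.compile(r"([ĈĜĤĴŜŬĉĝĥĵŝŭCGHJSUcghjsu])([xX]*)")
# In the edit box only base letters can occur before a run of x/X.
_EDITOR = re.compile(r"([CGHJSUcghjsu])([xX]*)")


def to_editor(stored: str) -> str:
    """Stored Unicode text -> x-notation shown in the edit box."""
    def repl(m):
        letter, xs = m.group(1), m.group(2)
        doubled = "".join(ch * 2 for ch in xs)  # escape literal x's
        if letter in TO_BASE:
            return TO_BASE[letter] + "x" + doubled  # odd run: accent marker first
        return letter + doubled  # even run: only literal x's
    return _STORED.sub(repl, stored)


def to_stored(edited: str) -> str:
    """x-notation from the edit box -> Unicode text to store."""
    def repl(m):
        letter, xs = m.group(1), m.group(2)
        if len(xs) % 2 == 1:  # odd run: the first x marks the accent
            letter = TO_ACCENT[letter]
            xs = xs[1:]
        return letter + xs[::2]  # undo the doubling of literal x's
    return _EDITOR.sub(repl, edited)
```

With these assumed rules, `to_editor("Ĉu ŝi")` gives `"Cxu sxi"`, and a stored literal `"cx"` is shown as `"cxx"` so that saving it does not turn it into `ĉ`.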
With special setups, this feature could provide a limited replacement for the discontinued special page ~makeutf8~.
(In reply to comment #0)
> This feature would allow to detect BiDi punctuation characters as mentioned in
> bug 3819: strip phantom general punctuation characters from page titles

testcase http://test.leuksman.com/index.php?title=User_talk:Gangleri&oldid=10505&action=edit&section=6

I mention this testcase in order to illustrate how editing a page could turn out. There are scenarios, which I would rather not discuss here in public, that would allow someone to "achieve this effect" many revisions after the erroneous / malicious change. This would make it very difficult to trace and correct the error later. Please e-mail me if you would like to know more details.
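To illustrate how a rule set with *include* ranges and *exclude* characters could make invisible BiDi and whitespace characters show up in the edit box, here is a small Python sketch. The ranges, the excluded characters, the `&#xNNNN;` display form and all names are assumptions chosen for the example, not an existing MediaWiki configuration.

```python
import re

# Hypothetical rule set: inclusive code point ranges to display as references.
INCLUDE = [
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x2000, 0x206F),  # General Punctuation block (spaces, BiDi controls, ...)
]
# Hypothetical exceptions inside those ranges that stay as real characters.
EXCLUDE = {0x2013, 0x2014}  # EN DASH, EM DASH

_REF = re.compile(r"&#x([0-9A-Fa-f]{1,6});")


def is_selected(cp: int) -> bool:
    """True if the code point is covered by INCLUDE and not listed in EXCLUDE."""
    return cp not in EXCLUDE and any(lo <= cp <= hi for lo, hi in INCLUDE)


def to_editor(stored: str) -> str:
    """Show selected characters as hexadecimal character references."""
    return "".join(
        f"&#x{ord(ch):04X};" if is_selected(ord(ch)) else ch for ch in stored
    )


def to_stored(edited: str) -> str:
    """Turn references for selected characters back into real characters."""
    def repl(m):
        cp = int(m.group(1), 16)
        return chr(cp) if is_selected(cp) else m.group(0)
    return _REF.sub(repl, edited)
```

With these rules a RIGHT-TO-LEFT MARK in a title is shown as `&#x200F;`, while an en dash is left alone. The sketch does not address the escape problem raised in note b) of comment #0: a page that already contains a literal `&#x200F;` as documentation would be changed on save.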
(In reply to comment #0)
Expanding the request, because %nn%nn%nn is another method of encoding characters. See %C3%9C at http://de.wikipedia.org/w/index.php?title=Benutzer:VanGore&action=edit&section=7

[[wikibooks:de:%c3%9Cber_das_Wesen_der_Information]] is an alternative way to encode [[wikibooks:de:Über_das_Wesen_der_Information]]

> Examples:
> If you want to distinguish between "minus" = "-" and &ndash; &#8211; &#x2013; = "–"
> see: Unicode Character EN DASH - U+2013
> - http://www.fileformat.info/info/unicode/char/2013/index.htm

If you want to distinguish between "minus" = "-" and &ndash; &#8211; &#x2013; %E2%80%93 and %e2%80%93 = "–"

> *notes*
> b) Because of "escape syntax" at some point it should be decided if this feature
> would break documentation pages with examples. If there all versions &ndash;
> &#8211; &#x2013; = "–" are used these should not be changed while the page is
> edited and saved again.

b) ... If all the versions &ndash; &#8211; &#x2013; %E2%80%93 and %e2%80%93 = "–" are used there, they should not be changed when the page is edited and saved again.

> This feature would "normalise" the way how characters are saved. This would /
> should make search more efficient.

add ... It also makes it easier to read and edit / correct text encoded as %nn%nn%nn.
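As a small illustration of the %nn%nn%nn case, the following Python sketch decodes percent-encoded UTF-8 in a link target using the standard library. The helper name `decode_link_target` is hypothetical, and the sketch only shows the decoding step, not how or where an editor would apply it.

```python
from urllib.parse import unquote


def decode_link_target(target: str) -> str:
    """Decode %nn sequences (upper or lower case hex) as UTF-8 text."""
    return unquote(target, encoding="utf-8")


# Both spellings from the comment above decode to the same character.
title = decode_link_target("%c3%9Cber_das_Wesen_der_Information")
en_dash_upper = decode_link_target("%E2%80%93")
en_dash_lower = decode_link_target("%e2%80%93")
```

Here `title` becomes `Über_das_Wesen_der_Information`, and both `en_dash_upper` and `en_dash_lower` are the single character "–" (U+2013).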
Here's my version, in response to Bug 2676:

This is great, but Unicode-unaware *browsers* aren't the only problem. A lot of people want to work in Unicode-unaware text editors as well, and this makes it difficult for them. They'd have to fake out the server into thinking they had an old browser or something in order to see the HTML entity version of the source.

I have a different proposal:
1. Convert all HTML entities (named or Unicode numbers or whatever) into plain Unicode characters in the wikisource.
2. Provide an option in the editing interface to view the source in either "plain Unicode" format (with actual characters) or "plain text" format (with HTML entities) on a per-edit basis.
2.a. When editing in "plain text" mode, all the bad characters (non-ASCII?) will be converted into named HTML entities if possible (&mdash; and the like), or into numbered HTML entities if not possible (&#8212; and the like).
2.b. The default editing format will be selectable in preferences.
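Steps 1 and 2.a of this proposal can be sketched in Python with the standard `html` module. The function names are hypothetical, and one detail is an assumption that goes beyond the proposal: entities that decode to ASCII (such as `&amp;` or `&lt;`) are left encoded in step 1, so that a round trip through "plain text" mode does not change them.

```python
import html
import re
from html.entities import codepoint2name  # HTML 4.01 named entities

# Named, decimal, or hexadecimal character references.
_ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def to_unicode(source: str) -> str:
    """Step 1: replace entities with the non-ASCII characters they stand for."""
    def repl(m):
        decoded = html.unescape(m.group(0))
        # Assumption: keep ASCII results (&amp;, &lt;, ...) and unknown names as-is.
        if len(decoded) == 1 and ord(decoded) > 127:
            return decoded
        return m.group(0)
    return _ENTITY.sub(repl, source)


def to_plain_text(source: str) -> str:
    """Step 2.a: named entity if one exists, otherwise a numbered entity."""
    out = []
    for ch in source:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)
```

For example, an em dash (U+2014) becomes `&mdash;` in "plain text" mode, while Ĉ (U+0108), which has no HTML 4.01 entity name, becomes `&#264;`.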
*note* This bug was opened with the summary: "Add flexible magic character conversion to the built-in editor"

After reading the response from Omegatron in comment 4 and searching for bug dependencies and duplicates, I wonder whether the "magic character conversion" should be limited to the built-in editor or should be available for other functions as well.

Changing summary to "Add flexible magic character conversion to the user interface"; this should also cover the built-in editor.

Adding dependency, blocks:
bug 3894: white space characters, BiDi control characters should show up in diff
having a duplicate:
bug 3672: BiDi: improuve the diffs with regard to RTL issues

A feature as requested here would prevent users from opening invalid bugs such as (invalid bug 3621: BiDi: RTL list not rendered correctly)

best regards reinhardt [[user:gangleri]]
With the experience of Esperanto magic character conversion, I'm pretty sure we don't want to add any more of that.