Last modified: 2012-01-22 18:57:04 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6012, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 4012 - Add flexible magic character conversion to the user interface


Summary:	Add flexible magic character conversion to the user interface

Status:	RESOLVED WONTFIX

Product:	MediaWiki
Classification:	Unclassified
Component:	Internationalization
Version:	unspecified
Hardware:	All All

Importance:	Lowest enhancement with 3 votes
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://test.leuksman.com/edit/User:Br...
Whiteboard:
Keywords:

Depends on:
Blocks:	3985 13466

Reported:	2005-11-17 15:37 UTC by lɛʁi לערי ריינהארט
Modified:	2012-01-22 18:57 UTC (History)
CC List:	2 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description lɛʁi לערי ריינהארט 2005-11-17 15:37:22 UTC

Hallo!

To my knowledge, the built-in editor can handle "magic character conversion",
which is activated for wikis whose content language is Esperanto.

This magic character conversion works as follows. The characters Ĉ, Ĝ, Ĥ, Ĵ,
Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ, ŭ are stored in the database but are displayed as Cx, Gx,
Hx, Jx, Sx, Ux, cx, gx, hx, jx, sx, ux in the browser. See
[[eo:Vikipedio:Bugzilla_1512#Notes]].
*note* :eo: also has an "escape notation", such as displaying Cxx for a stored Cxx, CxX
for a stored CxX etc.
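
As an illustration only, the stored/displayed round trip described above can be sketched in Python. This is not the MediaWiki implementation. The function names are hypothetical, only a lowercase "x" is treated as the marker, and the escape rule is an assumption: every literal "x" that follows one of the affected letters is doubled for display, so a stored "cx" is shown as "cxx".

```python
import re

# Assumed mapping for the Esperanto x-convention (illustrative only).
TO_ACCENTED = {"C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
               "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}
TO_BASE = {accented: base for base, accented in TO_ACCENTED.items()}


def to_editor(stored):
    """Stored text -> what the edit box shows (hypothetical helper)."""
    def repl(match):
        letter, xs = match.group(1), match.group(2)
        if letter in TO_BASE:
            # Accented letter: base letter plus one marker "x";
            # any literal x characters that follow are doubled.
            return TO_BASE[letter] + "x" + xs * 2
        # Plain base letter: only the literal x characters are doubled.
        return letter + xs * 2
    return re.sub(r"([CGHJSUcghjsuĈĜĤĴŜŬĉĝĥĵŝŭ])(x*)", repl, stored)


def from_editor(edited):
    """Edit box text -> what gets stored (inverse of to_editor)."""
    def repl(match):
        letter, xs = match.group(1), match.group(2)
        count = len(xs)
        if count % 2 == 1:
            # Odd run: one marker "x" plus doubled literal x characters.
            return TO_ACCENTED[letter] + "x" * (count // 2)
        # Even run: doubled literal x characters only.
        return letter + "x" * (count // 2)
    return re.sub(r"([CGHJSUcghjsu])(x+)", repl, edited)


print(to_editor("Ŝanĝo"))      # Sxangxo
print(from_editor("Sxangxo"))  # Ŝanĝo
print(to_editor("cx"))         # cxx (escaped literal "cx")
```

Under these assumptions the two functions are inverses of each other, which is the property that matters for a conversion applied on every edit and save.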

*Requirement of this bug report:*

It should be possible to set up the character conversion
- character by character
and / or
- by *include* range
and / or
- by *exclude* range

Examples:
If you want to distinguish between "minus" = "-" and &ndash; &#8211; &#x2013; = "–"
see: Unicode Character EN DASH - U+2013
- http://www.fileformat.info/info/unicode/char/2013/index.htm

There should be a syntax specifying that
"–" is shown as &ndash; in the editor but saved as "-".
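
The report deliberately leaves the syntax open, so the following Python sketch is only one possible reading of the three kinds of rule (per character, include range, exclude range). The function name, its parameters and the choice of hexadecimal character references for ranges are all assumptions made for illustration, and only the display direction is modelled.

```python
def display_form(text, per_char=None, include=(), exclude=()):
    """Return text as a hypothetical editor would show it.

    per_char: dict mapping single characters to their display form.
    include:  iterable of (low, high) code point ranges to show as references.
    exclude:  iterable of (low, high) ranges exempted from `include`.
    """
    per_char = per_char or {}

    def in_ranges(code_point, ranges):
        return any(low <= code_point <= high for low, high in ranges)

    out = []
    for ch in text:
        code_point = ord(ch)
        if ch in per_char:
            # Rule 1: character-by-character replacement wins.
            out.append(per_char[ch])
        elif in_ranges(code_point, include) and not in_ranges(code_point, exclude):
            # Rules 2 and 3: included ranges, minus excluded ones.
            out.append(f"&#x{code_point:04X};")
        else:
            out.append(ch)
    return "".join(out)


shown = display_form(
    "a\u2013b\u200fc\u2014d",           # en dash, right-to-left mark, em dash
    per_char={"\u2013": "&ndash;"},     # show the en dash by name
    include=[(0x2000, 0x206F)],         # General Punctuation block
    exclude=[(0x2014, 0x2014)],         # but leave the em dash alone
)
print(shown)  # a&ndash;b&#x200F;c, then the untouched em dash, then d
```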

There should be a syntax that can express the "existing" magic character conversion in
Esperanto. If such a syntax or configuration were available, users could activate
it as they choose in their monobook definition.

This feature would make it possible to detect BiDi punctuation characters, as mentioned in
bug 3819: strip phantom general punctuation characters from page titles

This feature would help to distinguish different kinds of whitespace, such as "space"
and "tab", as mentioned in
bug 3894: white space characters, BiDi control characters should show up in diff
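
To make the idea of surfacing invisible characters concrete, here is a small Python sketch that lists BiDi/format characters, unusual spaces and tabs in a string. The function name and the choice of Unicode general categories (Cf, Cc and Zs) are assumptions for illustration, not something specified in this report.

```python
import unicodedata


def find_invisible(text):
    """List (index, code point, name) for characters that are easy to miss.

    Flags format characters (category Cf, which includes the BiDi marks),
    control characters other than line breaks (Cc, which includes tab),
    and space separators other than the plain ASCII space (Zs).
    """
    hits = []
    for index, ch in enumerate(text):
        category = unicodedata.category(ch)
        odd_space = category == "Zs" and ch != " "
        control = category == "Cc" and ch not in "\r\n"
        if category == "Cf" or control or odd_space:
            # Control characters have no Unicode name, hence the default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


for hit in find_invisible("a\u200fb\u00a0c\td\n"):
    print(hit)
# (1, 'U+200F', 'RIGHT-TO-LEFT MARK')
# (3, 'U+00A0', 'NO-BREAK SPACE')
# (5, 'U+0009', '<unnamed>')
```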

This feature would make it possible to edit InterLangua links using the magic character
conversion, as desired by Scot in bug 3615 comment 1:
bug 3615: blocks of code not handling magic character conversions in Esperanto
correcty - reason for page deletion
 
*notes*
a) The best way to introduce a feature is to offer it as an option. This will not
break code or irritate users.
b) Because of the "escape syntax", it should be decided at some point whether this feature
would break documentation pages with examples. If all of the variants &ndash;
&#8211; &#x2013; = "–" are used there, they should not be changed when the page is
edited and saved again.
c) No details about the required syntax are specified here, in order to avoid
limitations.

The requested flexible magic character conversion built into the editor would
also offer solutions or workarounds for other reported bugs:

One could see in the source of a page
- whether Unicode whitespace is used in an article title
bug 1414: Unicode whitespaces allowed in article title
- one could distinguish characters that look the same in different alphabets
but are encoded differently
bug 1524: usernames should use unicode whitelist
bug 2290: user impersonation using homographs
bug 3885: title normalisation
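
A rough Python sketch of how look-alike characters from different alphabets might be flagged follows. It uses the first word of each character's Unicode name as a crude stand-in for its script, which is an assumption of this example and not a real script property lookup. The function names are hypothetical.

```python
import unicodedata


def scripts_used(text):
    """Return the set of script-like prefixes of the letters in text.

    Heuristic: the first word of the Unicode character name, for example
    'LATIN' or 'CYRILLIC'. This is a simplification for illustration.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_mixed(text):
    """True if letters from more than one script-like group are present."""
    return len(scripts_used(text)) > 1


# The second letter below is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
print(sorted(scripts_used("p\u0430ypal")))  # ['CYRILLIC', 'LATIN']
print(looks_mixed("paypal"))                # False
```

A check like this only reports that scripts are mixed. Deciding which mixtures are legitimate would still be a policy question for the bugs listed above.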

This feature would "normalise" the way characters are saved. This would /
should make search more efficient.

"Copy and paste" can be platform- and program-dependent. I have seen many broken
pages where Cyrillic characters in InterLanguage links were saved as ????. This
could be avoided if a whole range were displayed only in &#nnnn; notation in
the editor and saved back as real Unicode.
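
The round trip suggested here (show a whole range as &#nnnn; in the editor, store real Unicode on save) could look roughly like the Python sketch below. The function names and the choice of the basic Cyrillic block as the default range are assumptions for illustration.

```python
import re

CYRILLIC = (0x0400, 0x04FF)  # basic Cyrillic block, used as the example range


def to_references(text, low=CYRILLIC[0], high=CYRILLIC[1]):
    """Editor view: characters inside the range become &#nnnn; references."""
    return "".join(
        f"&#{ord(ch)};" if low <= ord(ch) <= high else ch
        for ch in text
    )


def from_references(text):
    """Save step: decimal references are turned back into real characters."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)


link = "[[ru:Москва]]"
shown = to_references(link)
print(shown)  # [[ru:&#1052;&#1086;&#1089;&#1082;&#1074;&#1072;]]
print(from_references(shown) == link)  # True
```

Note that from_references as written would also convert references the author typed deliberately, which is exactly the documentation-page concern raised in note b).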

The same applies to detecting homoglyphs / homographs. The reports are mentioned above.

regards reinhardt [[user:gangleri]]

P.S. This request is only about implementing the basic feature. Please open
individual bugs for subfeatures wherever necessary.

Comment 1 lɛʁi לערי ריינהארט 2005-11-17 22:25:38 UTC

with special setups this feature could provide a limited replacement to the
discontinued special page ~makeutf8~

Comment 2 lɛʁi לערי ריינהארט 2005-12-10 02:15:06 UTC

(In reply to comment #0)
> This feature would allow to detect BiDi punctuation characters as mentioned in
> bug 3819: strip phantom general punctuation characters from page titles

testcase
http://test.leuksman.com/index.php?title=User_talk:Gangleri&oldid=10505&action=edit&section=6

I mention this testcase in order to illustrate how editing a page could turn
out. There are scenarios, which I would rather not discuss here in public, that would
make it possible to "achieve this effect" many revisions after the erroneous / malicious
change. This would make it very difficult to trace and correct the error later.
Please e-mail me if you would like to know more details.

Comment 3 lɛʁi לערי ריינהארט 2005-12-12 23:57:30 UTC

(In reply to comment #0)
Expanding the request because %nn%nn%nn is another method to encode characters.
See %C3%9C at
http://de.wikipedia.org/w/index.php?title=Benutzer:VanGore&action=edit&section=7 
[[wikibooks:de:%c3%9Cber_das_Wesen_der_Information]] is an alternative way to encode
[[wikibooks:de:Über_das_Wesen_der_Information]]
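
For reference, this equivalence can be checked with Python's standard urllib.parse module. The snippet is only a demonstration of percent-encoding itself and makes no claim about how MediaWiki handles such titles.

```python
from urllib.parse import quote, unquote

encoded_title = "%c3%9Cber_das_Wesen_der_Information"
decoded_title = unquote(encoded_title)  # hex digits are case-insensitive
print(decoded_title)                    # Über_das_Wesen_der_Information

# Both spellings of the en dash escape decode to the same character.
upper = unquote("%E2%80%93")
lower = unquote("%e2%80%93")
print(upper == lower == "\u2013")       # True

# Encoding goes the other way and produces upper-case hex digits.
print(quote("\u2013"))                  # %E2%80%93
```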

> Examples:
> If you want to distinguish between "minus" = "-" and &ndash; &#8211; &#x2013; = "–"
> see: Unicode Character EN DASH - U+2013
> - http://www.fileformat.info/info/unicode/char/2013/index.htm

If you want to distinguish between "minus" = "-" and &ndash; &#8211; &#x2013;
%E2%80%93 and %e2%80%93 = "–"

> *notes*
> b) Because of "escape syntax" at some point it should be decided if this feature
> would break documentation pages with examples. If there all versions &ndash;
> &#8211; &#x2013; = "–" are used these should not be changed while the page is
> edited and saved again.

b) ... If all of the variants &ndash; &#8211; &#x2013; %E2%80%93 and %e2%80%93 =
"–" are used there, they should not be changed when the page is edited and saved again.

> This feature would "normalise" the way how characters are saved. This would /
> should make search more efficient.

add ... It also makes it easier to read and edit / correct text encoded as
%nn%nn%nn.

Comment 4 Omegatron 2005-12-15 16:34:47 UTC

Here's my version, in response to Bug 2676:

This is great, but Unicode-unaware *browsers* aren't the only problem. A lot of
people want to work in Unicode-unaware text editors as well, and this makes it
difficult for them. They'd have to fake out the server into thinking they had an
old browser or something in order to see the HTML entity version of the source.
I have a different proposal:

1. Convert all HTML entities (named or Unicode numbers or whatever) into plain
Unicode characters in the wikisource.

2. Provide an option in the editing interface to view the source in either
"plain Unicode" format (with actual characters) or "plain text" format (with HTML
entities) on a per-edit basis.

2.a. When editing in "plain text" mode, all the bad characters (non-ASCII?) will
be converted into named HTML entities if possible (&mdash; and the like), or
into numbered HTML entities if not possible (&#8212; and the like).

2.b. The default editing format will be selectable in preferences.
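
A minimal Python sketch of points 1 and 2.a, using only the standard html module, might look like this. The function names are hypothetical, "bad characters" is read as "non-ASCII" (the proposal itself leaves this open), and the named entities are limited to the HTML 4 set that html.entities.codepoint2name provides.

```python
import html
from html.entities import codepoint2name


def to_plain_text(source):
    """'Plain text' view: non-ASCII becomes named or numbered entities."""
    out = []
    for ch in source:
        code_point = ord(ch)
        if code_point < 128:
            out.append(ch)                              # ASCII stays as typed
        elif code_point in codepoint2name:
            out.append(f"&{codepoint2name[code_point]};")  # named if possible
        else:
            out.append(f"&#{code_point};")              # numbered otherwise
    return "".join(out)


def to_plain_unicode(source):
    """'Plain Unicode' view: entities become actual characters."""
    return html.unescape(source)


text = "a\u2014\u0109"                  # "a", em dash, c with circumflex
print(to_plain_text(text))              # a&mdash;&#265;
print(to_plain_unicode("a&mdash;&#265;") == text)  # True
```

One caveat with this sketch: html.unescape also decodes &amp;, &lt; and similar references, so a real implementation would presumably need to protect those before converting.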

Comment 5 lɛʁi לערי ריינהארט 2005-12-18 16:17:47 UTC

*note*
This bug was opened with the summary:
"Add flexible magic character conversion to the built-in editor"

After reading the response from Omegatron in comment 4 and searching for bug
dependencies and duplicates, I wonder whether the "magic character conversion" should
be limited to "the built-in editor" or should be available for other functions
as well.

changing summary to
"Add flexible magic character conversion to the user interface"
this should also cover the "built-in editor"

adding dependency
blocks: bug 3894: white space characters, BiDi control characters should show up
in diff
having a duplicate: bug 3672: BiDi: improuve the diffs with regard to RTL issues

A feature as requested here would prevent users from opening invalid bugs such as
(invalid bug 3621: BiDi: RTL list not rendered correctly)

best regards reinhardt [[user:gangleri]]

Comment 6 Niklas Laxström 2009-02-28 17:06:27 UTC

With the experience of Esperanto magic character conversion, I'm pretty sure we don't want to add any more of that.
