Last modified: 2007-02-18 00:45:50 UTC

Bug 9011 - Arabic and western text on the same line causes incorrect interleaving
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: PC Linux
Importance: Normal normal
Assigned To: Nobody - You can work on this!
Reported: 2007-02-17 08:48 UTC by Joona Palaste
Modified: 2007-02-18 00:45 UTC (History)

A detail screenshot of the rendered article text illustrating the problem. (45.05 KB, image/png)
2007-02-17 08:51 UTC, Joona Palaste

Description Joona Palaste 2007-02-17 08:48:02 UTC
In the English Wikipedia article on Ruhollah Khomeini, having his name in the
native Arabic script (right-to-left) inside the normal English (left-to-right)
text of the article causes incorrect interleaving.
I am using Mozilla Firefox 2.0 on Fedora Core 5 Linux, and in my browser the
first two lines of the text look something like this:

"Grand Ayatollah Seyyed Ruhollah Mosavi Khomeini (listen (Persian pronunciation)
(help·info)) (Persian: [Arabic text]
[Arabic text] Rūḥollāh Mūsavī Khomeynī Arabic: 17) ([Arabic text] May 1900¹ - 3
June 1989) was a..."

I've placed "[Arabic text]" where it displays Arabic text so that this bug
report itself does not depend on the settings of the browser but illustrates the
issue as I see it. The problem is plainly visible: It is supposed to say that
Khomeini was born on 17 May 1900, but part of his Arabic name appears between
the day "17" and the month "May 1900". In the wiki markup source, however,
everything looks OK: the Arabic text is correctly interleaved with the western
text.

Is this a bug with MediaWiki or with my browser?
Comment 1 Joona Palaste 2007-02-17 08:51:20 UTC
Created attachment 3238
A detail screenshot of the rendered article text illustrating the problem.
Comment 2 Jesse (Pathoschild) 2007-02-17 09:00:57 UTC
This can be fixed by surrounding the text with &rlm; (right-to-left mark,
U+200F) and &lrm; (left-to-right mark, U+200E), for example
&rlm;Rūḥollāh Mūsavī Khomeynī&lrm;. I suppose this could be templated as
{{rtl|Rūḥollāh Mūsavī Khomeynī}}, if that template doesn't already exist.
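As a rough illustration of what such a template would emit, here is a minimal Python sketch. The helper name wrap_in_marks is hypothetical, and the order of the marks (right-to-left mark first, left-to-right mark last) is an assumption taken from the example in this comment. It is not MediaWiki code.

```python
RLM = "\u200f"  # RIGHT-TO-LEFT MARK (HTML entity &rlm;), invisible
LRM = "\u200e"  # LEFT-TO-RIGHT MARK (HTML entity &lrm;), invisible


def wrap_in_marks(text):
    """Hypothetical helper: put an RLM before the text and an LRM after it."""
    return RLM + text + LRM


wrapped = wrap_in_marks("Rūḥollāh Mūsavī Khomeynī")

# The marks are invisible when rendered, so print their code points instead.
print([f"U+{ord(ch):04X}" for ch in (wrapped[0], wrapped[-1])])
# prints ['U+200F', 'U+200E']
```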

See the previous bug 8996 about similar behaviour on special pages. I think a
server-side fix would be applicable to all instances of the direction override.

*** This bug has been marked as a duplicate of 8996 ***
Comment 3 Rotem Liss 2007-02-17 09:36:24 UTC
Bug 8996 is about a completely different problem, involving a different kind of
direction mark. It's not a duplicate.
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-18 00:45:50 UTC
It's impossible to get correct directionality information from plain Unicode
text.  Consider:

The Hebrew letter "aleph" is א, ב is "bet".

Note that aleph is א, bet is ב, and the logical order (as I typed it and as it
was encoded) has the א before the ב.  The comma and space fall between two RTL
characters, so they're treated as RTL embedded in LTR.  But semantically, the
comma is part of the LTR phrase (delimiting two LTR phrases, which happen to end
or begin with RTL characters) and should be treated as LTR text.
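
The effect described above can be reproduced with a small Python sketch. It uses the standard unicodedata module to look up each character's bidirectional class, then applies a deliberately simplified version of the neutral-resolution rules (N1 and N2) of the Unicode bidirectional algorithm. The function names and the simplification (digits and explicit embeddings are ignored) are assumptions made for illustration, not what MediaWiki or any browser actually runs.

```python
import unicodedata

# Logical (typed) order: aleph, comma, space, bet. Escapes are used so that
# this listing is not itself reordered by whatever displays it.
ALEPH, BET = "\u05d0", "\u05d1"
snippet = ALEPH + ", " + BET


def strong_direction(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None


def resolve_neutrals(text, base="L"):
    """Simplified N1/N2: a neutral takes the direction of its strong
    neighbours when they agree, otherwise the base (paragraph) direction."""
    dirs = [strong_direction(ch) for ch in text]
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            before = next((x for x in reversed(dirs[:i]) if x), base)
            after = next((x for x in dirs[i + 1:] if x), base)
            d = before if before == after else base
        resolved.append(d)
    return resolved


print([unicodedata.bidirectional(ch) for ch in snippet])
# prints ['R', 'CS', 'WS', 'R']
print(resolve_neutrals(snippet))
# prints ['R', 'R', 'R', 'R']
```

The comma and the space have no strong direction of their own, and both of their strong neighbours are RTL, so they are pulled into the RTL run even though the surrounding sentence is LTR.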

But consider this, which is syntactically identical:

Exodus 1:2 reads, in the original Hebrew: "ראובן, שמעון, לוי, ויהודה".

Here the behavior is correct, because in this context, the commas delimit RTL
phrases (or words), not LTR phrases.  But there's no possible way either
MediaWiki or the browser could know that.  The Unicode directionality algorithm
tries to do the impossible, and consequently fails.  The only way to avoid this
problem is to add semantic information on how you want the directionality to go,
using Unicode directionality marks:

The Hebrew letter "aleph" is א‎, ב is "bet".
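
The line above contains an invisible left-to-right mark (U+200E) directly after the aleph. A minimal Python sketch of why that helps, again using a simplified reading of the neutral-resolution rules and a hypothetical helper name (neutral_direction), assuming an LTR paragraph:

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK: invisible, but strongly left-to-right
ALEPH, BET = "\u05d0", "\u05d1"


def direction(ch):
    """Return 'L' or 'R' for strongly directional characters, else None."""
    cls = unicodedata.bidirectional(ch)
    return "L" if cls == "L" else "R" if cls in ("R", "AL") else None


def neutral_direction(text, index, base="L"):
    """Direction the neutral at text[index] takes under a simplified N1/N2
    rule: follow the strong neighbours if they agree, else use the base."""
    before = next((d for d in map(direction, reversed(text[:index])) if d), base)
    after = next((d for d in map(direction, text[index + 1:]) if d), base)
    return before if before == after else base


plain = ALEPH + ", " + BET         # aleph, comma, space, bet
marked = ALEPH + LRM + ", " + BET  # same, with an LRM after the aleph

print(neutral_direction(plain, plain.index(",")))    # prints R
print(neutral_direction(marked, marked.index(",")))  # prints L
```

With the mark in place, the comma sits between an LTR character (the mark) and an RTL character (the bet), so it falls back to the paragraph direction and stays with the English text.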
