Last modified: 2007-02-18 00:45:50 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T11011, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 9011 - Arabic and western text on the same line causes incorrect interleaving


Summary:	Arabic and western text on the same line causes incorrect interleaving

Status:	RESOLVED INVALID

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser
Version:	unspecified
Hardware:	PC Linux

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:	http://en.wikipedia.org/wiki/Ruhollah...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2007-02-17 08:48 UTC by Joona Palaste
Modified:	2007-02-18 00:45 UTC
CC List:	0 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
A detail screenshot of the rendered article text illustrating the problem. (45.05 KB, image/png) 2007-02-17 08:51 UTC, Joona Palaste

Description Joona Palaste 2007-02-17 08:48:02 UTC

In the English Wikipedia article on Ruhollah Khomeini, having his name in the
native Arabic script (right-to-left) inside the normal English (left-to-right)
text of the article causes incorrect interleaving.
I am using Mozilla Firefox 2.0 on Fedora Core 5 Linux, and in my browser the
first two lines of the text look something like this:

"Grand Ayatollah Seyyed Ruhollah Mosavi Khomeini (listen (Persian pronunciation)
(help·info)) (Persian: [Arabic text]
[Arabic text] Rūḥollāh Mūsavī Khomeynī Arabic: 17) ([Arabic text] May 1900¹ - 3
June 1989) was a..."

I've placed "[Arabic text]" where it displays Arabic text so that this bug
report itself does not depend on the settings of the browser but illustrates the
issue as I see it. The problem is plainly visible: It is supposed to say that
Khomeini was born on 17 May 1900, but part of his Arabic name appears between
the day "17" and the month "May 1900". When I check the wiki markup source,
everything looks OK: the Arabic text is correctly interleaved with the
western text.

Is this a bug with MediaWiki or with my browser?

Comment 1 Joona Palaste 2007-02-17 08:51:20 UTC

Created attachment 3238 [details]
A detail screenshot of the rendered article text illustrating the problem.

Comment 2 Jesse (Pathoschild) 2007-02-17 09:00:57 UTC

This can be fixed by surrounding the text with &rlm; and &lrm; (for example,
&rlm;Rūḥollāh Mūsavī Khomeynī&lrm;). I suppose this could be templated as
{{rtl|Rūḥollāh Mūsavī Khomeynī}}, if that template doesn't already exist.
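
A minimal sketch of the wrapping suggested above, written in Python only for
illustration. The function name wrap_with_marks and its as_entities parameter
are hypothetical, and this is not how MediaWiki or any existing {{rtl}}
template is implemented; it only shows what such a template might expand to.

```python
# Unicode directional marks (both are invisible, zero-width characters).
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, written as &rlm; in HTML
LRM = "\u200e"  # LEFT-TO-RIGHT MARK, written as &lrm; in HTML


def wrap_with_marks(text, as_entities=True):
    """Surround text with an RLM before it and an LRM after it.

    Hypothetical helper: with as_entities=True it emits the HTML
    entities, otherwise the raw Unicode characters.
    """
    if as_entities:
        return "&rlm;" + text + "&lrm;"
    return RLM + text + LRM


print(wrap_with_marks("[Arabic text]"))
```

Whether the entity form or the raw characters is preferable depends on where
the text ends up; the entities are easier to see when editing wiki markup.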

See the previous bug 8996 about similar behaviour on special pages. I think a
server-side fix would be applicable to all instances of the direction override
problem.

*** This bug has been marked as a duplicate of 8996 ***

Comment 3 Rotem Liss 2007-02-17 09:36:24 UTC

Bug 8996 is about a completely different problem, involving a different kind
of direction mark. It's not a duplicate.

Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-18 00:45:50 UTC

It's impossible to get correct directionality information from plain Unicode
text.  Consider:

The Hebrew letter "aleph" is א, ב is "bet".

Note that aleph is א, bet is ב, and the logical order (as I typed it and as it
was encoded) has the א before the ב.  The comma and space fall between two RTL
characters, so they're treated as RTL embedded in LTR.  But semantically, the
comma is part of the LTR phrase (delimiting two LTR phrases, which happen to end
or begin with RTL characters) and should be treated as LTR text.

But consider this, which is syntactically identical:

Exodus 1:2 reads, in the original Hebrew: "ראובן, שמעון, לוי, ויהודה".

Here the behavior is correct, because in this context, the commas delimit RTL
phrases (or words), not LTR phrases.  But there's no possible way either
MediaWiki or the browser could know that.  The Unicode directionality algorithm
tries to do the impossible, and consequently fails.  The only way to avoid this
problem is to add semantic information on how you want the directionality to go,
using Unicode directionality marks:

The Hebrew letter "aleph" is א‎, ב is "bet".
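
The difference between the ambiguous and the marked version can be inspected
with Python's standard unicodedata module, which reports each character's
bidirectional class. This is only a sketch: the helper name bidi_classes is
made up for illustration, and the code lists the per-character classes rather
than running the full Unicode bidirectional algorithm.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, an invisible character of class L


def bidi_classes(text):
    """Return the Unicode bidirectional class of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# aleph, comma, space, bet: the neutral comma and space sit between two
# right-to-left (R) letters.
ambiguous = "\u05d0, \u05d1"

# Same text with an LRM right after the aleph: the comma and space now sit
# between a left-to-right (L) character and an R letter.
marked = "\u05d0" + LRM + ", \u05d1"

print(bidi_classes(ambiguous))  # ['R', 'CS', 'WS', 'R']
print(bidi_classes(marked))     # ['R', 'L', 'CS', 'WS', 'R']
```

In the first string the comma and space are surrounded by R characters, which
is why they are displayed as part of a right-to-left run. In the second, they
lie between an L and an R character, so in a left-to-right paragraph they
follow the paragraph direction instead.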
