Last modified: 2014-11-15 04:00:27 UTC

Bug 48032 - Math extension doesn't support many languages including Malayalam, Hindi, and Tamil
Status: REOPENED
Product: MediaWiki extensions
Classification: Unclassified
Component: Math
Version: unspecified
Hardware: All All
Importance: High major with 4 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://ml.wikisource.org/wiki/%E0%B4%...
Keywords: i18n
Duplicates: 30463
Depends on: 73285
Blocks: 32578 41348 48118 56295 57770

Reported: 2013-05-03 02:14 UTC by praveenp
Modified: 2014-11-15 04:00 UTC
CC List: 16 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Math default PNG mode (129.32 KB, image/png), 2013-05-03 02:14 UTC, praveenp
MathJax Mode (15.43 KB, image/png), 2013-05-03 02:15 UTC, praveenp
LaTeXML Malayalam (8.43 KB, image/png), 2013-05-04 06:06 UTC, manojkmohan
Problem Screenshot (188.03 KB, image/png), 2013-09-30 03:43 UTC, praveenp
current rendering (MathML (left), MathJax (right) (164.87 KB, image/png), 2013-09-30 03:56 UTC, physikerwelt
wikisource screenshot (34.44 KB, image/png), 2013-09-30 04:04 UTC, praveenp
48032 in MathML (64.00 KB, image/png), 2014-10-19 19:34 UTC, physikerwelt
PNG rendering gives Lexing Error (31.26 KB, image/png), 2014-10-24 01:56 UTC, praveenp
No proper rendering in MathML (24.31 KB, image/png), 2014-10-24 01:58 UTC, praveenp
No proper rendering in clientside MathJax (21.34 KB, image/png), 2014-10-24 02:03 UTC, praveenp
Different renderings of Malayalam equations (43.73 KB, image/png), 2014-11-12 05:25 UTC, This, that and the other (TTO)

Description praveenp 2013-05-03 02:14:29 UTC
Created attachment 12248 [details]
Math default PNG mode

Malayalam characters inside <math>..</math> lead to a lexing error when using the default server-rendered PNG mode. The same TeX code with English characters works without any problem.

Using MathJax partially fixes the problem, but it is heavy and still causes some display errors.

Please check both attachments.
Comment 1 praveenp 2013-05-03 02:15:13 UTC
Created attachment 12249 [details]
MathJax Mode
Comment 2 praveenp 2013-05-03 14:55:12 UTC
I just found that this affects various other languages, including Hindi, Tamil, and Arabic. All these languages have a vast mathematical history, dating from before European languages caught up. This is certainly not low priority. Title changed, priority increased.
Comment 3 physikerwelt 2013-05-03 18:40:40 UTC
Hi,

Can you provide some samples in the following format?

<math>\frac12</math> : fraction 1/2

Best Physikerwelt
Comment 4 praveenp 2013-05-04 02:31:43 UTC
Malayalam: <math>\frac{ക}{ച}</math>
Tamil: <math>\frac{க}{த}</math>
Hindi: <math>\frac{क}{च}</math>

Simple forms like <math>ക</math> also fail!
Comment 5 physikerwelt 2013-05-04 05:47:48 UTC
Thank you.
I rendered it with LaTeXML.
For me it looks ok. See http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page
Can you confirm that?
Comment 6 manojkmohan 2013-05-04 06:06:35 UTC
Created attachment 12256 [details]
LaTeXML Malayalam

Letters break apart when rendering complex words.

(In reply to comment #5)
> Thank you.
> I rendered it with LaTeXML.
> For me it looks ok. See
> http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page
> Can you confirm that?
Comment 7 physikerwelt 2013-05-04 19:21:44 UTC
Can you provide a tex file that creates the correct output?
Comment 8 physikerwelt 2013-05-04 19:26:33 UTC
Please include all libraries and make sure that it can be compiled with pdflatex. Thanks a lot.
Comment 9 Jiabao Wu 2013-05-05 13:41:24 UTC
Is there any other software that can write complex Malayalam?
Comment 10 praveenp 2013-05-06 02:43:53 UTC
This is not limited to Malayalam. I've updated the testwiki page: http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page . You can see that Hindi and Arabic have the same problems. Arabic words (RTL) are also reversed (shown LTR).

(In reply to comment #9)
> Are there any other softwares can write complex Malayalam?

All popular software and rendering engines support Malayalam (and most other Indic and Asian languages) perfectly, though I don't know whether TeX-based systems designed for Asian languages exist. Even if one such library exists, it may take time to get some traction and gain popularity.

If I could, I would fix the server-rendered PNG issues first, because PNG output is light and simple (for users) and client-independent.
Comment 11 Matthew Flaschen 2013-05-06 03:18:13 UTC
I think at least Arabic should be separate, since it's a RTL language, which presents specific issues.  Hence, broken out to bug 48118.

But I'm inclined to think all four should be separate bugs, since all these languages use different scripts.  But I'll hold off for a little while in case people familiar with these scripts think they're likely to have the same solution.
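
As a rough illustration of why the RTL scripts raise an extra issue, the Unicode bidirectional class of each character can be inspected. This is only a sketch using Python's standard unicodedata module; the helper name has_rtl is made up for this example and is not part of the Math extension.

```python
import unicodedata

# Bidi classes of strongly right-to-left characters:
# "R" (e.g. Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def has_rtl(text):
    """Return True if any character in text is strongly right-to-left."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


# One sample letter per script mentioned in this report.
samples = [
    ("Malayalam", "\u0d15"),
    ("Hindi", "\u0915"),
    ("Tamil", "\u0b95"),
    ("Arabic", "\u0628"),
]
for label, sample in samples:
    print(label, has_rtl(sample))
```

Only the Arabic sample is reported as RTL here; the Indic samples are left-to-right, so any reversal problem is specific to scripts like Arabic and Hebrew.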
Comment 12 praveenp 2013-05-06 03:46:45 UTC
That is not completely true. This happens because the script accepts and renders each Unicode code point independently. That is probably a perfect approach for Latin-script languages, but not for others such as Indic and other Asian languages.

This source is taken from the wiki page:

<mi mathvariant="normal">ത</mi>       <mi mathvariant="normal">്</mi>       <mi mathvariant="normal">ത</mi>       <mi mathvariant="normal">ി</mi>

This should be displayed as a single character. But because of the script's inability to accept non-Latin rules, it is broken into four. The same happens in all languages, including Arabic.

Also, the bug is not limited to these four languages. At least all other Indian languages are affected.
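
The split described above can be reproduced outside the wiki. The following Python sketch counts the code points in ത്തി and then regroups them with a deliberately simplified rule: combining marks attach to the preceding character, and a letter that follows a virama joins the same cluster. The function name rough_clusters and the rule itself are assumptions made for illustration; this is not the full Unicode grapheme segmentation algorithm and not what the Math extension does.

```python
import unicodedata

# Canonical combining class shared by virama signs such as U+0D4D.
VIRAMA_CLASS = 9


def rough_clusters(text):
    """Group code points into approximate display units (simplified rule)."""
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        # Combining marks (Mn, Mc, Me) belong to the preceding character.
        is_mark = category.startswith("M")
        # A letter directly after a virama forms a conjunct with what precedes it.
        joins_conjunct = (
            bool(clusters)
            and category.startswith("L")
            and unicodedata.combining(clusters[-1][-1]) == VIRAMA_CLASS
        )
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


sample = "\u0d24\u0d4d\u0d24\u0d3f"  # the four code points of ത്തി
print(len(sample))                  # number of code points
print(len(rough_clusters(sample)))  # number of approximate display units
```

A renderer that emits one element per code point produces four pieces for this input, while a cluster-aware one would treat it as a single unit.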
Comment 13 Matthew Flaschen 2013-05-06 04:01:56 UTC
By 'script', I was referring to the alphabet(s), not the software.

You may be right that the only issues with the Indic languages are handling combining characters.  But even if so, Arabic (besides the separate alphabet) is the only one with RTL issues.
Comment 14 praveenp 2013-05-06 04:15:03 UTC
I just don't know how much of this you can understand, because of your strong European perspective on languages :-). Maybe Arabic has some other issues, but the issues mentioned here are so far highly related. Other RTL languages like Hebrew and Farsi are also affected by this bug.
Comment 15 Matthew Flaschen 2013-05-06 04:28:15 UTC
(In reply to comment #14)
> I just don't know how much you can understand this because of your strong
> European perspective about languages :-). 

I understand the gist of your report, which is why I still have not separated the Indic languages.  Note that European languages also have combining characters and ligatures, and letters based on them.

> Other RTL languages like Hebrew and Farsi also affected by this bug.

If they all fail to handle RTL with the same symptoms (I have not tested), please edit bug 48118 accordingly.  I meant the only RTL language that had been reported so far.
Comment 16 physikerwelt 2013-05-06 11:01:08 UTC
However, a minimal example in pdflatex would be really helpful, so that developers know how it should look, can reproduce it with LaTeX, and can identify the problems.
Comment 17 praveenp 2013-05-06 17:22:27 UTC
(In reply to comment #15)

> If they all fail to handle RTL with the same symptoms (I have not tested),
> please edit bug 48118 accordingly.

I think they are failing because the Math extension takes code points independently and places them between different tags, so the rendering mechanism treats these characters as separate entities. This is just another effect of this bug. Please check the source code of the testwiki page (comment 5).

I did find many European-language ligatures, but they are atomic, so they don't suffer from this problem.
Comment 18 Derk-Jan Hartman 2013-05-19 18:36:30 UTC
This is a duplicate of bug 798.

For texvc, you will need to fix latex to include all the charsets etc. For MathJax, you need to use text mode and for the more complex situations wait for SVG or MathML to finally come of age and start truly working in the majority of browsers. For MathJax HTML mode, you can perhaps see if there are ways to instruct it to NOT split certain unicode ranges into separate symbols. But that should really be taken upstream.
Comment 19 Derk-Jan Hartman 2013-05-19 19:10:49 UTC
For the MathJax part: https://github.com/mathjax/MathJax/issues/474
Comment 20 Matthew Flaschen 2013-05-19 19:16:37 UTC
Bug 798 is both broader in some ways (all writing systems) and narrower in others (only texvc), so I'm just listing it as a see also for now.
Comment 21 Derk-Jan Hartman 2013-05-22 20:29:22 UTC
The MathJax part can be partially fixed for \text{} blocks by applying https://gerrit.wikimedia.org/r/#/c/61924/
Comment 22 physikerwelt 2013-09-17 22:57:49 UTC
I really don't understand why this should be high priority now.
There is no information on how it should look.
See also
http://trac.mathweb.org/LaTeXML/ticket/1737
Comment 23 Nemo 2013-09-17 23:28:35 UTC
(In reply to comment #22)
> I really don't understand why this should be high priority now. 
> There is no information how it should look like.

Well, the bug summary is rather trenchant and is worth this priority/severity; the screenshots I looked at seemed to confirm it (the highlighted characters definitely look mangled), but I'm not confident commenting on such languages, so if I'm wrong please revert me and clarify the summary.
Comment 24 praveenp 2013-09-18 03:58:53 UTC
(In reply to comment #22)
> I really don't understand why this should be high priority now. 

May I know why this should be low priority while all non-Latin languages are suffering from it?

> There is no information how it should look like.

Just remove the math tag from ligatures (like ഗ്ദ്ധ്രീ) and check. Or try to use some Arabic or Hebrew.
Comment 25 physikerwelt 2013-09-18 15:27:09 UTC
Does that look right:

http://math-test2.instance-proxy.wmflabs.org/w/index.php?title=Hindi,_Arabic,_Malayalam,_Tamil

For me ... using Firefox... it seems to look right
Comment 26 praveenp 2013-09-30 03:43:17 UTC
Created attachment 13403 [details]
Problem Screenshot

All I get is errors. Since login is not possible, it is also not possible to set MathJax as the default.
Comment 27 physikerwelt 2013-09-30 03:48:47 UTC
Oh. I'm updating the database on that server right now.
http://demo.formulasearchengine.com/index.php/User_talk:Admin
Comment 28 physikerwelt 2013-09-30 03:56:00 UTC
Created attachment 13405 [details]
current rendering (MathML (left), MathJax (right)
Comment 29 praveenp 2013-09-30 04:04:09 UTC
Created attachment 13406 [details]
wikisource screenshot

Yes, that is perfect. But some letters like ത്ര are still not working on Wikisource. I just found that it is possible to avoid this by adding extra {}, like: {ത്ര}^3.

Default PNG mode is also still not working.
Comment 30 physikerwelt 2013-09-30 04:14:19 UTC
Sure. I'm just developing those features. 
What does ത്ര mean?
If it's a word with a conventional meaning instead of a variable it should be written as \text{ത്ര}^3
This will help blind people in the future to understand the content.

For example the first equation on 
https://en.wikipedia.org/w/index.php?title=Body_mass_index
is written as
<math>\mathrm{BMI}</math>&nbsp;
| <math>= \frac{\text{mass}(\text{kg})}{\left(\text{height}(\text{m})\right)^2}</math>
This prevents wrong spacing between the letters BMI and mass and makes the equation accessible for disabled people and search engines.
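
As a rough sketch of the \text{} workaround, a small Python helper could wrap every run of non-ASCII characters in \text{...} before the markup is handed to a renderer. The name wrap_non_ascii is hypothetical and nothing like it is confirmed to exist in the Math extension. It is also a blunt tool: it does not check whether a run is already inside \text{}, and it cannot tell a word from a variable.

```python
import re

# One or more consecutive characters outside the ASCII range.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def wrap_non_ascii(tex):
    """Wrap each run of non-ASCII characters in \\text{...}."""
    # A function is used as the replacement so that backslashes in the
    # inserted text are taken literally rather than as regex escapes.
    return NON_ASCII_RUN.sub(lambda m: r"\text{" + m.group(0) + "}", tex)


# The Malayalam sample from comment 4: \frac{ക}{ച}
print(wrap_non_ascii("\\frac{\u0d15}{\u0d1a}"))
```

Pure ASCII input such as \frac12 passes through unchanged.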
Comment 31 praveenp 2013-10-02 10:07:55 UTC
ത്ര has no meaning. It is not a word. It is just a letter (ligature).
Comment 32 physikerwelt 2013-10-16 10:34:03 UTC
Unfortunately, the TeX system does not have very good support for non-ASCII characters.
It's the same problem with the German umlauts öäü. In all math environments that I tried, I had to use \text{ü} instead of ü.
So I think we have to cope with that as well. Maybe we can come up with an interface for easy editing with the visual editor one day.
MathJax can convert the samples to MathML, but not to SVG, if \text is used.
see 
http://math-test2.instance-proxy.wmflabs.org/wiki/30463

This requires Math2.0 which is ready for code review at
https://gerrit.wikimedia.org/r/#/c/85801/
Comment 33 physikerwelt 2013-10-16 10:36:45 UTC
*** Bug 30463 has been marked as a duplicate of this bug. ***
Comment 34 Peter Krautzberger 2013-10-16 19:11:48 UTC
> MathJaX can convert the samples if \text is used to MathML but not to SVG.
> see 
> http://math-test2.instance-proxy.wmflabs.org/wiki/30463

That shouldn't happen. MathJax accepts unicode input and should fall back to system fonts for characters not covered by its fonts. I suspect it's a problem with svgtex / mathoid or more precisely with phantomjs. 

Could you file a bug report at MathJax with some background information?
Comment 35 physikerwelt 2013-10-19 14:07:52 UTC
Hi Peter,
what exactly do you mean by "at MathJax"?
Comment 36 Peter Krautzberger 2013-10-20 17:45:29 UTC
(In reply to comment #35)
> Hi Peter,
> what do you mean exactly with at MathJaX?

Hi Moritz,

With "at MathJax" I meant our bug tracker at https://github.com/mathjax/mathjax/issues.

But I just took another look and this might be yet another problem with embedding the SVG. That is, the "complex Malayalam equation" at http://math-test2.instance-proxy.wmflabs.org/wiki/30463 looks very different if viewed independently at http://math-test2.instance-proxy.wmflabs.org/w/index.php?title=Special:MathShowImage&hash=b3793eaa2110f756756386bb17b17d04.
Comment 37 physikerwelt 2013-10-22 19:46:06 UTC
*** Bug 2458 has been marked as a duplicate of this bug. ***
Comment 38 physikerwelt 2013-11-06 17:43:54 UTC
Is anyone around here familiar with OCaml?
In order to make progress on all the Math-related bugs, this change
https://gerrit.wikimedia.org/r/#/c/90748
needs to get a review.
I verified the correctness of the code and wrote a small guide on how you can verify it yourself at
http://www.formulasearchengine.com/Verify%20texvc%20light
If you don't want to do a review, I'd be interested in the reasons for that.
Even very basic feedback on the commit helps, i.e. do you understand what the change is about, or should the commit message be changed?
Comment 39 physikerwelt 2013-11-06 17:46:03 UTC
I unassigned myself, since I don't want to review my own code and I cannot do more about it at the moment.
Comment 40 This, that and the other (TTO) 2014-07-05 02:23:17 UTC
The patches seem to have been merged or abandoned, so setting back to NEW.
Comment 41 physikerwelt 2014-10-19 19:33:49 UTC
See
https://www.mediawiki.org/wiki/Extension:Math/bug/48032
in MathML rendering mode.
I would say that's as good as it gets.
If that's not sufficient, please file separate bugs for the individual characters that don't work, so that we can fix them one after the other. We would also need a reference implementation (for example a LaTeX document) that demonstrates how the result should look.
Comment 42 physikerwelt 2014-10-19 19:34:46 UTC
Created attachment 16810 [details]
48032 in MathML

I can't see what's wrong here.
Comment 43 praveenp 2014-10-24 01:56:11 UTC
Created attachment 16875 [details]
PNG rendering gives Lexing Error
Comment 44 praveenp 2014-10-24 01:58:11 UTC
Created attachment 16876 [details]
No proper rendering in MathML
Comment 45 praveenp 2014-10-24 02:03:10 UTC
Created attachment 16877 [details]
No proper rendering in clientside MathJax

The bug still persists and no proper rendering is available in any mode. Test page - [[:mlഉപയോക്താവ്:Praveenp/പരിശോധന]].

There are various books with mathematical equations still not possible to include (eg: https://ml.wikisource.org/wiki/%E0%B4%A4%E0%B4%BE%E0%B5%BE:Yukthibhasa.djvu/227)
Comment 46 physikerwelt 2014-10-24 05:32:57 UTC
I still do not understand the problem.
I updated the demo at https://www.mediawiki.org/wiki/Extension:Math/bug/48032

I'd be happy if someone could explain it to me.
Comment 47 This, that and the other (TTO) 2014-10-24 06:42:07 UTC
Here's my explanation:

* PNG output is totally broken and needs to be fixed to stop spewing red error messages. I think the problem here is quite obvious.
* MathML output renders very poorly. Characters are overlapping and/or severely truncated (e.g. only half the character is visible).
* MathJax seems acceptable, even though the characters are very small. Having said that, I do not read Malayalam so cannot confirm whether the rendering is indeed perfectly accurate.

Hopefully Praveen can also explain some more.
Comment 48 Peter Krautzberger 2014-10-24 07:24:23 UTC
@physikerwelt could PNG be generated from the SVG? MathJax-node has hooks for Apache Batik to do this. MathJax is Unicode friendly although it will often have to rely on system fonts to provide the glyphs.
Comment 49 physikerwelt 2014-10-24 08:09:20 UTC
(In reply to This, that and the other (TTO) from comment #47)
> Here's my explanation:
> 
> * PNG output is totally broken and needs to be fixed to stop spewing red
> error messages. I think the problem here is quite obvious.
OK. This will be solved by Bug 72240.
> * MathML output renders very poorly. Characters are overlapping and/or
> severely truncated (e.g. only half the character is visible).
Here, we need to clarify if the generated MathML is correct.
> * MathJax seems acceptable, even though the characters are very small.
> Having said that, I do not read Malayalam so cannot confirm whether the
> rendering is indeed perfectly accurate.
> 
> Hopefully Praveen can also explain some more.

The problem I see with this bug, and many other bugs, is that there is a gap between the bug report created by the users and the information needed by the programmers. For example, this bug describes a whole complex of problems.
So there is no well-defined problem that can be transformed into a test case and then be resolved.

To my understanding, a programmer-compatible bug report would optimally contain the following:
1) What is the scenario:
a) TeX input code
b) Environment variables
c) Which rendering mode is used?
d) Which browser is used?
2) Is the input valid?
3) If it's valid: How is it supposed to look?
Is there a TeX/LaTeX document that demonstrates the rendering?
What should the MathML output be?
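
One possible way to capture that checklist as data, so that a report can be checked for completeness and later turned into a test case, is sketched below in Python. The class and field names are invented for this illustration and do not correspond to any existing Bugzilla or MediaWiki structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RenderingReport:
    """Hypothetical record mirroring the checklist above."""

    tex_input: str = ""            # 1a) full markup, <math> through </math>
    environment: Dict[str, str] = field(default_factory=dict)  # 1b)
    rendering_mode: str = ""       # 1c) e.g. "PNG", "MathML", "MathJax"
    browser: str = ""              # 1d)
    input_is_valid: Optional[bool] = None  # 2) None means "not yet judged"
    expected_rendering: str = ""   # 3) reference document or expected MathML

    def missing_fields(self) -> List[str]:
        """Return the names of the text fields that are still empty."""
        required = ["tex_input", "rendering_mode", "browser", "expected_rendering"]
        return [name for name in required if not getattr(self, name)]


report = RenderingReport(
    tex_input="<math>\\frac{\u0d15}{\u0d1a}</math>",
    rendering_mode="PNG",
    browser="Firefox",
)
print(report.missing_fields())
```

In this example only the expected rendering is missing, which matches the gap discussed in this thread: the input and the mode are known, but no agreed reference output exists yet.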
Comment 50 Matthew Flaschen 2014-11-11 23:30:19 UTC
(In reply to physikerwelt from comment #49)
> The problem I see with this bugs and many other bugs, is that there is a gap
> between the bug report created by the users and the information needed by
> the programmers.

Sometimes that takes some back-and-forth in order to figure it out.  I agree that "No proper rendering" should be made clearer (e.g. "bottom of character is truncated", "characters are not joined properly", or "wrong font is used", whatever the case may be).

Also, it helps to put the full markup (<math> through </math>) here in the bug report.  If it's only in the screenshot, someone has to retype it or figure out where to copy it from, which is hard if it's in a language they don't know.

> So there is no well defined problem that can be transformed into a testcase
> and then be resolved.

The PNG one seems pretty clear.  I've made a sub-task for this, bug 73285.
Comment 51 This, that and the other (TTO) 2014-11-12 05:25:40 UTC
Created attachment 17099 [details]
Different renderings of Malayalam equations

Here's another try. Praveen, can you confirm that the renderings at the left of this image are indeed correct?

First example is: <math>\cfrac{\text{ച}}{\angle 4\text{ത്ര}^3}</math>
MS Word code (linear form) is: ച/(∠4〖ത്ര〗^3 )

Second example is: <math>ല \left ( \frac{വ്വ(ക്ലു \pm കു_ശ)}{ടി} \right )</math>
MS Word code (linear form) is: ല(വ്വ(ക്ലു±〖കു〗_ശ )/ടി)

Per comment 7, I tried using sharelatex to generate an example, but its pdfLaTeX renderer refuses to accept Malayalam at all, and the LuaLaTeX and XeLaTeX renderers do it wrongly.
Comment 52 physikerwelt 2014-11-12 08:28:47 UTC
If you manage to make MathJax render your input correctly, it will be easy to port that to the Math extension.
Comment 53 Peter Krautzberger 2014-11-12 09:13:20 UTC
This will generally work (so 69702 should resolve this).

But I think for this specific case there's a bug in MathJax related to combining characters. I've filed https://github.com/mathjax/MathJax/issues/952
Comment 54 This, that and the other (TTO) 2014-11-12 09:15:10 UTC
(In reply to physikerwelt from comment #52)
> If you manage to make Mathjax render your input correctly, it will be easy
> to port that to the Math extension.

Equation 1 is working properly in MathJax but not in the SVG fallback, so that can be a starting point.
Comment 55 praveenp 2014-11-15 04:00:27 UTC
Comment #51 is correct.
