Last modified: 2014-11-15 04:00:27 UTC
Created attachment 12248 [details]
Math default PNG mode

Malayalam characters inside <math>..</math> lead to a lexing error when using the default server-rendered PNG mode. The same TeX code with English characters works without any problem. Using MathJax partially fixes the problem, but it is heavy and still causes some display errors. Please check both attachments.
Created attachment 12249 [details] MathJax Mode
I just found that this affects various other languages, including Hindi, Tamil, Arabic, etc. All these languages have a vast mathematical history, from even before the European languages caught up. This is certainly not low priority. Title changed, priority increased.
Hi,
can you provide some samples in the following format?
<math>\frac12</math> : fraction 1/2
Best
Physikerwelt
Malayalam: <math>\frac{ക}{ച}</math>
Tamil: <math>\frac{க}{த}</math>
Hindi: <math>\frac{क}{च}</math>
Simple forms like <math>ക</math> also fail!
Thank you. I rendered it with LaTeXML. For me it looks ok. See http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page Can you confirm that?
Created attachment 12256 [details] LaTeXML Malayalam Letter's Breaking when rendering Complex words. (In reply to comment #5) > Thank you. > I rendered it with LaTeXML. > For me it looks ok. See > http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page > Can you confirm that?
Can you provide a tex file that creates the correct output?
Please include all libraries and make sure that it can be compiled with pdflatex. Thanks a lot.
Is there any other software that can write complex Malayalam?
This is not limited to Malayalam. I've updated the testwiki page: http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page . You can see that Hindi and Arabic have the same problems. Arabic words (RTL) are also reversed (LTR).
(In reply to comment #9)
> Is there any other software that can write complex Malayalam?
All popular software and rendering engines support Malayalam (and most other Indic and Asian languages) perfectly, though I don't know whether TeX-based systems designed for Asian languages exist. Even if one such library exists, it may take some time to gain traction and popularity. If I could, I would fix the server-rendered PNG issues first, because they are light and simple (for users) and client independent.
I think at least Arabic should be separate, since it's a RTL language, which presents specific issues. Hence, broken out to bug 48118. But I'm inclined to think all four should be separate bugs, since all these languages use different scripts. But I'll hold off for a little while in case people familiar with these scripts think they're likely to have the same solution.
That is not completely true. This happens because the script accepts and renders each Unicode code point independently. That is probably a perfect approach for Latin languages, but not for others such as Indic and other Asian languages. This source is taken from the wiki page: <mi mathvariant="normal">ത</mi> <mi mathvariant="normal">്</mi> <mi mathvariant="normal">ത</mi> <mi mathvariant="normal">ി</mi> This should be displayed as a single character. But because of the script's inability to accept non-Latin rules, it is broken into four. The same happens to all languages, including Arabic. Also, the bug is not limited to these four languages. At least all other Indian languages are affected.
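To make the point about independent code points concrete, here is a minimal Python sketch. It is not code from the Math extension, texvc, or LaTeXML. It lists the four code points of ത്തി and then groups them with a deliberately naive rule: a combining mark stays with the preceding character, and a letter directly after a virama joins the preceding cluster. The name naive_clusters and the grouping rule are assumptions made for illustration only; real segmentation follows Unicode UAX #29 plus script-specific shaping in the font.

```python
import unicodedata

# ത്തി, the sequence that ends up in four separate <mi> elements.
SAMPLE = "\u0d24\u0d4d\u0d24\u0d3f"


def naive_clusters(text):
    """Group code points into rough display units (illustrative, not UAX #29)."""
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        # Canonical combining class 9 is the Unicode "virama" class.
        after_virama = bool(clusters) and unicodedata.combining(clusters[-1][-1]) == 9
        if clusters and (is_mark or (after_virama and category.startswith("L"))):
            clusters[-1] += ch  # keep the character with the previous unit
        else:
            clusters.append(ch)  # start a new unit
    return clusters


for ch in SAMPLE:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

print(len(SAMPLE), "code points,", len(naive_clusters(SAMPLE)), "cluster(s)")
```

A renderer that emits one element per cluster rather than one per code point would hand the whole sequence to the font shaper, which is what is needed for the conjunct to form.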
By 'script', I was referring to the alphabet(s), not the software. You may be right that the only issues with the Indic languages are handling combining characters. But even if so, Arabic (besides the separate alphabet) is the only one with RTL issues.
I just don't know how much of this you can understand, given your strong European perspective on languages :-). Maybe Arabic has some other issues, but the issues mentioned here are so far highly related. Other RTL languages like Hebrew and Farsi are also affected by this bug.
(In reply to comment #14)
> I just don't know how much of this you can understand, given your strong
> European perspective on languages :-).
I understand the gist of your report, which is why I still have not separated the Indic languages. Note that European languages also have combining characters and ligatures, and letters based on them.
> Other RTL languages like Hebrew and Farsi are also affected by this bug.
If they all fail to handle RTL with the same symptoms (I have not tested), please edit bug 48118 accordingly. I meant the only RTL language that had been reported so far.
However, a minimal example in pdflatex would be really helpful, so that developers know how it should look, can reproduce it with LaTeX, and can identify the problems.
(In reply to comment #15)
> If they all fail to handle RTL with the same symptoms (I have not tested),
> please edit bug 48118 accordingly.
I think they are failing because the math extension takes code points independently and places them between different tags, so the rendering mechanism treats these characters as different entities. This is just another effect of this bug. Please check the source code of the testwiki page (comment 5). I did find many European language ligatures, but they are atomic, so they don't suffer from this problem.
This is a duplicate of bug 798. For texvc, you will need to fix LaTeX to include all the charsets etc. For MathJax, you need to use text mode and, for the more complex situations, wait for SVG or MathML to finally come of age and start truly working in the majority of browsers. For MathJax HTML mode, you can perhaps see if there are ways to instruct it to NOT split certain Unicode ranges into separate symbols. But that should really be taken upstream.
For the MathJax part: https://github.com/mathjax/MathJax/issues/474
Bug 798 is both broader in some ways (all writing systems) and narrower in others (only texvc), so I'm just listing it as a see also for now.
The MathJax part can be partially fixed for \text{} blocks by applying https://gerrit.wikimedia.org/r/#/c/61924/
I really don't understand why this should be high priority now. There is no information on how it should look. See also http://trac.mathweb.org/LaTeXML/ticket/1737
(In reply to comment #22)
> I really don't understand why this should be high priority now.
> There is no information on how it should look.
Well, the bug summary is rather trenchant and is worth this priority/severity; the screenshots I looked at seemed to confirm it (the highlighted characters definitely look mangled), but I'm not confident commenting on such languages, so if I'm wrong please revert me and clarify the summary.
(In reply to comment #22)
> I really don't understand why this should be high priority now.
May I know why this should be low priority while all non-Latin languages suffer from it?
> There is no information on how it should look.
Just remove the math tags from ligatures (like ഗ്ദ്ധ്രീ) and check. Or try some Arabic or Hebrew.
Does this look right? http://math-test2.instance-proxy.wmflabs.org/w/index.php?title=Hindi,_Arabic,_Malayalam,_Tamil For me, using Firefox, it seems to look right.
Created attachment 13403 [details] Problem Screenshot All I get is errors. Since login is not possible, it is also not possible to set MathJax as the default.
Oh, I'm updating the database on that server right now. http://demo.formulasearchengine.com/index.php/User_talk:Admin
Created attachment 13405 [details] current rendering (MathML (left), MathJax (right))
Created attachment 13406 [details] wikisource screenshot Yes, that is perfect. But some letters like ത്ര are still not working on Wikisource. I just found that it is possible to avoid this by adding extra {}, like {ത്ര}^3. Also, the default PNG mode is still not working.
Sure. I'm just developing those features. What does ത്ര mean? If it's a word with a conventional meaning rather than a variable, it should be written as \text{ത്ര}^3. This will help blind people in the future to understand the content. For example, the first equation on https://en.wikipedia.org/w/index.php?title=Body_mass_index is written as
<math>\mathrm{BMI}</math> | <math>= \frac{\text{mass}(\text{kg})}{\left(\text{height}(\text{m})\right)^2}</math>
This prevents wrong spacing between the letters in BMI and mass, and makes the equation accessible to disabled people and search engines.
ത്ര has no meaning. It is not a word. It is just a letter (ligature).
Unfortunately, the TeX system does not have very good support for non-ASCII characters. It's the same problem with the German umlauts öäü. In all math environments that I tried, I had to use \text{ü} instead of ü. So I think we have to cope with that as well. Maybe we can come up with an interface for easy editing with the visual editor one day. MathJax can convert the samples to MathML if \text is used, but not to SVG. See http://math-test2.instance-proxy.wmflabs.org/wiki/30463 This requires Math2.0, which is ready for code review at https://gerrit.wikimedia.org/r/#/c/85801/
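The \text{} workaround described here can be sketched as a small preprocessing step. The following Python snippet is only an illustration under stated assumptions: the function name wrap_non_ascii_in_text is hypothetical, it is not part of the Math extension or MathJax, and it naively wraps every run of non-ASCII characters, so it would double-wrap input that already uses \text{} and would also wrap non-ASCII math symbols such as ±.

```python
import re

# One or more consecutive characters outside the ASCII range.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def wrap_non_ascii_in_text(tex):
    """Wrap each non-ASCII run in \\text{...} (naive, for illustration only)."""
    # A callable replacement is used so the backslash is inserted literally.
    return NON_ASCII_RUN.sub(lambda m: r"\text{" + m.group(0) + "}", tex)


# \frac{ക}{ച} from the samples earlier in this report.
print(wrap_non_ascii_in_text("\\frac{\u0d15}{\u0d1a}"))
```

Because a run is wrapped as a whole, a multi-code-point letter such as ത്ര stays inside a single \text{} block rather than being split.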
*** Bug 30463 has been marked as a duplicate of this bug. ***
> MathJax can convert the samples to MathML if \text is used, but not to SVG.
> See
> http://math-test2.instance-proxy.wmflabs.org/wiki/30463
That shouldn't happen. MathJax accepts Unicode input and should fall back to system fonts for characters not covered by its fonts. I suspect it's a problem with svgtex / mathoid or, more precisely, with phantomjs. Could you file a bug report at MathJax with some background information?
Hi Peter, what do you mean exactly with at MathJaX?
(In reply to comment #35) > Hi Peter, > what do you mean exactly with at MathJaX? Hi Moritz, With "at MathJax" I meant our bug tracker at https://github.com/mathjax/mathjax/issues. But I just took another look and this might be yet another problem with embedding the SVG. That is, the "complex Malayalam equation" at http://math-test2.instance-proxy.wmflabs.org/wiki/30463 looks very different if viewed independently at http://math-test2.instance-proxy.wmflabs.org/w/index.php?title=Special:MathShowImage&hash=b3793eaa2110f756756386bb17b17d04.
*** Bug 2458 has been marked as a duplicate of this bug. ***
Is anyone around here familiar with OCaml? In order to make progress on all the Math-related bugs, this change https://gerrit.wikimedia.org/r/#/c/90748 needs to get a review. I verified the correctness of the code and wrote a small guide on how you can verify it yourself at http://www.formulasearchengine.com/Verify%20texvc%20light . If you don't want to do a review, I'd be interested in the reasons for that. Even very basic feedback concerning the commit helps, e.g. whether you understand what the change is about, or whether the commit message should be changed.
I unassigned myself, since I don't want to review my own code and I cannot do more about it at the moment.
The patches seem to have been merged or abandoned, so setting back to NEW.
See https://www.mediawiki.org/wiki/Extension:Math/bug/48032 in MathML rendering mode. I would say that's as good as it gets. If that's not sufficient, please file separate bugs for the individual characters that don't work so that we can fix them one after the other. We would also need a reference implementation (for example a LaTeX document) that demonstrates how the result should look.
Created attachment 16810 [details] 48032 in MathML I can't see what's wrong here.
Created attachment 16875 [details] PNG rendering gives Lexing Error
Created attachment 16876 [details] No proper rendering in MathML
Created attachment 16877 [details] No proper rendering in clientside MathJax The bug still persists, and no proper rendering is available in any mode. Test page - [[:mlഉപയോക്താവ്:Praveenp/പരിശോധന]]. There are various books with mathematical equations that still cannot be included (e.g. https://ml.wikisource.org/wiki/%E0%B4%A4%E0%B4%BE%E0%B5%BE:Yukthibhasa.djvu/227)
I still do not understand the problem. I updated the demo at https://www.mediawiki.org/wiki/Extension:Math/bug/48032 I'd be happy if someone could explain it to me.
Here's my explanation:

* PNG output is totally broken and needs to be fixed to stop spewing red error messages. I think the problem here is quite obvious.
* MathML output renders very poorly. Characters are overlapping and/or severely truncated (e.g. only half the character is visible).
* MathJax seems acceptable, even though the characters are very small. Having said that, I do not read Malayalam so cannot confirm whether the rendering is indeed perfectly accurate.

Hopefully Praveen can also explain some more.
@physikerwelt could PNG be generated from the SVG? MathJax-node has hooks for Apache Batik to do this. MathJax is Unicode friendly although it will often have to rely on system fonts to provide the glyphs.
(In reply to This, that and the other (TTO) from comment #47)
> Here's my explanation:
>
> * PNG output is totally broken and needs to be fixed to stop spewing red
> error messages. I think the problem here is quite obvious.
OK. This will be solved by Bug 72240.
> * MathML output renders very poorly. Characters are overlapping and/or
> severely truncated (e.g. only half the character is visible).
Here, we need to clarify whether the generated MathML is correct.
> * MathJax seems acceptable, even though the characters are very small.
> Having said that, I do not read Malayalam so cannot confirm whether the
> rendering is indeed perfectly accurate.
>
> Hopefully Praveen can also explain some more.
The problem I see with this bug and many other bugs is that there is a gap between the bug report created by the users and the information needed by the programmers. For example, this bug describes a whole complex of problems. So there is no well-defined problem that can be transformed into a test case and then resolved. To my understanding, a programmer-compatible bug report would optimally be the following:
1) What is the scenario?
   a) TeX input code
   b) Environment variables
   c) Which rendering mode is used?
   d) Which browser is used?
2) Is the input valid?
3) If it's valid: how is it supposed to look? Is there a TeX/LaTeX document that demonstrates the rendering? What should the MathML output be?
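The checklist above could be captured as a small record so that an incomplete report is easy to spot. This is a hypothetical Python sketch only: the class name MathBugReport and its field names are invented here to mirror the numbered points and do not correspond to anything in Bugzilla or the Math extension.

```python
from dataclasses import dataclass, fields


@dataclass
class MathBugReport:
    """One reproducible rendering problem; empty strings mean 'not provided'."""
    tex_input: str = ""           # 1a) full <math>...</math> markup
    environment: str = ""         # 1b) environment variables / server setup
    rendering_mode: str = ""      # 1c) e.g. PNG, MathML, MathJax
    browser: str = ""             # 1d) browser used for viewing
    input_is_valid: str = ""      # 2) "yes" or "no"
    expected_rendering: str = ""  # 3) reference document or screenshot
    expected_mathml: str = ""     # 3) MathML the renderer should produce

    def missing(self):
        """Return the names of the fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


report = MathBugReport(tex_input=r"<math>\frac{\text{ക}}{\text{ച}}</math>",
                       rendering_mode="PNG")
print(report.missing())
```

Each entry returned by missing() corresponds to one question a developer would otherwise have to ask in a follow-up comment.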
(In reply to physikerwelt from comment #49)
> The problem I see with this bug and many other bugs is that there is a gap
> between the bug report created by the users and the information needed by
> the programmers.
Sometimes that takes some back-and-forth in order to figure it out. I agree that "No proper rendering" should be made clearer (e.g. "bottom of character is truncated", "characters are not joined properly", or "wrong font is used", whatever the case may be). Also, it helps to put the full markup (<math> through </math>) here in the bug report. If it's only in the screenshot, someone has to retype it or figure out where to copy it from (which is hard if it's in a language they don't know).
> So there is no well-defined problem that can be transformed into a test case
> and then resolved.
The PNG one seems pretty clear. I've made a sub-task for this, bug 73285.
Created attachment 17099 [details]
Different renderings of Malayalam equations

Here's another try. Praveen, can you confirm that the renderings at the left of this image are indeed correct?

First example is: <math>\cfrac{\text{ച}}{\angle 4\text{ത്ര}^3}</math>
MS Word code (linear form) is: ച/(∠4〖ത്ര〗^3 )

Second example is: <math>ല \left ( \frac{വ്വ(ക്ലു \pm കു_ശ)}{ടി} \right )</math>
MS Word code (linear form) is: ല(വ്വ(ക്ലു±〖കു〗_ശ )/ടി)

Per comment 7, I tried using sharelatex to generate an example, but its pdfLaTeX renderer refuses to accept Malayalam at all, and the LuaLaTeX and XeLaTeX renderers do it wrongly.
If you manage to make Mathjax render your input correctly, it will be easy to port that to the Math extension.
This will generally work (so 69702 should resolve this). But I think for this specific case there's a bug in MathJax related to combining characters. I've filed https://github.com/mathjax/MathJax/issues/952
(In reply to physikerwelt from comment #52) > If you manage to make Mathjax render your input correctly, it will be easy > to port that to the Math extension. Equation 1 is working properly in MathJax but not in the SVG fallback, so that can be a starting point.
comment #51 is correct