Bug 48032 - Math extension doesn't support many languages including Malayalam, Hindi, and Tamil
Status: REOPENED
Product: MediaWiki extensions (Unclassified)
Component: Math
Version: unspecified
Hardware: All All
Importance: High major with 4 votes
Assigned to: Nobody - You can work on this!
Keywords: i18n
Related bugs (depends on / blocks): 73285, 32578, 41348, 48118, 56295, 57770

Reported: 2013-05-03 02:14 UTC by praveenp
Modified: 2014-11-15 04:00 UTC
CC (16 users): amir.aharoni, at.light, c.balasankar, federicoleva, hartman.wiki, jiabao.foss, manojkmohanme03107, max, ori, pavanaja, peter.krautzberger, physik, santhosh.thottingal, tim, vahid.ghasemian, vssun9

Attachments
- Math default PNG mode (129.32 KB, image/png), 2013-05-03 02:14 UTC, praveenp
- MathJax Mode (15.43 KB, image/png), 2013-05-03 02:15 UTC, praveenp
- LaTeXML Malayalam (8.43 KB, image/png), 2013-05-04 06:06 UTC, manojkmohan
- Problem Screenshot (188.03 KB, image/png), 2013-09-30 03:43 UTC, praveenp
- current rendering (MathML (left), MathJax (right) (164.87 KB, image/png), 2013-09-30 03:56 UTC, physikerwelt
- wikisource screenshot (34.44 KB, image/png), 2013-09-30 04:04 UTC, praveenp
- 48032 in MathML (64.00 KB, image/png), 2014-10-19 19:34 UTC, physikerwelt
- PNG rendering gives Lexing Error (31.26 KB, image/png), 2014-10-24 01:56 UTC, praveenp
- No proper rendering in MathML (24.31 KB, image/png), 2014-10-24 01:58 UTC, praveenp
- No proper rendering in clientside MathJax (21.34 KB, image/png), 2014-10-24 02:03 UTC, praveenp
- Different renderings of Malayalam equations (43.73 KB, image/png), 2014-11-12 05:25 UTC, This, that and the other (TTO)


praveenp 2013-05-03 02:14:29 UTC (comment 0)
Created attachment 12248: Math default PNG mode
Malayalam characters inside $..$ lead to a lexing error when using the default server-rendered PNG mode. The same TeX code with English characters works without any problem. Using MathJax partially fixes the problem, but it is heavy and still causes some display errors. Please check both attachments.

praveenp 2013-05-03 02:15:13 UTC (comment 1)
Created attachment 12249: MathJax Mode

praveenp 2013-05-03 14:55:12 UTC (comment 2)
I just found that this affects various other languages, including Hindi, Tamil, Arabic etc. All these languages have a vast mathematical history, even before European languages caught up. Certainly not of low priority. Title changed, priority increased.

physikerwelt 2013-05-03 18:40:40 UTC (comment 3)
Hi, can you provide some samples in the following format?
$\frac12$ : fraction 1/2
Best, Physikerwelt

praveenp 2013-05-04 02:31:43 UTC (comment 4)
Malayalam: $\frac{ക}{ച}$
Tamil: $\frac{க}{த}$
Hindi: $\frac{क}{च}$
Simple forms like $ക$ also fail!

physikerwelt 2013-05-04 05:47:48 UTC (comment 5)
Thank you. I rendered it with LaTeXML. For me it looks ok. See http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page Can you confirm that?

manojkmohan 2013-05-04 06:06:35 UTC (comment 6)
Created attachment 12256: LaTeXML Malayalam
Letters break when rendering complex words.
(In reply to comment #5)
> Thank you.
> I rendered it with LaTeXML.
> For me it looks ok. See
> http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page
> Can you confirm that?

physikerwelt 2013-05-04 19:21:44 UTC (comment 7)
Can you provide a tex file that creates the correct output?

physikerwelt 2013-05-04 19:26:33 UTC (comment 8)
Please include all libraries and make sure that it can be compiled with pdflatex. Thanks a lot.

Jiabao Wu 2013-05-05 13:41:24 UTC (comment 9)
Are there any other softwares can write complex Malayalam?

praveenp 2013-05-06 02:43:53 UTC (comment 10)
This is not limited to Malayalam. I've updated the testwiki page: http://math-test.instance-proxy.wmflabs.org/wiki/Main_Page .
You can see that Hindi and Arabic also have the same problems. Arabic words (RTL) are also reversed (LTR).
(In reply to comment #9)
> Are there any other softwares can write complex Malayalam?

All popular software and rendering engines support Malayalam (and most other Indic and Asian languages) perfectly, though I don't know whether TeX-based systems designed for Asian languages exist. Even if one such library exists, it may take its own time to get some traction and gain popularity. If I could, I would fix the server-rendered PNG issues first, because they are light and simple (to users) and client independent.

Matthew Flaschen 2013-05-06 03:18:13 UTC (comment 11)
I think at least Arabic should be separate, since it's an RTL language, which presents specific issues. Hence, broken out to bug 48118. But I'm inclined to think all four should be separate bugs, since all these languages use different scripts. But I'll hold off for a little while in case people familiar with these scripts think they're likely to have the same solution.

praveenp 2013-05-06 03:46:45 UTC (comment 12)
That is not completely true. This happens because the script accepts and renders each Unicode code point independently. A probably perfect approach for Latin languages, but not for other languages such as Indic languages and other Asian languages. This source is taken from the wiki page: ി This should be displayed as a single character. But because of the script's inability to accept non-Latin rules, this is broken into four. The same happens to all languages including Arabic. Also, the bug is not limited to these four languages. At least all other Indian languages are affected.

Matthew Flaschen 2013-05-06 04:01:56 UTC (comment 13)
By 'script', I was referring to the alphabet(s), not the software. You may be right that the only issues with the Indic languages are handling combining characters. But even if so, Arabic (besides the separate alphabet) is the only one with RTL issues.
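
Editorial illustration of the code-point issue praveenp describes above (2013-05-06 03:46 UTC): a minimal Python sketch, not the Math extension's or texvc's actual code. It shows that the ligature ത്ര is stored as three code points, and that a renderer which places each code point in its own element will split it. The `naive_clusters` helper is hypothetical and only approximates cluster boundaries (marks attach to the previous letter, and a virama also pulls in the following letter); it is not the full Unicode segmentation algorithm.

```python
import unicodedata


def describe(text):
    """Return (code point, general category, name) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "?"))
        for ch in text
    ]


def naive_clusters(text):
    """Group code points into approximate display clusters (hypothetical helper).

    Rule 1: a combining mark (category M*) joins the previous cluster.
    Rule 2: a letter that follows a virama (combining class 9) joins too.
    This is an approximation, not the UAX #29 algorithm.
    """
    clusters = []
    join_next = False
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and (is_mark or join_next):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = unicodedata.combining(ch) == 9  # virama
    return clusters


# The ligature from the thread, written with escapes: TA + VIRAMA + RA
ligature = "\u0d24\u0d4d\u0d30"

for row in describe(ligature):
    print(row)
print(len(ligature), "code points,", len(naive_clusters(ligature)), "cluster")
```

Wrapping each element of `describe(ligature)` in its own tag is roughly what the thread reports the extension doing; keeping each element of `naive_clusters(ligature)` together is what the reporters are asking for.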
praveenp 2013-05-06 04:15:03 UTC (comment 14)
I just don't know how much you can understand this because of your strong European perspective about languages :-). Maybe Arabic has some other issues, but the issues mentioned here are so far highly related. Other RTL languages like Hebrew and Farsi are also affected by this bug.

Matthew Flaschen 2013-05-06 04:28:15 UTC (comment 15)
(In reply to comment #14)
> I just don't know how much you can understand this because of your strong
> European perspective about languages :-).

I understand the gist of your report, which is why I still have not separated the Indic languages. Note that European languages also have combining characters and ligatures, and letters based on them.

> Other RTL languages like Hebrew and Farsi are also affected by this bug.

If they all fail to handle RTL with the same symptoms (I have not tested), please edit bug 48118 accordingly. I meant the only RTL language that had been reported so far.

physikerwelt 2013-05-06 11:01:08 UTC (comment 16)
However, a minimal example in pdflatex would be really helpful, so that developers know how it should look, can reproduce it with LaTeX, and can identify the problems.

praveenp 2013-05-06 17:22:27 UTC (comment 17)
(In reply to comment #15)
> If they all fail to handle RTL with the same symptoms (I have not tested),
> please edit bug 48118 accordingly.

I think they are failing because the Math extension chooses code points independently and places them in between different tags. So the rendering mechanism takes these characters as different entities. This is just another effect of this bug. Please check the source code of the testwiki page (comment 5). I did find many European language ligatures, but they are atomic and so they don't suffer from this problem.

Derk-Jan Hartman 2013-05-19 18:36:30 UTC (comment 18)
This is a duplicate of 798. For texvc, you will need to fix latex to include all the charsets etc.
For MathJax, you need to use text mode and, for the more complex situations, wait for SVG or MathML to finally come of age and start truly working in the majority of browsers. For MathJax HTML mode, you can perhaps see if there are ways to instruct it to NOT split certain Unicode ranges into separate symbols. But that should really be taken upstream.

Derk-Jan Hartman 2013-05-19 19:10:49 UTC (comment 19)
For the MathJax part: https://github.com/mathjax/MathJax/issues/474

Matthew Flaschen 2013-05-19 19:16:37 UTC (comment 20)
Bug 798 is both broader in some ways (all writing systems) and narrower in others (only texvc), so I'm just listing it as a see also for now.

Derk-Jan Hartman 2013-05-22 20:29:22 UTC (comment 21)
The MathJax part can be partially fixed for \text{} blocks by applying https://gerrit.wikimedia.org/r/#/c/61924/

physikerwelt 2013-09-17 22:57:49 UTC (comment 22)
I really don't understand why this should be high priority now. There is no information how it should look like. See also http://trac.mathweb.org/LaTeXML/ticket/1737

Nemo 2013-09-17 23:28:35 UTC (comment 23)
(In reply to comment #22)
> I really don't understand why this should be high priority now.
> There is no information how it should look like.

Well, the bug summary is rather tranchant and is worth this priority/severity; the screenshots I looked at seemed to confirm it (the highlighted characters definitely look mangled), but I'm not confident commenting on such languages, and if I'm wrong please revert me and clarify the summary.

praveenp 2013-09-18 03:58:53 UTC (comment 24)
(In reply to comment #22)
> I really don't understand why this should be high priority now.

May I know why this should be of low priority while all non-Latin languages are suffering from this?

> There is no information how it should look like.

Just remove the math tag from ligatures (like ഗ്ദ്ധ്രീ ) and check. Or try to use some Arabic or Hebrew.

physikerwelt 2013-09-18 15:27:09 UTC (comment 25)
Does that look right: http://math-test2.instance-proxy.wmflabs.org/w/index.php?title=Hindi,_Arabic,_Malayalam,_Tamil For me ...
using Firefox... it seems to look right

praveenp 2013-09-30 03:43:17 UTC (comment 26)
Created attachment 13403: Problem Screenshot
All I got is errors. Since login is not possible, it is also not possible to set MathJax as default.

physikerwelt 2013-09-30 03:48:47 UTC (comment 27)
Oh. I'm updating the database on that server right now. http://demo.formulasearchengine.com/index.php/User_talk:Admin

physikerwelt 2013-09-30 03:56:00 UTC (comment 28)
Created attachment 13405: current rendering (MathML (left), MathJax (right)

praveenp 2013-09-30 04:04:09 UTC (comment 29)
Created attachment 13406: wikisource screenshot
Yes, that is perfect. But some letters like ത്ര are still not working in Wikisource. But I just found that it is possible to avoid this by adding extra {}, like: {ത്ര}^3. Also, the default PNG mode is still not working.

physikerwelt 2013-09-30 04:14:19 UTC (comment 30)
Sure. I'm just developing those features. What does ത്ര mean? If it's a word with a conventional meaning instead of a variable, it should be written as \text{ത്ര}^3. This will help blind people in the future to understand the content. For example, the first equation on https://en.wikipedia.org/w/index.php?title=Body_mass_index is written as
$\mathrm{BMI}$  | $= \frac{\text{mass}(\text{kg})}{\left(\text{height}(\text{m})\right)^2}$
This prevents wrong spacing between the letters BMI and mass, and makes the equation accessible for disabled people and search engines.

praveenp 2013-10-02 10:07:55 UTC (comment 31)
ത്ര has no meaning. It is not a word. It is just a letter (ligature).

physikerwelt 2013-10-16 10:34:03 UTC (comment 32)
Unfortunately the TeX system does not have very good support for non-ASCII characters. It's the same problem with the German 'umlauts' öäü. In all math environments that I tried I had to use \text{ü} instead of ü. So I think we have to cope with that as well. Maybe we can come up with an interface for easy editing with the visual editor one day. MathJaX can convert the samples if \text is used to MathML but not to SVG.
see http://math-test2.instance-proxy.wmflabs.org/wiki/30463 This requires Math2.0, which is ready for code review at https://gerrit.wikimedia.org/r/#/c/85801/

physikerwelt 2013-10-16 10:36:45 UTC (comment 33)
*** Bug 30463 has been marked as a duplicate of this bug. ***

Peter Krautzberger 2013-10-16 19:11:48 UTC (comment 34)
> MathJaX can convert the samples if \text is used to MathML but not to SVG.
> see
> http://math-test2.instance-proxy.wmflabs.org/wiki/30463

That shouldn't happen. MathJax accepts Unicode input and should fall back to system fonts for characters not covered by its fonts. I suspect it's a problem with svgtex / mathoid or, more precisely, with phantomjs. Could you file a bug report at MathJax with some background information?

physikerwelt 2013-10-19 14:07:52 UTC (comment 35)
Hi Peter, what do you mean exactly with at MathJaX?

Peter Krautzberger 2013-10-20 17:45:29 UTC (comment 36)
(In reply to comment #35)
> Hi Peter,
> what do you mean exactly with at MathJaX?

Hi Moritz,
With "at MathJax" I meant our bug tracker at https://github.com/mathjax/mathjax/issues. But I just took another look and this might be yet another problem with embedding the SVG. That is, the "complex Malayalam equation" at http://math-test2.instance-proxy.wmflabs.org/wiki/30463 looks very different if viewed independently at http://math-test2.instance-proxy.wmflabs.org/w/index.php?title=Special:MathShowImage&hash=b3793eaa2110f756756386bb17b17d04.

physikerwelt 2013-10-22 19:46:06 UTC (comment 37)
*** Bug 2458 has been marked as a duplicate of this bug. ***

physikerwelt 2013-11-06 17:43:54 UTC (comment 38)
Is anyone around here familiar with OCaml? In order to make progress on all the Math-related bugs, this change https://gerrit.wikimedia.org/r/#/c/90748 needs to get a review. I verified the correctness of the code, and wrote a small guide on how you can verify it yourself at http://www.formulasearchengine.com/Verify%20texvc%20light . If you don't like to do a review I'd be interested in the reasons for that.
Even very basic feedback concerning the commit helps, i.e. do you understand what the change is about, or should the commit message be changed?

physikerwelt 2013-11-06 17:46:03 UTC (comment 39)
I unassigned myself, since I don't want to review my own code and I cannot do more about it at the moment.

This, that and the other (TTO) 2014-07-05 02:23:17 UTC (comment 40)
The patches seem to have been merged or abandoned, so setting back to NEW.

physikerwelt 2014-10-19 19:33:49 UTC (comment 41)
See https://www.mediawiki.org/wiki/Extension:Math/bug/48032 in MathML rendering mode. I would say that's as good as it gets. If that's not sufficient, please file separate bugs for the individual chars that don't work, so that we can fix one after the other. We would also need a reference implementation (for example a LaTeX document) that demonstrates how the result should look.

physikerwelt 2014-10-19 19:34:46 UTC (comment 42)
Created attachment 16810: 48032 in MathML
I can't see what's wrong here.

praveenp 2014-10-24 01:56:11 UTC (comment 43)
Created attachment 16875: PNG rendering gives Lexing Error

praveenp 2014-10-24 01:58:11 UTC (comment 44)
Created attachment 16876: No proper rendering in MathML

praveenp 2014-10-24 02:03:10 UTC (comment 45)
Created attachment 16877: No proper rendering in clientside MathJax
The bug still persists and no proper rendering is available in any mode. Test page - [[:mlഉപയോക്താവ്:Praveenp/പരിശോധന]]. There are various books with mathematical equations that are still not possible to include (e.g. https://ml.wikisource.org/wiki/%E0%B4%A4%E0%B4%BE%E0%B5%BE:Yukthibhasa.djvu/227)

physikerwelt 2014-10-24 05:32:57 UTC (comment 46)
I still do not understand the problem. I updated the demo at https://www.mediawiki.org/wiki/Extension:Math/bug/48032 . I'd be happy if someone could explain it to me.

This, that and the other (TTO) 2014-10-24 06:42:07 UTC (comment 47)
Here's my explanation:
* PNG output is totally broken and needs to be fixed to stop spewing red error messages. I think the problem here is quite obvious.
* MathML output renders very poorly.
Characters are overlapping and/or severely truncated (e.g. only half the character is visible).
* MathJax seems acceptable, even though the characters are very small. Having said that, I do not read Malayalam so cannot confirm whether the rendering is indeed perfectly accurate.

Hopefully Praveen can also explain some more.

Peter Krautzberger 2014-10-24 07:24:23 UTC (comment 48)
@physikerwelt could PNG be generated from the SVG? MathJax-node has hooks for Apache Batik to do this. MathJax is Unicode friendly, although it will often have to rely on system fonts to provide the glyphs.

physikerwelt 2014-10-24 08:09:20 UTC (comment 49)
(In reply to This, that and the other (TTO) from comment #47)
> Here's my explanation:
>
> * PNG output is totally broken and needs to be fixed to stop spewing red
> error messages. I think the problem here is quite obvious.

OK. This will be solved by Bug 72240.

> * MathML output renders very poorly. Characters are overlapping and/or
> severely truncated (e.g. only half the character is visible).

Here, we need to clarify if the generated MathML is correct.

> * MathJax seems acceptable, even though the characters are very small.
> Having said that, I do not read Malayalam so cannot confirm whether the
> rendering is indeed perfectly accurate.
>
> Hopefully Praveen can also explain some more.

The problem I see with this bugs and many other bugs, is that there is a gap between the bug report created by the users and the information needed by the programmers. For example, this bug describes a whole complex of problems. So there is no well defined problem that can be transformed into a testcase and then be resolved. To my understanding, a programmer-compatible bug report would optimally be the following:
1) What is the scenario:
   a) TeX input code
   b) Environment variables
   c) Which rendering mode is used?
   d) Which browser is used?
2) Is the input valid?
3) If it's valid: How is it supposed to look? Is there a TeX\LaTeX document that demonstrates the rendering?
What should be the MathML output?

Matthew Flaschen 2014-11-11 23:30:19 UTC (comment 50)
(In reply to physikerwelt from comment #49)
> The problem I see with this bugs and many other bugs, is that there is a gap
> between the bug report created by the users and the information needed by
> the programmers.

Sometimes that takes some back-and-forth in order to figure it out. I agree that "No proper rendering" should be made clearer (e.g. "bottom of character is truncated", "characters are not joined properly", or "wrong font is used", whatever the case may be). Also, it helps to put the full markup ($through$) here in the bug report. If it's only in the screenshot, someone has to retype it or figure out where to copy it from (which is hard if it's in a language they don't know).

> So there is no well defined problem that can be transformed into a testcase
> and then be resolved.

The PNG one seems pretty clear. I've made a sub-task for this, bug 73285.

This, that and the other (TTO) 2014-11-12 05:25:40 UTC (comment 51)
Created attachment 17099: Different renderings of Malayalam equations
Here's another try. Praveen, can you confirm that the renderings at the left of this image are indeed correct?
First example is: $\cfrac{\text{ച}}{\angle 4\text{ത്ര}^3}$
MS Word code (linear form) is: ച/(∠4〖ത്ര〗^3 )
Second example is: $ല \left ( \frac{വ്വ(ക്ലു \pm കു_ശ)}{ടി} \right )$
MS Word code (linear form) is: ല(വ്വ(ക്ലു±〖കു〗_ശ )/ടി)
Per comment 7, I tried using sharelatex to generate an example, but its pdfLaTeX renderer refuses to accept Malayalam at all, and the LuaLaTeX and XeLaTeX renderers do it wrongly.

physikerwelt 2014-11-12 08:28:47 UTC (comment 52)
If you manage to make Mathjax render your input correctly, it will be easy to port that to the Math extension.

Peter Krautzberger 2014-11-12 09:13:20 UTC (comment 53)
This will generally work (so 69702 should resolve this). But I think for this specific case there's a bug in MathJax related to combining characters.
I've filed https://github.com/mathjax/MathJax/issues/952

This, that and the other (TTO) 2014-11-12 09:15:10 UTC (comment 54)
(In reply to physikerwelt from comment #52)
> If you manage to make Mathjax render your input correctly, it will be easy
> to port that to the Math extension.

Equation 1 is working properly in MathJax but not in the SVG fallback, so that can be a starting point.

praveenp 2014-11-15 04:00:27 UTC (comment 55)
comment #51 is correct
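
Editorial illustration of the workaround discussed in the thread (physikerwelt, 2013-09-30 and 2013-10-16, and the first example from 2014-11-12): wrapping non-Latin letters in \text{} so that the renderer keeps them together. This is a minimal Python sketch under stated assumptions, not part of the Math extension. The helper names `is_text_char` and `wrap_non_ascii` are hypothetical, and the sketch does not detect input that is already inside \text{}, so it should only be applied to unwrapped markup.

```python
import unicodedata

JOINERS = "\u200c\u200d"  # ZWNJ and ZWJ, which can occur inside Indic clusters


def is_text_char(ch):
    """True for non-ASCII letters, combining marks, and the two joiners."""
    if ch in JOINERS:
        return True
    return ord(ch) > 127 and unicodedata.category(ch)[0] in "LM"


def wrap_non_ascii(tex):
    """Wrap each run of non-ASCII letters/marks in \\text{...} (hypothetical helper).

    Math symbols such as U+00B1 (plus-minus) are category Sm and are left alone.
    """
    out = []
    run = ""
    for ch in tex:
        if is_text_char(ch):
            run += ch  # keep the whole cluster (letters + marks) together
            continue
        if run:
            out.append("\\text{" + run + "}")
            run = ""
        out.append(ch)
    if run:
        out.append("\\text{" + run + "}")
    return "".join(out)


# Sample from the thread, written with escapes: \frac{KA}{CA} in Malayalam
print(wrap_non_ascii("\\frac{\u0d15}{\u0d1a}"))
# The ligature TA + VIRAMA + RA raised to the third power
print(wrap_non_ascii("\u0d24\u0d4d\u0d30^3"))
```

Whether the wrapped markup then renders correctly still depends on the rendering mode and the available fonts, as the later comments in the thread show.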
