Last modified: 2014-09-23 22:36:10 UTC

Bug 798 - Many character sets don't work in texvc
Status: REOPENED
Product: MediaWiki extensions
Classification: Unclassified
Component: Math
Version: unspecified
Hardware: All All
Importance: Low normal with 19 votes
Target Milestone: ---
Assigned To: physikerwelt
URL: https://meta.wikimedia.org/wiki/Help:...
Keywords: i18n, patch, patch-reviewed
Duplicates: 1799 3752 4199 4533 6596 8305 8316 54778
Depends on:
Blocks: 2458 40760 38721

Reported: 2004-10-28 20:29 UTC by Peter Gervai (grin)
Modified: 2014-09-23 22:36 UTC
CC List: 24 users

Attachments
adds additional custom preamble to TeX code through texvc arguments (3.89 KB, patch)
2006-02-10 18:21 UTC, Branko Kokanovic

Description Peter Gervai (grin) 2004-10-28 20:29:09 UTC
The page (as of today, 22:18 CET) contains a <math> formula with the above
error. The editor said he changed the formula several times and got different
errors, most probably syntax errors. This error message, however, complains
not about syntax problems but about installation problems, which is very odd.

jeronim_ checked and said:

22:20:34 <@jeronim_> i dunno what that math error is about
22:20:45 <@jeronim_> it's not just yongle, it's other machines
22:21:35 < grin> jeronim_: editor said he changed a word in the math
22:21:49 < grin> jeronim_: and the error come up. maybe page history shows it, I try to check
22:22:03 <@jeronim_> sorry i can't really help beyond just seeing if the right software is installed
22:22:11 <@jeronim_> you'd have to ask someone else
Comment 1 Peter Gervai (grin) 2004-10-29 12:37:14 UTC
buggy math moved to talk page until fixed. 
Comment 2 Peter Gervai (grin) 2004-10-29 13:09:18 UTC
Bug:
*<math> \mbox{pá} - </math> - bad
*<math> \mbox{pa} - </math> - good

Seems to be something messed up about accented/utf8 chars and minus sign.
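
For illustration only, a minimal Python sketch (not part of texvc; `non_ascii_chars` is a hypothetical helper) of how the two test cases differ. The failing one contains a character that takes two bytes in UTF-8, which is presumably what trips up the rendering pipeline:

```python
from typing import List, Tuple


def non_ascii_chars(formula: str) -> List[Tuple[int, str, bytes]]:
    """Return (index, character, UTF-8 bytes) for each non-ASCII character."""
    return [
        (i, ch, ch.encode("utf-8"))
        for i, ch in enumerate(formula)
        if ord(ch) > 127
    ]


bad = "\\mbox{p\u00e1} -"   # the "bad" test case, \mbox{pá} -
good = "\\mbox{pa} -"       # the "good" test case, \mbox{pa} -

print(non_ascii_chars(bad))   # lists the accented letter with its UTF-8 bytes
print(non_ascii_chars(good))  # empty list: the formula is pure ASCII
```

A check like this only locates the suspicious characters; it says nothing about which stage (texvc, LaTeX, or the PNG conversion) actually rejects them.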
Comment 3 FoeNyx 2005-03-26 10:37:25 UTC
*** Bug 1759 has been marked as a duplicate of this bug. ***
Comment 4 Brion Vibber 2005-04-01 20:38:51 UTC
*** Bug 1799 has been marked as a duplicate of this bug. ***
Comment 5 Brion Vibber 2005-10-20 02:32:22 UTC
*** Bug 3752 has been marked as a duplicate of this bug. ***
Comment 6 Goldie 2005-10-20 20:02:30 UTC
IMHO bug 3752 is not a defect but missing functionality in TeX. For example, I
do not know whether Knuth's fonts have Cyrillic letters at all. Thus I
consider this a request for enhancement.
Comment 7 Brion Vibber 2005-12-06 23:12:14 UTC
*** Bug 4199 has been marked as a duplicate of this bug. ***
Comment 8 Brion Vibber 2006-01-08 22:06:17 UTC
*** Bug 4533 has been marked as a duplicate of this bug. ***
Comment 9 Jan Kraljič 2006-01-08 22:18:26 UTC
Is there any work going on to solve this bug?
Comment 10 Brion Vibber 2006-01-08 23:19:46 UTC
There's not really anyone who's familiar with the TeX stuff who's been 
active in the last couple years.
Comment 11 Carlos 2006-01-18 21:12:59 UTC
The cause of this problem could simply be that TeX tries to load the
package ucs.sty and dies when it does not find it. If that's the case, you
should either run

 # apt-get install latex-ucs

or apply the attached patch. (Installing the package is better, though, because
it covers a large Unicode range.)

Index: texutil.ml
===================================================================
RCS file: /cvsroot/wikipedia/phase3/math/texutil.ml,v
retrieving revision 1.12
diff -u -r1.12 texutil.ml
--- texutil.ml  12 Jan 2006 20:38:31 -0000      1.12
+++ texutil.ml  18 Jan 2006 21:05:37 -0000
@@ -44,7 +44,7 @@
 let tex_mod_reset ()   = (modules_ams := false; modules_nonascii := false; modules_encoding := UTF8; modules_color := false)
 
 let get_encoding = function
-    UTF8 -> "\\usepackage{ucs}\n\\usepackage[utf8]{inputenc}\n"
+    UTF8 -> "\\usepackage[utf8]{inputenc}\n"
   | LATIN1 -> "\\usepackage[latin1]{inputenc}\n"
   | LATIN2 -> "\\usepackage[latin2]{inputenc}\n"
Comment 12 Maxim Razin 2006-01-22 12:10:08 UTC
The problem with the original LaTeX is that you have to switch font encodings
manually.  E.g. for an English/Russian/Polish/Greek text, three encodings have
to be used: Latin (T1), Cyrillic (T2A) and Greek (LGR).  Something like this:

\documentclass{article}
\usepackage[utf8x]{inputenc}
\usepackage[T2A,LGR,T1]{fontenc} % The last encoding is default
\newcommand\cyr[1]{\bgroup\fontencoding{T2A}\selectfont #1\egroup}
\newcommand\grk[1]{\bgroup\fontencoding{LGR}\selectfont #1\egroup}
\pagestyle{empty}
\begin{document}
$$ a=b\quad\mbox{if/\cyr{если}/jeśli/\grk{εἰ}}\quad c=d $$
\end{document}

It works, but it is quite ugly.  And I have no idea how to deal with
right-to-left scripts and CJK.
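
A rough Python sketch of how the manual switching could be automated, assuming the `\cyr` and `\grk` macros from the preamble above are available. The helper names and the idea of guessing the script from the Unicode character name are assumptions made for illustration, not something texvc does:

```python
import unicodedata
from itertools import groupby

# Assumed mapping from a script name (taken from the Unicode character name)
# to the wrapper macros \cyr and \grk defined in the preamble above.
WRAPPERS = {"CYRILLIC": "cyr", "GREEK": "grk"}


def script_of(ch):
    """Return "CYRILLIC" or "GREEK" for characters of those scripts, else None."""
    name = unicodedata.name(ch, "")
    for script in WRAPPERS:
        if name.startswith(script + " "):
            return script
    return None


def wrap_scripts(text):
    """Wrap each run of Cyrillic or Greek characters in its encoding macro."""
    text = unicodedata.normalize("NFC", text)  # compose accents with base letters
    pieces = []
    for script, run in groupby(text, key=script_of):
        chunk = "".join(run)
        pieces.append("\\%s{%s}" % (WRAPPERS[script], chunk) if script else chunk)
    return "".join(pieces)


print(wrap_scripts("if/если/jeśli/εἰ"))
```

For the sample string this should reproduce the hand-written argument of `\mbox` shown above. Like the manual approach, it does nothing for right-to-left scripts or CJK.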
Comment 13 Branko Kokanovic 2006-02-10 18:21:50 UTC
Created attachment 1383
adds additional custom preamble to TeX code through texvc arguments

There's a new variable that can be set to whatever should be appended to the
TeX preamble. Example:
$wgTeXPreambleAdditional="\usepackage[T2A]{fontenc}\nAnother line in preamble";
Comment 14 Валентин Стойков 2006-03-04 21:55:15 UTC
Another example: 
<math> C = BW \times \log_2 \left( 1+\frac{P_с}{P_ш} \right) </math> 
http://bg.wikipedia.org/wiki/Беседа:Пропускателна_способност 
 
Comment 15 Brion Vibber 2006-07-09 04:44:34 UTC
*** Bug 6596 has been marked as a duplicate of this bug. ***
Comment 16 Heinz-Josef Lücking 2006-09-11 14:05:08 UTC
I get this error when setting   $wgUseTeX = true;   in LocalSettings.php:

A database query syntax error has occurred. The last attempted database query
was:
    (SQL query hidden)
from within function "MathRenderer::_recall". MySQL returned error "1267:
Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and
(utf8_general_ci,COERCIBLE) for operation '=' (localhost)".
Comment 17 Mormegil 2006-09-11 18:56:55 UTC
(In reply to comment #16)

This is not related to this bug. The error is probably caused by a wrong table definition (and the fact that you use the UTF-8 character set): the `math_inputhash` column in the `math` table should have an explicit binary collation.
Comment 18 Jutiphan 2006-12-01 03:00:35 UTC
We are having this problem on the Thai Wikipedia. Thai characters do not work properly inside math tags, and we need some help. Thanks to anyone who can shed light on this.
Comment 19 Brion Vibber 2006-12-18 23:15:37 UTC
*** Bug 8305 has been marked as a duplicate of this bug. ***
Comment 20 Brion Vibber 2006-12-19 18:28:59 UTC
*** Bug 8316 has been marked as a duplicate of this bug. ***
Comment 21 Donald Rogers 2008-05-15 23:22:02 UTC
I added these two lines to page http://eo.wikipedia.org/wiki/Kemia_ekvilibro

:<math>\mbox{rapido de antauxena reakcio} = k_+ {A}^\alpha{B}^\beta \,\!</math>
:<math>\mbox{rapido de inversa reakcio} = k_{-} {S}^\sigma{T}^\tau \,\!</math>

The second line works okay; the first fails, apparently because of the ux combination which it is supposed to convert to ŭ.
Comment 22 Happy-melon 2009-07-24 10:57:30 UTC
(In reply to comment #21)
> I added these two lines to page http://eo.wikipedia.org/wiki/Kemia_ekvilibro
> 
> :<math>\mbox{rapido de antauxena reakcio} = k_+ {A}^\alpha{B}^\beta \,\!</math>
> :<math>\mbox{rapido de inversa reakcio} = k_{-} {S}^\sigma{T}^\tau \,\!</math>
> 
> The second line works okay; the first fails, apparently because of the ux
> combination which it is supposed to convert to ŭ.
> 

The page now seems to render the ŭ character correctly.  The test cases in comment 2 also display correctly.  Assuming FIXED.
Comment 23 Peter Gervai (grin) 2009-07-24 18:37:05 UTC
Fix confirmed.
Comment 24 Ragib Hasan 2009-07-24 20:43:21 UTC
Are you sure that this has been fixed? I just tried the following formula in :bn:, and it still shows a parse error:

:<math>\mbox{কখগ} = k_+ {A}^\alpha{B}^\beta \,\!</math>

The error message shows:    পার্স করতে ব্যর্থ (PNG রূপান্তর ব্যর্থ; latex, dvips, gs, এবং convert ঠিকমত ইন্সটল হয়েছে কি না পরীক্ষা করুন): \mbox{কখগ} = k_+ {A}^\alpha{B}^\beta \,\!

The translation in English is: Failed to parse (Failed to convert to PNG; please check if latex, dvips, gs, and convert are installed correctly)

I also noticed that we cannot use Bengali numerals (Unicode, UTF-8) inside LaTeX formulas. They give us the same "Failed to parse" error on bn.wikipedia.
Comment 25 Peter Gervai (grin) 2009-07-25 06:31:43 UTC
Well, at least Latin-script Unicode works (Latin Extended block), but see:

http://en.wikipedia.org/wiki/User:Grin/mathtest

Indeed, apart from Latin script it still fails.
Comment 26 Fibonacci 2009-07-25 17:50:47 UTC
Not even for Latin script. <math>í</math> gets me the following error:
Failed to parse (lexing error): í

It seems that it will only work if the non-ASCII text is inside an mbox.
Comment 27 Sumana Harihareswara 2011-12-23 18:07:10 UTC
Branko, thank you for your patch.  I am sorry it's been unreviewed for so long; I am 99% certain that it's been somewhat obsoleted since you wrote it.  Is this bug still reproducible?  If so, would you be interested in revisiting it?
Comment 28 Helder 2011-12-23 20:45:11 UTC
(In reply to comment #27)
> Is this bug still reproducible?

Per [[meta:Help:Displaying_a_formula#Rendering]], \mbox{ð} and \mbox{þ} will give an error:
* Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): \mbox {ð}
* Failed to parse (PNG conversion failed; check for correct installation of latex and dvipng (or dvips + gs + convert)): \mbox {þ}

These error messages are still displayed on that metawiki page.
Comment 29 matanya 2012-07-26 17:50:18 UTC
With MathJax this can be set to resolved. The MathJax site has a browser compatibility list here: http://www.mathjax.org/resources/browser-compatibility/

so basically it is supported on all browsers and platforms.
Comment 30 Helder 2012-07-26 23:14:54 UTC
The PNG conversion is still failing on WMF wikis (just checked on the documentation page mentioned in comment 28).

Besides, until MathJax is enabled by default (bug 36496), it cannot be considered a fix for this bug, which still happens on Wikipedia.
Comment 31 Derk-Jan Hartman 2012-09-07 17:01:09 UTC
OK, so the difference between mbox and text seems to have disappeared at some point. The problem is now that not all character sets are supported.

Unfortunately LaTeX doesn't support full Unicode. Perhaps we should consider switching to XeTeX?
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=xetex
http://en.wikipedia.org/wiki/XeTeX
Comment 32 Tim Landscheidt 2013-05-20 02:45:04 UTC
(In reply to comment #31)
> OK, so the difference between mbox and text seems to have disappeared at some
> point. The problem is now that not all character sets are supported.

> Unfortunately LaTeX doesn't support full Unicode. Perhaps we should consider
> switching to XeTeX?
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=xetex
> http://en.wikipedia.org/wiki/XeTeX

It doesn't make sense to address this before resolving bug 34038.  It'd probably be relatively easy to convert some Unicode input into something LaTeX renders correctly, but then the initial incentive to use LaTeX as a format -- i.e. to freely transfer text between wiki and LaTeX documents -- gets lost completely.
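
To make the trade-off concrete, here is a small Python sketch of such a conversion for the two characters from comment 28. The table and the function name are hypothetical and hand-picked, and this is not what texvc does; the macros `\dh` and `\th` are standard LaTeX but assume that the T1 font encoding is loaded:

```python
# Illustrative, hand-picked table; NOT what texvc does. Both macros are
# standard LaTeX but need \usepackage[T1]{fontenc} in the preamble.
REPLACEMENTS = {
    "\u00f0": "\\dh{}",  # ð (eth)
    "\u00fe": "\\th{}",  # þ (thorn)
}


def to_latex_text(text):
    """Replace each character listed in REPLACEMENTS by its LaTeX macro."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


print(to_latex_text("\\mbox{\u00f0}"))  # prints \mbox{\dh{}}
```

The rewritten source no longer matches what the author typed on the wiki, which is exactly the loss of transferability described above.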

I think the question should be directed the other way, freed from any past efforts: If a British/French/Bengali/Arabic wiki author wants to enter a formula, what formats a) ease that work and b) are well established?  If most authors will use a formula editor, the format choice can be guided mainly by technical considerations.  If we expect most formulas to be entered manually by people unfamiliar with TeX, the latter would be an odd choice, as its behaviour can be as surprising as MediaWiki's wiki parser, and a format that would /define/ a formula instead of being /commands/ to a typesetter would clearly be preferable.

Even a change to use XeTeX should IMHO be reflected by the use of a new tag ("<math-xe>" or something similar), so that we don't cause more headaches than necessary.
Comment 33 Brion Vibber 2013-09-30 10:24:53 UTC
*** Bug 54778 has been marked as a duplicate of this bug. ***
Comment 34 sodabottle 2013-09-30 10:41:43 UTC
Hi Brion,

As a temporary fix for Tamil Wikipedia (bug 54778), can MathJax be enabled as the default on Ta Wiki alone? I read bug 36496, and it says MathJax wasn't made the default on wiki projects because of slow loading times on low-end computers. I tested some math-heavy pages on some low-end machines (1 GB RAM, Windows XP) and the time seems acceptable. Currently we face the choice between "fast page load with PNG rendering but with errors" and "default MathJax".

If I can obtain community consensus, would it be possible to make MathJax the default for Ta Wiki?
Comment 35 physikerwelt 2013-09-30 16:25:34 UTC
I think we should wait until Math 2.0 is deployed. It applies the same filtering to the commands sent to MathJax as to those sent to LaTeX, which prevents the grammars from diverging.
Furthermore, Frederic Wang made major improvements to the MathJax loader that depend on MathJax 2.3. I would strongly recommend waiting until these changes are merged as well.
Comment 36 Yuvi Panda 2013-09-30 16:28:53 UTC
@SodaBottle: Can you open up another bug + start the community discussion for it as well? Thanks!
Comment 37 Peter Krautzberger 2013-09-30 16:39:37 UTC
Just to throw it out there. If the math extension used MathJax on the backend, then it seems a lot of these problems would go away. MathJax's TeX-input is slightly more powerful than texvc, is designed for a web environment and would remove the need for sanitization.
Comment 38 physikerwelt 2013-09-30 23:37:52 UTC
@Peter: I think we should not make the same mistake again
(using a poorly defined subset of LaTeX extended by some customized macros) by using the new MathJax language instead of texvc.
I think changing the language of math input that is shared between all languages should be a common process.
Comment 39 Peter Krautzberger 2013-10-01 05:18:41 UTC
@Moritz  I understand your concerns but would argue that MathJax consists of a well defined subset of TeX.

> I think changing the language of math input that is shared between all
languages should be a common process.

I don't understand that part of your message :( Was something lost by an accidental edit?
Comment 40 physikerwelt 2013-10-01 05:54:40 UTC
> I think changing the language of math input that is shared between all
languages should be a common process.
I mean natural languages. At the moment all wiki installations use the same texvc input language, e.g. \sen

I just googled for texvc discussion and found some parts:
http://meta.wikimedia.org/wiki/Texvc

Maybe this becomes off-topic... However, it supports my argument that there should be a discussion how the restricted set of input commands should look like. Especially it should not be determined by the technical limitations of the x-Rendering-Software.
Comment 41 Peter Krautzberger 2013-10-01 17:20:03 UTC
(In reply to comment #40)

> I just googled for texvc discussion and found some parts:
> http://meta.wikimedia.org/wiki/Texvc

Thanks. That's very interesting.

> Maybe this becomes off-topic... 

Probably.

> However, it supports my argument that there
> should be a discussion how the restricted set of input commands should look
> like. 

I agree with that but...

> Especially it should not be determined by the technical limitations of
> the x-Rendering-Software.

I find this too idealistic. In reality, there aren't many solutions for math on the web, and all of them have their own limitations and advantages in a MW setting. The critical question is: in what direction do MW and its community (in particular Wikipedia) want to take mathematical and scientific content? As suggested by WMF, I tried to start a discussion about this on Wikitech-l but not much came out of it. So the answer seems to be: nobody cares. 

Which is why I fully agree with you (but it makes me depressed).

Peter.
Comment 42 Peter Krautzberger 2013-10-01 17:20:38 UTC
Comment 43 Matthew Flaschen 2013-10-01 18:22:01 UTC
(In reply to comment #42)
> As suggested by WMF, I tried to start a discussion about this on Wikitech-l but
> not much came out of it. So the answer seems to be: nobody cares. 

Sorry to hijack this bug, but I'll just post once then hopefully it can move back to RFC and/or Wikitech.

I think some feedback from Wikitech (e.g. Flow and issues with certain languages) was helpful, but I agree the Wikitech thread basically finished.

For smaller stuff, the answer is Just Do It, and hash anything out in code review.

For bigger architectural things (what to store in the database [e.g. MathML not TeX], or having a single way of validating TeX/ANTLR grammar) where you want an answer before coding, it's probably time for an RFC (https://www.mediawiki.org/wiki/Requests_for_comment).  We talked about if/when to do this before, but now is probably a good time.  Pick a single issue, unless of course one decision clearly implies others, in which case you should include the related ones.

Above are just examples based on past discussions; the RFC can be whatever you think is appropriate.  An example past implemented RFC (though probably a bit simpler) is https://www.mediawiki.org/wiki/Requests_for_comment/Reduce_math_rendering_preferences
