Last modified: 2011-03-13 18:05:31 UTC
If the user selects MathML rendering for texvc, input text such as <math>hello</math> will generate what tidy considers to be XML with "serious errors". This causes severe corruption of the page display, on all those discussion pages where people are using tidy-dependent signatures.
The output text for <math>hello</math> is: <p><math xmlns='http://www.w3.org/1998/Math/MathML'><mi>h</mi><mi>e</mi><mi>l</mi><mi>l</mi><mi>o</mi></math> </p><p><br /> </p> <!-- Tidy found serious XHTML errors -->
Created attachment 1459: proposed patch, not too nice. An ugly way to resolve this bug is to strip out all the <math> ... </math> tags before sending the HTML to tidy and to plug them back in afterwards. The attached patch follows this strategy. I've only tested it with external tidy, because I couldn't get internal tidy to work.
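The strip-and-restore strategy described above can be sketched as follows. This is a minimal illustration in Python, not the attached patch (which is against MediaWiki's PHP code). The names `strip_math`, `restore_math`, `clean_with_math_preserved` and the `MATHPLACEHOLDER…END` token are hypothetical, and the sketch assumes the cleaner passes a plain-text token through unchanged.

```python
import re

# Matches a whole <math ...>...</math> element, including multi-line content.
MATH_RE = re.compile(r"<math\b[^>]*>.*?</math>", re.DOTALL | re.IGNORECASE)

# Hypothetical placeholder format; assumed to survive the cleaner untouched.
PLACEHOLDER_RE = re.compile(r"MATHPLACEHOLDER(\d+)END")


def strip_math(html):
    """Replace each <math> element with a numbered token.

    Returns the stripped text and the list of removed elements.
    """
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return "MATHPLACEHOLDER%dEND" % (len(saved) - 1)

    return MATH_RE.sub(stash, html), saved


def restore_math(html, saved):
    """Put the saved <math> elements back in place of their tokens."""
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], html)


def clean_with_math_preserved(html, tidy):
    """Run `tidy` (any callable str -> str) on the page minus its MathML."""
    stripped, saved = strip_math(html)
    return restore_math(tidy(stripped), saved)
```

Here `tidy` stands in for whichever cleaner is configured (external or internal); the MathML never reaches it, so it cannot be flagged as invalid XHTML.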
Please disregard the patch. I did not test it properly against the CVS version, and something changed in the code.
I think the patch in comment #2 works, but it depends on my patch from bug 5344.
I've made a working (but unstable) implementation of MathML on my wiki. Here's an example: http://www.sspecter.com/wiki/index.php/Ajuda:ASCIIMath-Sintaxe It's in Portuguese, but you can see it working. I solved the problem by generating the MathML from an extension (which is probably not parsed by tidy, only by the sanitizer) and naming the extension tag <asciimath>, so that it does not conflict with <math> from MathML. My solution works but is unstable, because the sanitizer likes to generate malformed tags inside my well-formed MathML, and my XHTML pages crash. It only happens in some cases, like wiki list (*) + <asciimath>, or <asciimath> with blank lines inside it. But I believe these problems will still happen with comment #2's solution.
(In reply to comment #5)
> I solved that by creating MathML as an extension (probably it is not parsed by
> tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
> with <math> from MathML.
I think that the sanitizer (specifically, Sanitizer::removeHTMLtags()) does not touch extension tags and <math> tags, though it is hard to tell anything from Parser.php. Furthermore, it seems that tidy is not enabled on your site.
> My solution works but is unstable because sanitizer likes to generate
> bad-formed tags inside my good-formed MathML, and my XHTML pages crash. It just
> happens in some cases, like wiki list (*) + <asciimath>, or <asciimath> with
> blank lines inside it.
As I said, I doubt it is the sanitizer or tidy that generates the malformed tags. Your problem may be that the MathML you generate contains newlines, which confuses the parser. Try replacing all the newline characters with spaces.
> But I believe these problems will still happen with comment #2's solution.
Is this just a guess, or do you have an example in which the patch does not work? It seems to work fine on http://wiki.blahtex.org/ .
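The newline suggestion in the reply above could look like the following minimal Python sketch. The function name `collapse_newlines` is hypothetical, and whether this resolves the list and blank-line cases is not confirmed in this thread.

```python
import re


def collapse_newlines(mathml):
    """Replace every run of CR/LF characters with a single space."""
    return re.sub(r"[\r\n]+", " ", mathml)
```

An extension would apply this to its generated MathML before returning it to the parser, so that the wikitext parser never sees a line break inside the <math> element.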
I'm going to WONTFIX this. We're going to be ditching tidy (with its bugginess, overhead, and annoying features) once the internal HTML normalizer is fixed up, which it soon will be (in progress, bug 5497). With normalization working on our own output, we don't have to worry about tidy choking on extension output.