Last modified: 2011-03-13 18:05:31 UTC
If the user selects MathML rendering for texvc, input text such as
<math>hello</math> will generate what tidy considers to be XML with "serious
errors". This causes severe corruption of the page display, on all those
discussion pages where people are using tidy-dependent signatures.
The output text for <math>hello</math> is:
<!-- Tidy found serious XHTML errors -->
Created attachment 1459
proposed patch, not too nice
An ugly way to resolve this bug is to strip out all the <math> ... </math> tags
before sending the HTML to tidy and to plug them back in afterwards. The
attached patch follows this strategy. I've only tested it with external tidy,
because I couldn't get internal tidy to work.
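The strip-and-restore strategy described above can be sketched roughly as follows. This is an illustration in Python, not the attached patch (which targets MediaWiki's PHP code); the function names, the placeholder format, and the `run_tidy` callable are all hypothetical, and it assumes tidy passes HTML comments through unchanged.

```python
import re

# Matches a whole <math>...</math> element, including any attributes.
MATH_RE = re.compile(r"<math\b[^>]*>.*?</math>", re.DOTALL | re.IGNORECASE)

# Hypothetical placeholder format; assumes tidy leaves comments untouched.
PLACEHOLDER_RE = re.compile(r"<!--MATH-PLACEHOLDER-(\d+)-->")


def strip_math(html):
    """Swap each <math> block for a numbered placeholder.

    Returns the stripped text and the list of saved blocks.
    """
    saved = []

    def stash(match):
        saved.append(match.group(0))
        return f"<!--MATH-PLACEHOLDER-{len(saved) - 1}-->"

    return MATH_RE.sub(stash, html), saved


def restore_math(html, saved):
    """Put the saved <math> blocks back in place of their placeholders."""
    return PLACEHOLDER_RE.sub(lambda m: saved[int(m.group(1))], html)


def tidy_preserving_math(html, run_tidy):
    """Run tidy on everything except the <math> blocks.

    run_tidy is any callable taking and returning an HTML string; it
    stands in for the external or internal tidy call.
    """
    stripped, saved = strip_math(html)
    return restore_math(run_tidy(stripped), saved)
```

The sketch does not handle input that already contains the placeholder text, which a real implementation would need to guard against.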
Please disregard the patch. I did not test it properly against the CVS version,
and something changed in the code.
I think the patch in comment #2 works, but it depends on my patch from bug 5344.
I've made a working (but unstable) implementation with MathML on my wiki here.
Here's an example: http://www.sspecter.com/wiki/index.php/Ajuda:ASCIIMath-Sintaxe
It's in Portuguese, but you can see it working.
I solved that by creating MathML as an extension (probably it is not parsed by
tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
with <math> from MathML.
My solution works, but it is unstable because the sanitizer likes to generate
malformed tags inside my well-formed MathML, and my XHTML pages crash. It only
happens in some cases, like a wiki list (*) + <asciimath>, or <asciimath> with
blank lines inside it. But I believe these problems will still happen with
comment #2's solution.
(In reply to comment #5)
> I solved that by creating MathML as an extension (probably it is not parsed by
> tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
> with <math> from MathML.
I think that the sanitizer - specifically, Sanitizer::removeHTMLtags() - does
not touch extension tags and <math> tags, though it is hard to tell anything
from Parser.php. Furthermore, it seems that tidy is not enabled on your site.
> My solution works, but it is unstable because the sanitizer likes to generate
> malformed tags inside my well-formed MathML, and my XHTML pages crash. It only
> happens in some cases, like a wiki list (*) + <asciimath>, or <asciimath> with
> blank lines inside it.
As I said, I doubt it is the sanitizer or tidy that generates the malformed
tags. Your problem may be that the MathML which you generate contains newlines.
This confuses the parser. Try replacing all the newline characters with spaces.
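A minimal sketch of that suggestion, in Python for illustration only (the extension itself would do this in PHP, and the function name is hypothetical). It collapses each run of line breaks into a single space, a slight variation on replacing them one by one, so that blank lines inside the generated MathML disappear as well:

```python
import re


def flatten_newlines(mathml):
    """Replace each run of CR/LF characters with a single space."""
    return re.sub(r"[\r\n]+", " ", mathml)
```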
> But I believe these problems will still happen with comment #2's solution.
Is this just a guess, or do you have an example in which the patch does not
work? It seems to work fine on http://wiki.blahtex.org/ .
I'm going to WONTFIX this.
We're going to be ditching tidy (with its bugginess, overhead, and
annoying features) once the internal HTML normalizer is fixed up,
which it soon will be (in progress, bug 5497).
With normalization working on our own output, we don't have to
worry about tidy choking on extension output.