Last modified: 2011-03-13 18:05:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T5504, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 3504 - MathML with input <math>hello</math> causes tidy to die
Status: RESOLVED WONTFIX
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.6.x
Hardware: All All
Importance: Lowest normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/w/index.php?t...
Depends on:
Blocks:
Reported: 2005-09-19 03:59 UTC by Tim Starling
Modified: 2011-03-13 18:05 UTC (History)



Attachments
proposed patch, not too nice (1.64 KB, patch)
2006-03-24 10:31 UTC, Jitse Niesen

Description Tim Starling 2005-09-19 03:59:27 UTC
If the user selects MathML rendering for texvc, input text such as
<math>hello</math> will generate what tidy considers to be XML with "serious
errors". This causes severe corruption of the page display, on all those
discussion pages where people are using tidy-dependent signatures.
Comment 1 Tim Starling 2005-09-19 04:04:02 UTC
The output text for <math>hello</math> is:

<p><math
xmlns='http://www.w3.org/1998/Math/MathML'><mi>h</mi><mi>e</mi><mi>l</mi><mi>l</mi><mi>o</mi></math>
</p><p><br />
</p>
<!-- Tidy found serious XHTML errors -->

Comment 2 Jitse Niesen 2006-03-24 10:31:21 UTC
Created attachment 1459
proposed patch, not too nice

An ugly way to resolve this bug is to strip out all the <math> ... </math> tags
before sending the HTML to tidy and to plug them back in afterwards. The
attached patch follows this strategy. I've only tested it with external tidy,
because I couldn't get internal tidy to work.
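
The strip-and-restore strategy described in this comment can be sketched
roughly as follows. This is not the attached patch, which targets MediaWiki's
PHP code; it is a minimal, hypothetical Python illustration. The names
protect_math, restore_math and tidy_with_math, the placeholder format, and the
tidy callable are all assumptions made for the example, and the sketch assumes
that tidy leaves a plain alphanumeric token in the text untouched.

```python
import re
import uuid

# Matches one <math ...>...</math> element, including multi-line content.
MATH_RE = re.compile(r"<math\b[^>]*>.*?</math>", re.DOTALL | re.IGNORECASE)


def protect_math(html):
    """Replace each <math> element with a unique placeholder token.

    Returns the stripped HTML and a dict mapping placeholder -> original markup.
    """
    saved = {}

    def stash(match):
        # Hypothetical placeholder format: plain letters and digits only,
        # so an HTML cleaner has no reason to rewrite it.
        key = "MATHPLACEHOLDER%sEND" % uuid.uuid4().hex
        saved[key] = match.group(0)
        return key

    return MATH_RE.sub(stash, html), saved


def restore_math(html, saved):
    """Put the original <math> markup back in place of each placeholder."""
    for key, fragment in saved.items():
        html = html.replace(key, fragment)
    return html


def tidy_with_math(html, tidy):
    """Run `tidy` (any str -> str callable) on HTML with <math> hidden from it."""
    stripped, saved = protect_math(html)
    return restore_math(tidy(stripped), saved)
```

Any callable that takes and returns a string can stand in for tidy here, for
example a wrapper around an external tidy process.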
Comment 3 Jitse Niesen 2006-03-24 12:45:46 UTC
Please disregard the patch. I did not test it properly against the CVS version,
and something changed in the code.
Comment 4 Jitse Niesen 2006-03-24 15:08:09 UTC
I think the patch in comment #2 works, but it depends on my patch from bug 5344.
Comment 5 sspecter 2006-05-04 22:30:40 UTC
I've made a working (but unstable) implementation with MathML on my wiki here.

Here's an example: http://www.sspecter.com/wiki/index.php/Ajuda:ASCIIMath-Sintaxe
It's in Portuguese, but you can see it working.

I solved that by creating MathML as an extension (probably it is not parsed by
tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
with <math> from MathML.

My solution works but is unstable, because the sanitizer likes to generate
malformed tags inside my well-formed MathML, and my XHTML pages crash. It only
happens in some cases, like a wiki list (*) + <asciimath>, or <asciimath> with
blank lines inside it. But I believe these problems will still happen with
comment #2's solution.
Comment 6 Jitse Niesen 2006-05-05 13:39:40 UTC
(In reply to comment #5)
> I solved that by creating MathML as an extension (probably it is not parsed by
> tidy, only sanitizer), and naming the extension tag <asciimath>, to not conflict
> with <math> from MathML.

I think that the sanitizer - specifically, Sanitizer::removeHTMLtags() - does
not touch extension tags and <math> tags, though it is hard to tell anything
from Parser.php. Furthermore, it seems that tidy is not enabled on your site.
 
> My solution works but is unstable, because the sanitizer likes to generate
> malformed tags inside my well-formed MathML, and my XHTML pages crash. It only
> happens in some cases, like a wiki list (*) + <asciimath>, or <asciimath> with
> blank lines inside it.

As I said, I doubt it is the sanitizer or tidy that generates the malformed
tags. Your problem may be that the MathML which you generate contains newlines.
This confuses the parser. Try replacing all the newline characters with spaces.
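
A minimal sketch of that newline replacement, in Python for illustration only
(the extension itself would do this in PHP). The function name
flatten_newlines is hypothetical, and the sketch assumes that whitespace
between MathML elements is not significant for the generated output.

```python
import re


def flatten_newlines(mathml):
    """Collapse each run of newline characters into a single space."""
    return re.sub(r"[\r\n]+", " ", mathml)
```

Applied to the generated MathML before it is handed back to the parser, this
leaves no blank lines that could be read as paragraph breaks.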

> But I believe these problems will still happen with comment #2's solution.

Is this just a guess, or do you have an example in which the patch does not
work? It seems to work fine on http://wiki.blahtex.org/ .
Comment 7 Brion Vibber 2006-06-03 23:43:02 UTC
I'm going to WONTFIX this.

We're going to be ditching tidy (with its bugginess, overhead, and
annoying features) once the internal HTML normalizer is fixed up,
which it soon will be (in progress, bug 5497).

With normalization working on our own output, we don't have to
worry about tidy choking on extension output.
