Last modified: 2014-01-28 15:19:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T10948, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 8948 - Parser: HTML table syntax (e.g. <td>) should not be parsed when inside <pre>
Status: RESOLVED WORKSFORME
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.17.x
Hardware: All All
Importance: Low normal with 2 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://www.mediawiki.org/wiki/Project...
Depends on:
Blocks: html tidy well-formedness
Reported: 2007-02-11 14:52 UTC by Tim Trent
Modified: 2014-01-28 15:19 UTC
CC List: 6 users

Description Tim Trent 2007-02-11 14:52:49 UTC
Description copied from support desk page (though please treat as a bug report,
not a plea for help):
----
== Mixing Wikitable and HTML syntax ==

I am setting up a new wiki using the current stable version.  Under the terms of
the GFDL I am copying a little (attributed) information from Wikipedia.  This
includes a template which works perfectly there, but does not on my new
implementation.

I have tracked the problem down to the original template author mixing wikitable
pipe syntax and html table syntax.   Ignoring the fact that this is poor
practice, I need to know what I must do to make it work on my new wiki, please.

To distill the problem I have an example:

<pre>
{|
|-
<td>
Wiki table including conventional table syntax
</td>
|-
|}
</pre>

This creates

<pre>
<td> Wiki table including conventional table syntax </td>
</pre>

in the finished article.

If I do the same on Wikipedia I just get
<pre>
Wiki table including conventional table syntax
</pre>
This is the intended end result, and thus highly desirable.

----

This is regrettably simple to reproduce on our pretty much vanilla installation.
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-11 17:09:54 UTC
I vaguely recall that this has something to do with whether Tidy is enabled. 
Try enabling Tidy (even if it's not installed, I think it keys off $wgUseTidy
somehow) and see if it works.
Comment 2 Tim Trent 2007-02-11 17:13:44 UTC
That is a valid workaround, yes.  But the concept of running Tidy every time a
page is rendered is, at best, "unusual".  What it shows is that the code produced
is buggy.  This ought to be a simple issue to resolve, and, if resolved
correctly, will have no negative impact on anyone who has been constrained to
run Tidy.
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-11 17:52:54 UTC
Correct, this is almost certainly a bug and should be fixed.  But that fact is
necessary to figure out where this is happening and/or reproduce it on a local
install.
Comment 4 Tim Trent 2007-02-11 18:02:47 UTC
The challenge with a bug like this is that it wastes an inordinate amount of
time to locate, to identify, and to track down while trying to install and
configure a wiki.  Reproducing it is dead simple.  You turn Tidy off and use my
examples, which are the simplest illustration of the problem.  Since it is
repeatable in a controlled circumstance the bug has to be within the area that
parses the table pipes (0.9 probability).

We need to ignore the fact that people mix HTML and piped syntax, and
concentrate on the simple issue that it escapes the lt and the gt - a bizarre
behaviour.
Comment 5 Brion Vibber 2007-02-11 18:14:07 UTC
This is illegal wiki syntax; fix all templates using such constructs or they'll
break when we fix the bug.
Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-11 18:17:39 UTC
Why is it illegal wiki syntax?
Comment 7 Tim Trent 2007-02-11 18:19:38 UTC
(In reply to comment #5)
> This is illegal wiki syntax; fix all templates using such constructs or they'll
> break when we fix the bug.

You know, since this illegal wiki syntax pervades wikipedia, and since I just
simplified it to show you here, I genuinely do not care one way or the other.  I
do care about the tone of that message, though.  I'm glad I bothered to report it.  

And if it is illegal wiki syntax, why does it render a correct table at all with
Tidy turned on?

The bug is the bug.
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-11 18:33:03 UTC
(In reply to comment #7)
> And if it is illegal wiki syntax, why does it render a correct table at all with
> Tidy turned on?

I believe Brion perceives the bug to be that the table does render with Tidy on,
rather than not rendering with Tidy off.  I don't know why, though.
Comment 9 Tim Trent 2007-02-11 18:40:44 UTC
There is nothing, anywhere, to state that the syntax is illegal.  It is obvious
insanity to mix the syntax, but wikis allow insane people to edit. 

If there is to be rigid syntax (not arguing one way or the other) then that
syntax needs to be parsed for the wiki-editor and rejected at submit time.
Comment 10 Brion Vibber 2007-02-11 18:50:59 UTC
It's illegal because | and <td> are different things, as '' and <i> are.
But there's some disagreement on it, so we haven't yet made the fix to the tidy
mode to operate properly. :)

The difference in behavior with tidy on and off is a known problem due to the
way tidy is run at a high level while the built-in HTML nesting sanitizer is run
on smaller chunks. (A known problem for some time.)

It's possible that we'll change the built-in sanitizer to behave more like the
way we use tidy, which is IMHO sloppy and ugly and dangerous, but would remain
backwards-compatible with the existing bogus templates.
Comment 11 Tim Trent 2007-02-11 21:35:24 UTC
Since it is a known problem, and since it is not documented anywhere, or at
least anywhere the slightest bit obvious, then it at least should be documented
in a substantially better manner.

The argument about | vs <td> and '' vs <i> is interesting.  But, since ''
generates <i> or conceivably <em> (I have not checked), and also generates the
closing tag, it seems to me that one could say with some validity that | is, in
this circumstance, equivalent to <td>.
Comment 12 Daniel Kinzler 2007-02-13 17:13:35 UTC
'' and <i> are different because '' may turn into <i> or </i> depending on
context - or even stay '', for example if there's nothing else in the paragraph
- or turn into <b> if followed by another '. The same is true for the table
syntax, I suppose - it becomes very hard to parse the wiki-style tables when you
at the same time try to respect HTML markup.

And yes, this probably should be documented somewhere.
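
The context dependence of '' can be shown with a toy Python sketch. This is
not MediaWiki's quote handling (MediaWiki is written in PHP and its real logic
also covers ''' for bold); `italicize` is a hypothetical helper that only
shows the same two characters becoming <i>, </i>, or staying literal depending
on what else is on the line.

```python
def italicize(line: str) -> str:
    """Toy converter: alternate '' between <i> and </i>.

    Assumption for illustration only: markers are paired left to right
    within one line, and a leftover unpaired '' is kept as typed.
    """
    parts = line.split("''")
    markers = len(parts) - 1            # number of '' tokens in the line
    paired = markers - (markers % 2)    # only an even count can be paired
    out = [parts[0]]
    for index, part in enumerate(parts[1:]):
        if index >= paired:
            out.append("''")            # leftover marker stays literal
        elif index % 2 == 0:
            out.append("<i>")           # opens an italic run
        else:
            out.append("</i>")          # closes the italic run
        out.append(part)
    return "".join(out)


print(italicize("a ''b'' c"))
print(italicize("a '' b"))
```

Because the output for each '' depends on how many other markers share the
line, an HTML <i> or </i> typed by hand cannot simply be treated as one more
marker without changing the pairing rule.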

Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-13 18:51:57 UTC
I'm pretty sure that it's not possible to open an element with wikimarkup and
close it with HTML or vice versa, anywhere, so the analogy to '' is perhaps not
apt.  All the wikitext parser has to do is ignore stuff inside tables that doesn't
look like table rows/cells/etc., and let the sanitizer/Tidy deal with it if it
isn't actually table rows/cells/etc.  On the other hand, trying to match stuff
like "''Foo</i>bar''baz" would require significantly complicating the wikitext
parser (well, the wikitext regex replacements :P).

It can be convenient to mix wikitables and HTML tables.  For instance, lots of
pipes might occur (or potentially occur, for a template) somewhere inside a
table cell, and you avoid any problems with those by using HTML markup for that
one cell.
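
The "ignore what doesn't look like table syntax" idea can be sketched in
Python as follows. This is a toy under stated assumptions, not MediaWiki's
table parser: `convert_table` is a hypothetical function, it only knows {|,
|-, | and |}, and it does no sanitizing of its own. Lines it does not
recognise are passed through unchanged for a later pass to handle.

```python
def convert_table(lines):
    """Toy wikitable converter: translate pipe syntax, pass the rest through."""
    out = []
    row_open = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("{|"):
            out.append("<table>")
        elif stripped.startswith("|}"):
            if row_open:
                out.append("</tr>")
                row_open = False
            out.append("</table>")
        elif stripped.startswith("|-"):
            if row_open:
                out.append("</tr>")     # close the previous row first
            out.append("<tr>")
            row_open = True
        elif stripped.startswith("|"):
            out.append("<td>" + stripped[1:].strip() + "</td>")
        else:
            out.append(line)            # not pipe syntax: leave untouched
    return out


example = ["{|", "|-", "<td>", "Wiki table including conventional table syntax", "</td>", "|}"]
print("\n".join(convert_table(example)))
```

With the example from the description, the hand-written <td> and </td> lines
land inside the generated <tr>, so whatever runs afterwards sees an ordinary
table cell.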
Comment 14 Dan Collins 2011-07-12 04:23:49 UTC
Tested on live, the code still works on wikipedia at 1.17wmf1, and still does not work on a local wiki at 1.16.2.
Comment 15 Kevin Israel (PleaseStand) 2014-01-28 15:19:05 UTC
I don't see this bug in the latest master version of MediaWiki, with or without Tidy enabled.

Everything inside <pre>...</pre> is HTML escaped.
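
What "HTML escaped" means here can be illustrated with a short Python sketch.
It is not MediaWiki's code; `escape_pre_contents` is a hypothetical helper
that uses Python's standard `html.escape` to show how markup between <pre> and
</pre> ends up displayed literally instead of being interpreted.

```python
import html
import re

# Hypothetical helper for illustration; matches plain <pre>...</pre> only
# (no attributes, no nesting).
PRE_BLOCK = re.compile(r"<pre>(.*?)</pre>", re.DOTALL)


def escape_pre_contents(text: str) -> str:
    """Escape &, < and > in everything found between <pre> and </pre>."""
    def _escape(match):
        return "<pre>" + html.escape(match.group(1), quote=False) + "</pre>"
    return PRE_BLOCK.sub(_escape, text)


sample = "<pre>\n{|\n|-\n<td>\ncell\n</td>\n|-\n|}\n</pre>"
print(escape_pre_contents(sample))
```

Text outside the <pre> block is returned unchanged, so only the quoted
example is affected.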
