Last modified: 2010-05-15 15:48:31 UTC

Bug 8997 - Parser hook output block level corruption
Status: RESOLVED DUPLICATE of bug 1319
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware: All All
Importance: Normal normal with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: need-parsertest
Depends on:
Reported: 2007-02-15 22:55 UTC by Daniel Kinzler
Modified: 2010-05-15 15:48 UTC (History)

code snippet with parser hook (136 bytes, text/plain)
2007-02-15 22:57 UTC, Daniel Kinzler
more test cases (56 bytes, text/plain)
2007-02-15 23:22 UTC, Daniel Kinzler
Parser patch to resolve whitespace mangling of extension output. (1.76 KB, patch)
2007-02-16 20:37 UTC, Jim R. Wilson
Parser patch to resolve whitespace mangling of extension output. (883 bytes, patch)
2007-08-22 19:25 UTC, Sergey Chernyshev

Description Daniel Kinzler 2007-02-15 22:55:45 UTC
The parser seems to apply paragraph and preformatting rules to the HTML returned
from parser hooks. But output from parser hooks should be left completely
alone. To see the effect, consider the following parser hook:

function wfRenderFuckup( $input, $argv, &$parser ) {
'a b

c d
  e f


the following in wikitext: 


produces this html:

<p>xxa b
</p><p>c d
<pre> e f

Disabling tidy does not have any effect on this. Tested with current HEAD.

This is extremely bad for parser hooks that try to output HTML or JS code loaded
from some file, where preformatted (human-readable) code may be found.
Comment 1 Daniel Kinzler 2007-02-15 22:57:53 UTC
Created attachment 3230 [details]
code snippet with parser hook

Bah, Bugzilla mangled the test code. Here's an attachment.
Comment 2 Daniel Kinzler 2007-02-15 23:22:53 UTC
Created attachment 3231 [details]
more test cases

Wow, more fun here.

It seems like full wiki formatting for lists and indenting is applied to the
parser hook output. Also, blanks before ":" are turned into &nbsp; for some
reason. HTML output for this test case:

<p>a: b
a&nbsp;: b

<dl><dd> bla
</dd><dt> bla&nbsp;</dt><dd> blubb
<ul><li> foo
</li><li> bar
<ol><li> foo
</li><li> bar

This is really really broken...
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-16 02:14:45 UTC
Isn't this just because parser-hook text is substituted in on an early pass and
then left to the mercies of the rest of the parser?  There should be some option
to <nowiki> it all, presumably.  Or . . . you could just add <nowiki> yourself?
 (Well, that doesn't stop paragraph breaks, so something intermediate between
<nowiki> and <pre> would be needed.)
Comment 4 Daniel Kinzler 2007-02-16 11:59:58 UTC
Well, as far as I can see, the pass is not merely "early" but *too* early: the
output of the hook is put back into the text before the "final block-level
transformation" (or something). Nowiki processing has already been applied at
this stage, so the <nowiki> would be passed through verbatim - it also wouldn't
work because nowiki would *escape* any HTML tags.

What would really be needed is to unstrip results of parser hooks only after
*all* transforms are done. But the parser being as convoluted as it is, I have
no idea how to go about this. Also, this behavior has apparently been around for
quite some time - so some parser hooks may even rely on it (would be a bad idea,
but should be checked).

Anyway, Brion told me to "not treat this as an emergency", so I'll set this back
to normal severity (even though I feel the current behavior is severely broken).
Also, people seem to remember that this problem has already been filed - I can't
find the report though. If you do, feel free to dupe this.
Comment 5 Daniel Kinzler 2007-02-16 13:30:39 UTC
Hm, from experimenting with an extension I wrote, it seems to me that the right
place to "put back" results from a parser hook would be where the
ParserAfterTidy hook is called (or, if tidy should process it,
ParserBeforeTidy). To achieve this, the strip/unstrip mechanism should probably
be rigged so it can be specified what is unstripped when. I'm not clear on how
to do that, though.
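
The idea of controlling "what is unstripped when" can be sketched with a small stand-alone model. This is only an illustration under stated assumptions: the class ToyStripState, its add() and unstrip() methods, and the stage names 'general' and 'afterTidy' are all invented here and are not MediaWiki's actual strip/unstrip code.

```php
<?php
// Hypothetical sketch: ToyStripState and its stage names are invented for
// illustration and are NOT MediaWiki's real strip state implementation.
class ToyStripState {
    /** @var array stage name => array( marker => hidden html ) */
    private $items = array();
    /** @var int running number used to build unique markers */
    private $counter = 0;

    /** Hide $html behind a unique marker that is released at $stage. */
    public function add( $stage, $html ) {
        $marker = "\x7fTOY-" . $this->counter++ . "\x7f";
        $this->items[$stage][$marker] = $html;
        return $marker;
    }

    /** Put back only the items that were registered for $stage. */
    public function unstrip( $stage, $text ) {
        if ( empty( $this->items[$stage] ) ) {
            return $text; // nothing registered for this stage
        }
        return strtr( $text, $this->items[$stage] );
    }
}

// Usage: ordinary items come back early, hook output only at the very end.
$state = new ToyStripState();
$text = 'intro ' . $state->add( 'general', '<b>x</b>' )
    . ' ' . $state->add( 'afterTidy', "<script>\n  var a = 1;\n</script>" );

$text = $state->unstrip( 'general', $text );
// ... block-level formatting and tidy would run here, seeing only a marker ...
$text = $state->unstrip( 'afterTidy', $text );
```

Because the hook output is represented by an opaque marker during the intermediate passes, those passes cannot reformat its contents; what they do to the text around the marker is a separate question.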
Comment 6 Jim R. Wilson 2007-02-16 20:36:14 UTC
I will be attaching a Parser patch shortly which hides extension tag output
until after Tidy is finished.

One caveat: this patch causes two additional failures in tests that previously passed:

> Running test Parser hook: static parser hook not inside a comment... FAILED!
> Running test Parser hook: static parser hook inside a comment... FAILED!

The former expects "<p>hello, world\n</p>" and gets "<p>\nhello, world\n</p>"
The latter expects "<p><br />\n</p>" and gets "<p>\n</p>"

In both cases the differences are whitespace only.
Comment 7 Jim R. Wilson 2007-02-16 20:37:33 UTC
Created attachment 3235 [details]
Parser patch to resolve whitespace mangling of extension output.

Note comments in bug regarding test failures as a result of applying this patch.
Comment 8 Jim R. Wilson 2007-02-16 20:49:05 UTC
Previous patch seems to be breaking Cite.  Specifically, <ref> tags are not
being replaced by "[1]" links - though they still render properly in the
<references> section.  Not sure yet why this is.
Comment 9 Jim R. Wilson 2007-02-16 22:24:06 UTC
Looks like the parsing stops after putting in the <!-- LINK # --> flags.  The
<sup> tags are there for the references, but only <!--LINK #--> can be found inside.

These LINK comments are being generated by the makeLinkHolder() function.  Still
not sure why they're not getting resolved.
Comment 10 Sean J 2007-04-30 18:30:11 UTC
The problem comes from the doBlockLevels() call in parse() which renders lists
and inserts paragraph tags. Tags shouldn't be unstripped until after lists are
rendered but currently they are. So all that needs to happen is to move this
line of code: 
$text = $this->mStripState->unstripGeneral( $text );
to just below the doBlockLevels() call.

The only effect this would have is on gallery tags, but the list functionality
in gallery tags is buggy anyway.  

Does anyone else see any problems with this fix?
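
The effect of the ordering can be shown with a toy model. This is a rough sketch, not the real Parser: toyBlockLevels() is a drastically simplified stand-in for doBlockLevels(), the marker string is made up, and the two parse functions are hypothetical names that only differ in when the hook output is put back.

```php
<?php
// Toy model only: these functions imitate the ORDER of the passes discussed
// above. They are not MediaWiki code and the marker format is invented.

/** Simplified block-level pass: blank lines end a paragraph, lines that
 *  start with a space become <pre>, everything else becomes <p>. */
function toyBlockLevels( $text ) {
    $html = '';
    $open = ''; // currently open block: '', 'p' or 'pre'
    foreach ( explode( "\n", $text ) as $line ) {
        if ( $line === '' ) {
            $want = '';
        } elseif ( $line[0] === ' ' ) {
            $want = 'pre';
        } else {
            $want = 'p';
        }
        if ( $want !== $open ) {
            if ( $open !== '' ) {
                $html .= "</$open>";
            }
            if ( $want !== '' ) {
                $html .= "<$want>";
            }
            $open = $want;
        }
        if ( $want === 'pre' ) {
            $html .= substr( $line, 1 ) . "\n"; // drop the leading space
        } elseif ( $want === 'p' ) {
            $html .= $line . "\n";
        }
    }
    return $open !== '' ? $html . "</$open>" : $html;
}

/** Current behaviour in this model: unstrip first, then block levels. */
function parseUnstripEarly( $wikitext, array $strip ) {
    return toyBlockLevels( strtr( $wikitext, $strip ) );
}

/** Proposed order in this model: block levels first, then unstrip. */
function parseUnstripLate( $wikitext, array $strip ) {
    return strtr( toyBlockLevels( $wikitext ), $strip );
}

$strip = array( '@@HOOK0@@' => "a b\n\nc d\n  e f" );
$early = parseUnstripEarly( 'xx@@HOOK0@@', $strip );
// "<p>xxa b\n</p><p>c d\n</p><pre> e f\n</pre>"
$late = parseUnstripLate( 'xx@@HOOK0@@', $strip );
// "<p>xxa b\n\nc d\n  e f\n</p>"
```

In the late variant the hook output survives byte for byte, although it is still wrapped in the paragraph that surrounded the marker.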
Comment 11 Sergey Chernyshev 2007-08-22 19:25:39 UTC
Created attachment 4030 [details]
Parser patch to resolve whitespace mangling of extension output.

Sean J's recommendations seem to work fine. I made a new patch for this issue against trunk (it works fine on the current stable 1.10 branch too).

I'll be happy to see it in the core.
Comment 12 Sergey Chernyshev 2008-06-21 16:45:08 UTC
This seems to be the case in the new (1.12) parser as well.
Comment 13 Chad H. 2008-12-14 20:21:34 UTC
This patch breaks a lot of parser tests:

16 previously passing test(s) now FAILING! :(
 * Parser hook: empty input
 * Parser hook: empty input using terminated empty elements
 * Parser hook: empty input using terminated empty elements (space before)
 * Parser hook: basic input
 * Parser hook: case insensitive
 * Parser hook: case insensitive, redux
 * Parser hook: nested tags
 * Parser hook: basic arguments
 * Parser hook: argument containing a forward slash
 * Parser hook: empty input using terminated empty elements (bug 2374)
 * Parser hook: basic arguments using terminated empty elements
 * Parser hook: static parser hook not inside a comment
 * Parser hook: static parser hook inside a comment
 * Special page transclusion
 * Special page transclusion twice (bug 5021)
 * Gallery
Comment 14 Philipp Spitzer 2008-12-21 15:49:59 UTC
A note about the workaround of replacing the output with a marker in the parser hook and back-replacing it with the content in ParserAfterTidy: it causes the final content to be wrapped in <p>. The final output then fails HTML validation whenever the parser hook produces output that is not allowed inside <p>, e.g. <noscript> or <div>.
I did not find a way to prevent this.

Maybe a parameter would be needed with which a tag extension can "tell" MediaWiki whether it returns inline or block elements (or have I overlooked something in the documentation)? Otherwise I cannot find a way to prevent MediaWiki from placing the output in <p>s.
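
The wrapping problem and the suggested inline/block parameter can be illustrated with a toy paragraph pass. Everything here is an assumption made for illustration: wrapParagraphs() and its $blockMarkers argument are hypothetical, the marker string is invented, and nothing in this thread shows MediaWiki offering such a parameter.

```php
<?php
// Toy model: wrapParagraphs() and $blockMarkers are hypothetical and only
// illustrate the idea. This is not an existing MediaWiki API.
function wrapParagraphs( $text, array $blockMarkers = array() ) {
    $html = '';
    foreach ( explode( "\n", $text ) as $line ) {
        if ( $line === '' ) {
            continue;
        }
        // A line that is exactly one block-level marker is left unwrapped.
        $html .= in_array( $line, $blockMarkers, true )
            ? $line . "\n"
            : "<p>$line\n</p>";
    }
    return $html;
}

$marker = '@@HOOK0@@';
$hookHtml = '<div>block output</div>';

// Marker workaround: the marker passes through the paragraph pass and the
// content is swapped in afterwards, so the <div> ends up inside <p>.
$wrapped = str_replace( $marker, $hookHtml, wrapParagraphs( "text\n$marker" ) );
// "<p>text\n</p><p><div>block output</div>\n</p>"

// Same pipeline, but the hook has declared its output to be block-level.
$unwrapped = str_replace(
    $marker,
    $hookHtml,
    wrapParagraphs( "text\n$marker", array( $marker ) )
);
// "<p>text\n</p><div>block output</div>\n"
```

The sketch only covers a marker that sits alone on its line; a block-level marker in the middle of a paragraph would need the paragraph to be closed and reopened around it.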
Comment 15 Tim Starling 2009-06-19 15:37:40 UTC

*** This bug has been marked as a duplicate of bug 1319 ***
