Last modified: 2010-05-15 15:48:31 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T10997, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 8997 - Parser hook output block level corruption


Summary:	Parser hook output block level corruption

Status:	RESOLVED DUPLICATE of bug 1319

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser (Other open bugs)
Version:	1.9.x
Hardware:	All (OS: All)

Importance:	Normal priority, normal severity, 3 votes
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	need-parsertest

Depends on:
Blocks:

Reported:	2007-02-15 22:55 UTC by Daniel Kinzler
Modified:	2010-05-15 15:48 UTC
CC List:	4 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
code snippet with parser hook (136 bytes, text/plain) 2007-02-15 22:57 UTC, Daniel Kinzler
more test cases (56 bytes, text/plain) 2007-02-15 23:22 UTC, Daniel Kinzler
Parser patch to resolve whitespace mangling of extension output. (1.76 KB, patch) 2007-02-16 20:37 UTC, Jim R. Wilson
Parser patch to resolve whitespace mangling of extension output. (883 bytes, patch) 2007-08-22 19:25 UTC, Sergey Chernyshev

Description Daniel Kinzler 2007-02-15 22:55:45 UTC

The parser seems to apply paragraph and preformatting rules to the HTML returned
from parser hooks, but output from parser hooks should be left completely alone.
To see the effect, consider the following parser hook:

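// Register the <fuckup> tag (MediaWiki 1.9-era extension setup).
$wgExtensionFunctions[] = 'wfFuckupSetup';
function wfFuckupSetup() {
	global $wgParser;
	$wgParser->setHook( 'fuckup', 'wfRenderFuckup' );
}

// Tag callback: returns text with blank lines, an indented line and a
// trailing newline, all of which should reach the page untouched.
function wfRenderFuckup( $input, $argv, &$parser ) {
return
'a b

c d
  e f

gh
';
}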


The following wikitext:

xx<fuckup/>xx

produces this HTML:

<p>xxa b
</p><p>c d
</p>
<pre> e f
</pre>
<p>gh
xx
</p>

Disabling Tidy has no effect on this. Tested with current HEAD.

This is a serious problem for parser hooks that output HTML or JavaScript loaded
from a file, since such code is often preformatted (indented to be human-readable).

Comment 1 Daniel Kinzler 2007-02-15 22:57:53 UTC

Created attachment 3230 [details]
code snippet with parser hook

Bah, Bugzilla mangled the test code. Here's an attachment.

Comment 2 Daniel Kinzler 2007-02-15 23:22:53 UTC

Created attachment 3231 [details]
more test cases

Wow, more fun here.

It seems that full wiki formatting for lists and indentation is applied to the
parser hook output. Also, spaces before a colon are turned into &nbsp; for some
reason. HTML output for this test case:

<p>a: b
a&nbsp;: b
</p>

<dl><dd> bla
</dd><dt> bla&nbsp;</dt><dd> blubb
</dd></dl>
<ul><li> foo
</li><li> bar
</li></ul>
<ol><li> foo
</li><li> bar
</li></ol>

This is really, really broken...

Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-16 02:14:45 UTC

Isn't this just because parser-hook text is substituted in on an early pass and
then left to the mercies of the rest of the parser? There should presumably be
some option to <nowiki> it all. Or . . . you could just add <nowiki> yourself?
(Well, that doesn't stop paragraph breaks, so something intermediate between
<nowiki> and <pre> would be needed.)

Comment 4 Daniel Kinzler 2007-02-16 11:59:58 UTC

Well, as far as I can see, it's not exactly an "early" pass, but a *too* early one:
the output of the hook is put back into the text before the "final block-level
transformation" (or something like it). Nowiki processing has already been applied
at that stage, so a <nowiki> would be passed through verbatim. It also wouldn't work
because nowiki would *escape* any HTML tags.
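A minimal Python sketch of the point about <nowiki>. It assumes that nowiki's effect on markup can be modelled as plain HTML escaping with the standard `html.escape`. This is a simplification, not MediaWiki's implementation, and `nowiki_like` is a hypothetical name. The whitespace survives, but the hook's tags become literal text instead of markup.

```python
import html

def nowiki_like(text: str) -> str:
    # Hypothetical stand-in for <nowiki>: escape <, > and & so markup is shown
    # as literal text. Quotes are left alone (quote=False).
    return html.escape(text, quote=False)

hook_output = '<div class="box">\n  <b>x</b>\n</div>'
escaped = nowiki_like(hook_output)
print(escaped)
```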

What would really be needed is to unstrip the results of parser hooks only after
*all* transforms are done. But the parser being as convoluted as it is, I have
no idea how to go about this. Also, this behavior has apparently been around for
quite some time, so some parser hooks may even rely on it (that would be a bad
idea, but it should be checked).

Anyway, Brion told me to "not treat this as an emergency", so I'll set this back
to normal severity (even though I feel the current behavior is severely broken).
Also, people seem to remember that this problem has already been filed. I can't
find the report, though; if you do, feel free to dupe this.

Comment 5 Daniel Kinzler 2007-02-16 13:30:39 UTC

Hm, from experimenting with an extension I wrote, it seems to me that the right
place to "put back" the results from a parser hook would be where the
ParserAfterTidy hook is called (or ParserBeforeTidy, if Tidy should process
them). To achieve this, the strip/unstrip mechanism should probably be rigged so
that it is possible to specify what gets unstripped, and when. I'm not clear on
how to do that, though.

Comment 6 Jim R. Wilson 2007-02-16 20:36:14 UTC

I will be attaching a Parser patch shortly which hides extension tag output
until after Tidy is finished.

One caveat: this patch causes two previously passing tests to fail:

> Running test Parser hook: static parser hook not inside a comment... FAILED!
> Running test Parser hook: static parser hook inside a comment... FAILED!

The former expects "<p>hello, world\n</p>" and gets "<p>\nhello, world\n</p>"
The latter expects "<p><br />\n</p>" and gets "<p>\n</p>"

In both cases the differences are whitespace only.

Comment 7 Jim R. Wilson 2007-02-16 20:37:33 UTC

Created attachment 3235 [details]
Parser patch to resolve whitespace mangling of extension output.

See the comments on this bug regarding test failures caused by applying this
patch.

Comment 8 Jim R. Wilson 2007-02-16 20:49:05 UTC

The previous patch seems to break Cite. Specifically, <ref> tags are no longer
replaced by "[1]" links, though the references still render properly in the
<references> section. Not sure yet why this is.

Comment 9 Jim R. Wilson 2007-02-16 22:24:06 UTC

It looks like parsing stops after the <!-- LINK # --> placeholders are inserted.
The <sup> tags are there for the references, but only <!--LINK #--> can be found
inside them.

These LINK comments are generated by the makeLinkHolder() function. I'm still
not sure why they're not getting resolved.

Comment 10 Sean J 2007-04-30 18:30:11 UTC

The problem comes from the doBlockLevels() call in parse(), which renders lists
and inserts paragraph tags. Tags shouldn't be unstripped until after lists are
rendered, but currently they are unstripped before. So all that needs to happen
is to move this line of code:
$text = $this->mStripState->unstripGeneral( $text );
to just below the doBlockLevels() call.

The only thing this would affect is gallery tags, and the list functionality in
gallery tags is buggy anyway.

Does anyone else see any problems with this fix?
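Sean J's proposal comes down to ordering. The Python sketch below is a toy model, not MediaWiki code. `block_levels()` is a drastically simplified stand-in for `doBlockLevels()`. The placeholder format and the `parse()`/`unstrip_late` names are hypothetical. The sketch runs the hook from the description through both orders. Unstripping first lets the block-level pass split the hook output into paragraphs and a <pre>. Unstripping afterwards leaves the output intact.

```python
import re
from typing import Dict

def block_levels(text: str) -> str:
    """Toy block-level pass (far simpler than MediaWiki's doBlockLevels()):
    blank lines end a paragraph, lines starting with a space become <pre>."""
    out, para = [], []

    def flush() -> None:
        if para:
            out.append("<p>" + "\n".join(para) + "\n</p>")
            para.clear()

    for line in text.split("\n"):
        if line == "":
            flush()
        elif line.startswith(" "):
            flush()
            out.append("<pre>" + line[1:] + "\n</pre>")
        else:
            para.append(line)
    flush()
    return "\n".join(out)

def parse(wikitext: str, hooks: Dict[str, str], unstrip_late: bool) -> str:
    state: Dict[str, str] = {}

    def strip(m: "re.Match[str]") -> str:
        # Swap each <tag/> for a hypothetical placeholder and keep its output.
        marker = f"\x7fHOOK{len(state)}\x7f"
        state[marker] = hooks[m.group(1)]
        return marker

    def unstrip(text: str) -> str:
        for marker, output in state.items():
            text = text.replace(marker, output)
        return text

    text = re.sub(r"<(\w+)/>", strip, wikitext)
    if unstrip_late:
        return unstrip(block_levels(text))  # proposed order: format, then restore
    return block_levels(unstrip(text))      # current order: restore, then format

hooks = {"fuckup": "a b\n\nc d\n  e f\n\ngh\n"}
early = parse("xx<fuckup/>xx", hooks, unstrip_late=False)
late = parse("xx<fuckup/>xx", hooks, unstrip_late=True)
print(early)
print(late)
```

In this model, the early order reproduces the shape of the HTML from the original report. The late order passes the hook text through byte for byte.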

Comment 11 Sergey Chernyshev 2007-08-22 19:25:39 UTC

Created attachment 4030 [details]
Parser patch to resolve whitespace mangling of extension output.

Sean J's recommendations seem to work fine. I made a new patch for this issue against trunk (it also works fine on the current stable 1.10 branch).

I'll be happy to see it in the core.

Comment 12 Sergey Chernyshev 2008-06-21 16:45:08 UTC

This seems to be the case in the new (1.12) parser as well.

Comment 13 Chad H. 2008-12-14 20:21:34 UTC

This patch breaks a lot of parser tests:

16 previously passing test(s) now FAILING! :(
 * Parser hook: empty input
 * Parser hook: empty input using terminated empty elements
 * Parser hook: empty input using terminated empty elements (space before)
 * Parser hook: basic input
 * Parser hook: case insensitive
 * Parser hook: case insensitive, redux
 * Parser hook: nested tags
 * Parser hook: basic arguments
 * Parser hook: argument containing a forward slash
 * Parser hook: empty input using terminated empty elements (bug 2374)
 * Parser hook: basic arguments using terminated empty elements
 * Parser hook: static parser hook not inside a comment
 * Parser hook: static parser hook inside a comment
 * Special page transclusion
 * Special page transclusion twice (bug 5021)
 * Gallery

Comment 14 Philipp Spitzer 2008-12-21 15:49:59 UTC

A note about the workaround described in http://www.mediawiki.org/wiki/Manual:Tag_extensions#How_can_I_avoid_modification_of_my_extension.27s_HTML_output.3F (replace the output with a marker in the parser hook, then replace the marker with the content in ParserAfterTidy): it causes the final content to be wrapped in <p>. As a result, the final output fails HTML validation whenever the parser hook produces output that is not allowed inside <p>, e.g. <noscript> or <div>.
I did not find a way to prevent this.

Maybe a parameter is needed through which a tag extension can "tell" MediaWiki whether it returns inline or block elements? (Or have I overlooked something in the documentation?) Otherwise I cannot see how to prevent MediaWiki from placing the output inside <p>.
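The marker workaround can be modelled in a few lines of Python. This is a toy, not MediaWiki code. `render_tag`, `wrap_paragraphs`, `after_tidy` and the marker format are hypothetical, and `wrap_paragraphs` is a crude stand-in for the block-level pass. The whitespace inside the hook output survives because it is restored last. However, the marker sits in its own chunk and is treated as paragraph text, so the <div> ends up inside <p>.

```python
from typing import Dict

# Toy model of the marker workaround (all names are hypothetical):
# the tag callback returns an opaque marker, and a ParserAfterTidy-style
# step swaps the real HTML back in after all wikitext formatting has run.
saved: Dict[str, str] = {}

def render_tag(html_out: str) -> str:
    marker = f"xx-marker{len(saved)}-xx"  # hypothetical marker format
    saved[marker] = html_out
    return marker

def wrap_paragraphs(text: str) -> str:
    # Crude stand-in for the block-level pass: each blank-line-separated
    # chunk becomes a paragraph, whatever it contains.
    chunks = [c for c in text.split("\n\n") if c.strip()]
    return "\n".join(f"<p>{c}\n</p>" for c in chunks)

def after_tidy(text: str) -> str:
    # Replace every marker with the HTML it stands for.
    for marker, html_out in saved.items():
        text = text.replace(marker, html_out)
    return text

page = "Intro text.\n\n" + render_tag('<div class="box">\n  hello\n</div>')
result = after_tidy(wrap_paragraphs(page))
print(result)
```

In this model, the block-level pass only ever sees the marker, so it cannot know that the restored content is block-level. The proposed inline/block parameter is meant to supply exactly that information.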

Comment 15 Tim Starling 2009-06-19 15:37:40 UTC


*** This bug has been marked as a duplicate of bug 1319 ***
