Last modified: 2010-05-15 15:48:31 UTC
The parser seems to apply paragraph and preformatting rules to the HTML returned from parser hooks, but output from parser hooks should be left completely alone. To see the effect, consider the following parser hook: function wfRenderFuckup( $input, $argv, &$parser ) { return 'a b c d e f gh '; } The following wikitext: xx<fuckup/>xx produces this HTML: <p>xxa b </p><p>c d </p> <pre> e f </pre> <p>gh xx </p> Disabling tidy has no effect on this. Tested with current HEAD. This is extremely bad for parser hooks that output HTML or JS code loaded from a file, which may contain preformatted (human-readable) code.
Created attachment 3230 [details] code snippet with parser hook Bah, Bugzilla mangled the test code. Here's an attachment.
Created attachment 3231 [details] more test cases Wow, more fun here. It seems that full wiki formatting for lists and indenting is applied to the parser hook output. Also, blanks before : are turned into non-breaking spaces for some reason. HTML output for this test case: <p>a: b a : b </p> <dl><dd> bla </dd><dt> bla </dt><dd> blubb </dd></dl> <ul><li> foo </li><li> bar </li></ul> <ol><li> foo </li><li> bar </li></ol> This is really, really broken...
Isn't this just because parser-hook text is substituted in on an early pass and then left to the mercies of the rest of the parser? There should be some option to <nowiki> it all, presumably. Or . . . you could just add <nowiki> yourself? (Well, that doesn't stop paragraph breaks, so something intermediate between <nowiki> and <pre> would be needed.)
Well, as far as I can see, it is not exactly an "early" pass, but it is *too* early: the output of the hook is put back into the text before the "final block-level transformation" (or something like it). Nowiki processing has already been applied at this stage, so the <nowiki> would be passed through verbatim. It also wouldn't work because nowiki would *escape* any HTML tags. What is really needed is to unstrip the results of parser hooks only after *all* transforms are done. But with the parser being as convoluted as it is, I have no idea how to go about this. Also, this behavior has apparently been around for quite some time, so some parser hooks may even rely on it (that would be a bad idea, but it should be checked). Anyway, Brion told me to "not treat this as an emergency", so I'll set this back to normal severity (even though I feel the current behavior is severely broken). Also, people seem to remember that this problem has already been filed, but I can't find the report. If you do, feel free to dupe this.
Hm, from experimenting with an extension I wrote, it seems to me that the right place to "put back" results from a parser hook would be where the ParserAfterTidy hook is called (or, if tidy should process it, ParserBeforeTidy). To achieve this, the strip/unstrip mechanism should probably be rigged so it can be specified what is unstripped when. I'm not clear on how to do that, though.
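To make the idea of "specifying what is unstripped when" concrete, here is a minimal toy model in Python. It is not MediaWiki code: the StripState class, the phase names "general" and "after_tidy", the marker format, and toy_transform are all hypothetical stand-ins chosen for illustration. It only shows that content hidden behind a marker survives a later text pass if its phase is unstripped after that pass.

```python
import re


class StripState:
    """Toy strip state: hides text behind markers, grouped by a named phase."""

    def __init__(self):
        self._items = {}  # marker -> (phase, hidden text)
        self._count = 0

    def strip(self, text, phase):
        # Hypothetical marker format; the real parser uses its own.
        marker = f"\x7fUNIQ-{self._count}-QINU\x7f"
        self._count += 1
        self._items[marker] = (phase, text)
        return marker

    def unstrip(self, text, phase):
        # Put back only the items registered for the requested phase.
        for marker, (item_phase, hidden) in self._items.items():
            if item_phase == phase:
                text = text.replace(marker, hidden)
        return text


def toy_transform(text):
    # Stand-in for a later wikitext pass: collapse runs of spaces.
    return re.sub(r" {2,}", " ", text)


state = StripState()
hook_html = "<b>a    b</b>"
wikitext = "xx" + state.strip(hook_html, "after_tidy") + "xx    yy"

text = state.unstrip(wikitext, "general")  # nothing registered for this phase
text = toy_transform(text)                 # only touches the surrounding wikitext
text = state.unstrip(text, "after_tidy")   # hook output restored untouched
print(text)
```

The surrounding wikitext is still transformed, while the spaces inside the hook output are preserved because that output was only a marker when the transform ran.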
I will be attaching a Parser patch shortly which hides extension tag output until after Tidy is finished. One caveat, this patch causes two additional test failures that previously worked: > Running test Parser hook: static parser hook not inside a comment... FAILED! > Running test Parser hook: static parser hook inside a comment... FAILED! The former expects "<p>hello, world\n</p>" and gets "<p>\nhello, world\n</p>" The latter expects "<p><br />\n</p>" and gets "<p>\n</p>" In both cases the differences are whitespace only.
Created attachment 3235 [details] Parser patch to resolve whitespace mangling of extension output. Note comments in bug regarding test failures as a result of applying this patch.
Previous patch seems to be breaking Cite. Specifically, <ref> tags are not being replaced by "[1]" links - though they still render properly in the <references> section. Not sure yet why this is.
Looks like the parsing stops after putting in the <!-- LINK # --> flags. The <sup> tags are there for the references, but only <!--LINK #--> can be found inside. These LINK comments are being generated by the makeLinkHolder() function. Still not sure why they're not getting resolved.
The problem comes from the doBlockLevels() call in parse(), which renders lists and inserts paragraph tags. Tags shouldn't be unstripped until after lists are rendered, but currently they are unstripped before that. So all that needs to happen is to move this line of code: $text = $this->mStripState->unstripGeneral( $text ); to just below the doBlockLevels() call. The only effect this would have is on gallery tags, but the list functionality in gallery tags is buggy anyway. Does anyone else see any problems with this fix?
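The effect of the ordering can be illustrated with a small Python toy. This is only a sketch under stated assumptions: do_block_levels here is a drastically simplified stand-in for the real doBlockLevels() (it knows only "*" list lines and paragraphs), and MARKER and unstrip are hypothetical names, not the parser's own.

```python
def do_block_levels(text):
    """Toy block pass: '*' lines become list items, other non-empty lines paragraphs."""
    out = []
    in_list = False
    for line in text.split("\n"):
        if line.startswith("*"):
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append("<li>" + line[1:].strip() + "</li>")
        else:
            if in_list:
                out.append("</ul>")
                in_list = False
            if line.strip():
                out.append("<p>" + line + "</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


MARKER = "\x7fHOOK-0\x7f"  # hypothetical placeholder for the stripped tag
hook_output = "<script>\n* not a list\n</script>"
source = "xx" + MARKER + "xx"


def unstrip(text):
    return text.replace(MARKER, hook_output)


# Unstrip first: the block pass sees the hook output and mangles it.
early = do_block_levels(unstrip(source))
# Unstrip afterwards: the block pass only ever sees the marker.
late = unstrip(do_block_levels(source))

print(early)
print(late)
```

In this toy, the early variant turns the "* not a list" line into a list item and splits the script element across paragraphs, while the late variant leaves the hook output byte-for-byte intact. Note that even the late variant still leaves the output inside the paragraph that was built around the marker.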
Created attachment 4030 [details] Parser patch to resolve whitespace mangling of extension output. Sean J's recommendations seem to work fine. I made a new patch for this issue against trunk (it works fine on the current stable 1.10 branch too). I'd be happy to see it in core.
This seems to be the case in the new (1.12) parser as well.
This patch breaks a lot of parser tests: 16 previously passing test(s) now FAILING! :(
* Parser hook: empty input
* Parser hook: empty input using terminated empty elements
* Parser hook: empty input using terminated empty elements (space before)
* Parser hook: basic input
* Parser hook: case insensitive
* Parser hook: case insensitive, redux
* Parser hook: nested tags
* Parser hook: basic arguments
* Parser hook: argument containing a forward slash
* Parser hook: empty input using terminated empty elements (bug 2374)
* Parser hook: basic arguments using terminated empty elements
* Parser hook: static parser hook not inside a comment
* Parser hook: static parser hook inside a comment
* Special page transclusion
* Special page transclusion twice (bug 5021)
* Gallery
A note about the workaround described in http://www.mediawiki.org/wiki/Manual:Tag_extensions#How_can_I_avoid_modification_of_my_extension.27s_HTML_output.3F (replace the output with a marker in the parser hook, then replace the marker with the content in ParserAfterTidy): it causes the final content to be wrapped in <p>. The final output then fails HTML validation whenever the parser hook produces output that is not allowed inside <p>, e.g. <noscript> or <div>. I did not find a way to prevent this. Maybe a parameter is needed with which a tag extension can "tell" MediaWiki whether it returns inline or block elements? (Or have I overlooked something in the documentation?) Otherwise I cannot find a way to prevent MediaWiki from placing the output in <p>s.
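One way the suggested inline/block parameter could behave is sketched below in Python. This is purely illustrative and not an existing MediaWiki feature: the substitute function, its is_block flag, and the marker string are all hypothetical. The assumption is that, for block output, a paragraph containing nothing but the marker is dropped when the marker is replaced.

```python
import re


def substitute(html, marker, content, is_block):
    """Replace marker with content; for block content, drop a bare <p> wrapper."""
    if is_block:
        # Match a paragraph that contains only the marker plus whitespace.
        pattern = r"<p>\s*" + re.escape(marker) + r"\s*</p>"
        # A function replacement avoids backslash interpretation in content.
        html, count = re.subn(pattern, lambda m: content, html)
        if count:
            return html
    # Inline content, or a marker that shares its paragraph with other text.
    return html.replace(marker, content)


marker = "\x7fHOOK-0\x7f"  # hypothetical placeholder left by the parser hook
parsed = "<p>intro</p>\n<p>" + marker + "\n</p>"

block_result = substitute(parsed, marker, "<div>box</div>", True)
inline_result = substitute(parsed, marker, "<b>x</b>", False)

print(block_result)
print(inline_result)
```

With the flag set, the div ends up outside any paragraph; without it, the content stays where the marker was. A marker that shares a paragraph with other text is not handled by this sketch and would still end up inside <p>.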
*** This bug has been marked as a duplicate of bug 1319 ***