Last modified: 2014-09-24 00:46:30 UTC
I made an extension to allow for easier discussions (an example is here http://moacad.com/wiki/index.php?title=Talk:Changelog) but in doing so I noticed that the current extractTags function in Parser.php would extract the tag starting at the beginning of the tag (<tag>) and would stop at the 'first' end tag (</tag>). For example if someone edited a page and included the text '<tag><tag>foo</tag>bar</tag>' it would think the tag was only '<tag><tag>foo</tag>' and would leave off the extra '</tag>' on the end.
Created attachment 201 [details] patch for the Parser.php file, with line numbers I'm not sure how to include the code for html comments with it so I just put that separate with an if tag. If anyone would like to change it go ahead. It may also be sloppy formating or code, this is my first patch so I don't know much about the style it should be in.
Created attachment 202 [details] more recent patch I think this should be more usefull. It replaces lines 234-285 in the 1.4beta4 version of Parser.php
Not sure this would be desireable; may have side effects. Anyway the patch is very out of date...
Not a patch, removing patch keyword.
Isn't this bug fixed in 1.9 and maybe earlier with Parser::recursiveTagParse()?
Created attachment 4140 [details] diff file from mediawiki 1.11.0
*** Bug 11528 has been marked as a duplicate of this bug. ***
*** Bug 20350 has been marked as a duplicate of this bug. ***
*** Bug 21426 has been marked as a duplicate of this bug. ***
Removing need-review keyword. Patches are ancient and not very useful, so I've marked them obsolete.
This bug still exist in 1.16.2 - any plans to fix it?
Probably exists in 1.17 (about to be released), too. Can you check out a copy of HEAD from subversion to check? svn checkout http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3 wiki Also, if you're running into this, let us know the particulars of your case.
/me wonders if the proposed change in this bug is really the desired behaviour. Disallowing nested start <tag>'s seems sane to me.
(In reply to comment #13) > /me wonders if the proposed change in this bug is really the desired behaviour. > Disallowing nested start <tag>'s seems sane to me. That depends on semantics and the code handling them.
Working with the latest code from trunk/phase3, as suggested. My test case extension is a very basic tag hook: File: Bug1310_TestCase.php <?php $wgHooks['ParserFirstCallInit'][] = 'onParserFirstCallInit'; function onParserFirstCallInit( &$parser ) { $parser->setHook( 'foo', 'onTag' ); return true; } function onTag( $input, $args, $parser, $frame ) { wfDebug( $input ); return 'xxx'; } ?> in LocalSettings.php the extension is loaded the normal way. The input for the test case is: '<foo>Begin1... <foo>Begin2... ...End2</foo> ...End2</foo>' The $input that gets into onTag() should be: 'Begin1... <foo>Begin2... ...End2</foo> ...End2' However, In wfDebug I get: 'Begin1... <foo>Begin2... ...End2' And in the browser I get: 'xxx...End1</foo>'
OOPS! (why can't I edit my comment?) The input for the test case is: '<foo>Begin1... <foo>Begin2... ...End2</foo> ...End1</foo>' The $input that gets into onTag() should be: 'Begin1... <foo>Begin2... ...End2</foo> ...End1' However, In wfDebug I get: 'Begin1... <foo>Begin2... ...End2' And in the browser I get: 'xxx...End1</foo>'
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
-patch, no non-obselete patches.
Same bug here. I'm developing an extension to semantically tag the document, so there are going to be recursive tags, and I'm getting this bug too, test case is the same as in Comment 15. Any alternative to tag extensions for this?
>Any alternative to tag extensions for this? Well parser function style things perhaps ({{#foo:...}})
Created attachment 9920 [details] Allows extension tags to be nested This patch allows tag extensions to be nested. Only the most outer tags are parsed, everything in between is passed to the callback. Given the wiki text "<foo>123<foo>456</foo>789</foo>", foo's callback will be called with the text "123<foo>456</foo>789".
Thanks for this patch! We've been in a code slush (not quite a freeze) for a few weeks so we're just getting around to looking at these. We're also doing a lot of parser work, so I'm not sure how relevant this is, but I'll ask them to take a look.
From an implementation standpoint, simply matching up the closest start/end tag is definitely easier than building a stack to enable nested tag pairs. I am also not convinced that nested tag pairs would be a good UI design, as it seems to make the distinction of regular wiki content and input to an extension harder than necessary. Could you present a compelling use case that demonstrates the need to use the same tags both to delimit the extension inputs and the input itself?
The last sentence should naturally end with *and in the input itself*. An edit button would be handy sometimes.
(In reply to comment #23) > From an implementation standpoint, simply matching up the closest start/end tag > is definitely easier than building a stack to enable nested tag pairs. I am > also not convinced that nested tag pairs would be a good UI design, as it seems > to make the distinction of regular wiki content and input to an extension > harder than necessary. > > Could you present a compelling use case that demonstrates the need to use the > same tags both to delimit the extension inputs and the input itself? There are two examples I could see where this may be wanted * <ref> tags so people could do nested ref stuff without {{#tag:ref hackery. * <source> tag's for when highlighting xml-ish things that have a <source> inside them (since they would usually have a closing source tag as well, but that's more like accidentally fixing an issue then actually fixing an issue). But I also tend to agree that it may not be worth the effort.
That as a <source> fix sounds to me more like a hack to fix a non-issue to me. The <source> isn't written to explicitly do anything special with any <source> tags inside of it so that does not sound like the thing we should be aiming for. (And sounds like it would break if someone used <source> to document example arguments to the opening <source> tag) Switching over to a complete even tag matching could change the behaviour of existing content -- ie: <foo><foo></foo> suddenly having different behaviour -- so I'd reject the patch we have on those grounds alone. We probably also want to write a test to make sure that <nowiki><nowiki></nowiki> doesn't suddenly start turning everything after it into nowiki content when it was written expecting it to display a "<nowiki>" tag verbatim in the page for documentation. I think that if we do implement recursive tags, it's going to have to be an explicit op-in by extensions feature. ie: Only tags with a specific option will be parsed recursively. We can enable it on <ref> but we may not want to enable it for <source>, and definitely don't want it on <nowiki>.
I also share Daniel's concerns about changing the behavior of existing content. In the longer term, I think that it would be desirable to add a fully parsed input mode for extension tag contents. This could take the form of a token stream or a DOM fragment built from those. Extensions could choose between plain-text input and tokens, so this would be opt-in. This is also very close to how this is currently handled in the Parsoid parser (http://mediawiki.org/wiki/Parsoid), although there still are some issues to solve (e.g. an unclosed html comment in the extension content). Nested nowiki is covered by parser tests already, and works as expected. Are there extensions that need nested extension tags, but otherwise unparsed input?
*** Bug 35173 has been marked as a duplicate of this bug. ***
This discussion has been stalled during more than two years. Has there been any change helping to its resolution in a way or another?
(In reply to Quim Gil from comment #29) > This discussion has been stalled during more than two years. Has there been > any change helping to its resolution in a way or another? Had there been, this bug would likely already reflect such a change. I'm not sure what you're asking.