Last modified: 2012-01-07 15:37:13 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T33374, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 31374 - Nested refs expose strip markers
Nested refs expose strip markers
Status: RESOLVED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Cite (Other open bugs)
unspecified
All All
: High major with 1 vote (vote)
: ---
Assigned To: Nobody - You can work on this!
: need-integration-test, need-unittest, patch, patch-need-review
Depends on:
Blocks: 29876
  Show dependency treegraph
 
Reported: 2011-10-05 12:26 UTC by Gadget850
Modified: 2012-01-07 15:37 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Possible patch for the issue (925 bytes, patch)
2011-10-05 22:41 UTC, Brad Jorsch
Details
Patch adding parser test case to Cite extension (872 bytes, patch)
2011-10-05 23:20 UTC, Brion Vibber
Details

Description Gadget850 2011-10-05 12:26:59 UTC
After 1.18 upgrade, all uses of nested references using #tag:ref expose the strip markers. Example:

{{#tag:ref|note<ref>reference</ref>|group=Note}}
<references group=Note />
<references />

This will show the strip marker starting with UNIQ.
Comment 1 Gadget850 2011-10-05 12:27:49 UTC
Support for nested references using #tag:ref was added in rev 33066 and has been working with some limitations. This method is documented and well used on the English Wikipedia.
Comment 3 Brad Jorsch 2011-10-05 15:32:50 UTC
I did a binary search through the SVN history to pinpoint the revision that first exhibited the bug. It appears to have been r82645; the bug does not occur in r82644.
Comment 4 Brad Jorsch 2011-10-05 18:25:02 UTC
Ok, I think I see what is happening here.

The parser finds a <ref> tag. It calls the hook registered by Cite.php. Cite.php generates wikitext something like this to output to the page in place of a <ref> tag:

    <sup id="cite_ref-1" class="reference">[[#cite_note-1|<span><nowiki>[</nowiki></span>1<span><nowiki>]</nowiki></span>]]</sup>

It passes this to the parser's recursiveTagParse. The parser returns something like this:

    <sup id="cite_ref-1" class="reference"><a href="#cite_note-1"><span>�UNIQ-1�</span>1<span>�UNIQ-2�</span></a></sup>

Note it replaced the <nowiki> tags with UNIQ markers. Cite.php returns this string from its hook function. The parser stores that as another UNIQ marker (call it �UNIQ-3�), and outputs the UNIQ marker to the page in place of the <ref> tag.

Then, later on, the parser unstrips all the uniq markers for output.

In r82644, does this in a loop: replace all UNIQ markers in the text and repeat until the text stops changing. So in this example, it replaces �UNIQ-3� with text that contains �UNIQ-1� and �UNIQ-2�, and then in the second pass it replaces �UNIQ-1� and �UNIQ-2�, and then in the third pass it makes no changes and stops.

In r82645, it does *not* loop. So in this example, it replaces �UNIQ-3� with text that contains �UNIQ-1� and �UNIQ-2� and stops there, leaving those markers in the page output.


I can think of two possible fixes: have StripState's unstripGeneral(), unstripNowiki(), and unstripBoth() loop as the old version did, or have StripState::addItem() call unstripBoth() on $value before storing it. It also looks like merge() needs to be adjusted to process the saved values, to unstrip them in the latter solution or to rename any existing markers in the former. In the former solution getSubState() also needs to be changed, otherwise it will keep �UNIQ-3� in our example and forget about �UNIQ-1� and �UNIQ-2�.
Comment 5 Roan Kattouw 2011-10-05 20:09:42 UTC
CC Tim.

Tim, could you look into this? See comment 4 for a detailed analysis of the problem.
Comment 6 Jelle Zijlstra 2011-10-05 20:19:05 UTC
This bug does not occur on secure.wikimedia.org (compare https://secure.wikimedia.org/wikipedia/en/wiki/Tropical_Storm_Debra_%281978%29#Footnotes and https://en.wikipedia.org/wiki/Tropical_Storm_Debra_%281978%29#Footnotes), even though Special:Version says both are running MW 1.18wmf1 at r98778.
Comment 7 Roan Kattouw 2011-10-05 20:22:45 UTC
(In reply to comment #6)
> This bug does not occur on secure.wikimedia.org (compare
> https://secure.wikimedia.org/wikipedia/en/wiki/Tropical_Storm_Debra_%281978%29#Footnotes
> and https://en.wikipedia.org/wiki/Tropical_Storm_Debra_%281978%29#Footnotes),
> even though Special:Version says both are running MW 1.18wmf1 at r98778.
That's probably due to parser cache. Does it happen if you edit the page on secure, change nothing, then click Show preview? If yes, then it's just pcache caching an old, uncorrupted version.
Comment 8 Jelle Zijlstra 2011-10-05 20:27:44 UTC
It doesn't happen, actually. I also tried to purge and null-edit the page, and I still don't get any UNIQ...QINU on secure.wikimedia.org.
Comment 9 Gadget850 2011-10-05 20:48:29 UTC
I see it on those pages, as well as https://secure.wikimedia.org/wikipedia/en/wiki/Template:Refn
Comment 10 Jelle Zijlstra 2011-10-05 21:02:18 UTC
It turns out I'm hiding the UNIQ keys with CSS, because they appear in the same place where the square brackets around the key would normally appear. Sorry for the confusion.
Comment 11 Robert Rohde 2011-10-05 21:15:13 UTC
Unstrip markers are also exposed when <ref> tags are missed with other extensions, such as those for <gallery>, e.g.

http://en.wikipedia.org/w/index.php?title=Elizabeth_II&oldid=452751535#Arms

This behavior seems to have the same origin as that described by Brad in comment 4.
Comment 12 U+003F 2011-10-05 21:57:57 UTC
I notice that the bug is (for me) only an issue in firefox, whilst safari and opera display the expected behaviour.
Comment 13 Brad Jorsch 2011-10-05 22:41:25 UTC
Created attachment 9166 [details]
Possible patch for the issue

I've attached a possible patch for this issue. It takes the approach of unstripping $value in StripState::addItem().
Comment 14 MZMcBride 2011-10-05 23:10:06 UTC
I strongly urge rejecting any patches that make the current hacks more acceptable. From memory:

* Nested refs are currently not supported.
* People implemented a hack using {{#tag:ref}} to get around this.
* As expected, a software upgrade has broken this hack.

The solution here is to add proper support for nesting references (bug 16330). Further fiddling with the parser to maintain bad hacks should be avoided at all costs, in my opinion.
Comment 15 Brion Vibber 2011-10-05 23:20:08 UTC
Created attachment 9167 [details]
Patch adding parser test case to Cite extension

Patch takes the example from comment 1 and adds it to Cite's parser test cases. (The bug itself is in core, but none of the core tag hooks appear to do recursive parsing so wouldn't trigger it.)

The output is what I get running against 1.17 (looks fine!), and seems to match fine again with Brad's patch installed.

All other parser test cases also seem to run ok with the patch installed, and it looks ok to me.

Would like a quick check with Tim since this is a fix to his black magic. ;)
Comment 16 Brion Vibber 2011-10-05 23:28:47 UTC
(In reply to comment #14)
> I strongly urge rejecting any patches that make the current hacks more
> acceptable. From memory:
> 
> * Nested refs are currently not supported.
> * People implemented a hack using {{#tag:ref}} to get around this.
> * As expected, a software upgrade has broken this hack.
> 
> The solution here is to add proper support for nesting references (bug 16330).
> Further fiddling with the parser to maintain bad hacks should be avoided at all
> costs, in my opinion.


In theory you should be able to nest as much as you like using {{#tag}}, eg:

{{#tag:ref|blah blah blah{{#tag:ref|blah2}}}}

and indeed this seems to render fine on 1.17.

The main limitation against nesting tags in the <xmlish> tag form is that our parser since way back is oversimplified: it takes the shortest match and doesn't assume ahead of time that the contents would be meaningful wikitext; kinda like how <textarea> and <script> in HTML aren't allowed to contain </textarea> or </script>, because they have character data content models so a nested <textarea> or <script> opener would seem invisible to the HTML parser.


The curly brace syntax however is quite explicitly ok to nest, as the internals are treated like wikitext and get brace expansion pre-done inside before the hook gets them.

What seems to be the problem with that? Is there internals crud in Cite itself that makes nesting situations particularly perilous? Or does it just seem like a poor practice for Cite/ref in particular because it sounds like an odd usage pattern?
Comment 17 Gadget850 2011-10-06 00:44:49 UTC
> In theory you should be able to nest as much as you like using {{#tag}}

And it does indeed work. I even created template refn because editors were messing up the syntax— group and note have to be at the end of the #tag:ref, after content, which is not intuitive.

However, when using list-defined references, you can only include one nested ref. More throws a "Cite error references no key". See Bug 20707.

This is a real PITA, and has resulted in the proliferation of templates that do not use cite.php, but result in duplicate HTML ids and invalid HTML output.

Yes, rewriting the parser to fix this would be a really good thing. But we should at least get back to the point where we mostly worked.
Comment 18 Tim Starling 2011-10-06 01:25:45 UTC
Should be fixed now (r99062, r99071).
Comment 19 Gadget850 2011-10-06 01:36:00 UTC
Looks good. Back to mostly working state. I have updated the VPT list.

Thanks!
Comment 20 MZMcBride 2011-10-06 02:15:26 UTC
(In reply to comment #16)
> (In reply to comment #14)
>> I strongly urge rejecting any patches that make the current hacks more
>> acceptable. From memory:
>> 
>> * Nested refs are currently not supported.
>> * People implemented a hack using {{#tag:ref}} to get around this.
>> * As expected, a software upgrade has broken this hack.
>> 
>> The solution here is to add proper support for nesting references (bug 16330).
>> Further fiddling with the parser to maintain bad hacks should be avoided at all
>> costs, in my opinion.
> 
> 
> In theory you should be able to nest as much as you like using {{#tag}}, eg:
> 
> {{#tag:ref|blah blah blah{{#tag:ref|blah2}}}}
> 
> and indeed this seems to render fine on 1.17.
> 
> The main limitation against nesting tags in the <xmlish> tag form is that our
> parser since way back is oversimplified: it takes the shortest match and
> doesn't assume ahead of time that the contents would be meaningful wikitext;
> kinda like how <textarea> and <script> in HTML aren't allowed to contain
> </textarea> or </script>, because they have character data content models so a
> nested <textarea> or <script> opener would seem invisible to the HTML parser.
> 
> 
> The curly brace syntax however is quite explicitly ok to nest, as the internals
> are treated like wikitext and get brace expansion pre-done inside before the
> hook gets them.
> 
> What seems to be the problem with that? Is there internals crud in Cite itself
> that makes nesting situations particularly perilous? Or does it just seem like
> a poor practice for Cite/ref in particular because it sounds like an odd usage
> pattern?

Putting the {{{{{{{{{}}}}}}}}} madness issues aside, the use-case isn't {{#tag:ref|{{#tag:ref}}}}, it's {{#tag:ref|<ref></ref>}} (see comment 0). Accommodating (and encouraging) syntax like this is exactly what got the parser into the shape it's in now. Haphazard hacks get ingrained into the code and then suddenly everyone else is expected to be able to deal with them, rather than implementing a proper solution when the issues are obvious. That was largely my point, but it seems to be rather moot now.

A considered decision to change or update ref syntax would be a Good Thing. Implementing "solutions" like this that put a band-aid on the problem and make it more difficult to resolve issues down the line is a Bad Thing.
Comment 21 Tim Starling 2011-10-06 04:27:45 UTC
(In reply to comment #20)
> Putting the {{{{{{{{{}}}}}}}}} madness issues aside, the use-case isn't
> {{#tag:ref|{{#tag:ref}}}}, it's {{#tag:ref|<ref></ref>}} (see comment 0).
> Accommodating (and encouraging) syntax like this is exactly what got the parser
> into the shape it's in now. Haphazard hacks get ingrained into the code and
> then suddenly everyone else is expected to be able to deal with them, rather
> than implementing a proper solution when the issues are obvious. That was
> largely my point, but it seems to be rather moot now.

Nested tags have always been allowed, I fixed a related bug as early as r2880. A bug with nested tags does not indicate that there is a problem with #tag, since there are plenty of other ways to do nested tags. If you have some deeper problem with #tag, I suggest you complain on another bug report.
Comment 22 MZMcBride 2011-10-06 05:45:42 UTC
(In reply to comment #21)
> (In reply to comment #20)
>> Putting the {{{{{{{{{}}}}}}}}} madness issues aside, the use-case isn't
>> {{#tag:ref|{{#tag:ref}}}}, it's {{#tag:ref|<ref></ref>}} (see comment 0).
>> Accommodating (and encouraging) syntax like this is exactly what got the parser
>> into the shape it's in now. Haphazard hacks get ingrained into the code and
>> then suddenly everyone else is expected to be able to deal with them, rather
>> than implementing a proper solution when the issues are obvious. That was
>> largely my point, but it seems to be rather moot now.
> 
> Nested tags have always been allowed, I fixed a related bug as early as r2880.
> A bug with nested tags does not indicate that there is a problem with #tag,
> since there are plenty of other ways to do nested tags. If you have some deeper
> problem with #tag, I suggest you complain on another bug report.

* <ref> is implemented.
* {{#tag:}} is implemented.
* Users decide to combine the two into new creations.
* MediaWiki upgrade to 1.18 breaks these (Frankensteinian) creations.
* Bug is filed documenting exposed strip markers with specific nested markup.
* [decision whether to properly support nested refs] and/or [decision whether to properly support mixed syntax]

(From comment #20)
> A considered decision to change or update ref syntax would be a Good Thing.
> Implementing "solutions" like this that put a band-aid on the problem and make
> it more difficult to resolve issues down the line is a Bad Thing.

A lack of considered decisions regarding syntax and code structure is what got MediaWiki its current parser. That was my point, buried somewhere in there. Apologies if I hijacked the bug discussion.
Comment 23 Tim Starling 2011-10-06 12:06:38 UTC
(In reply to comment #22)
> * <ref> is implemented.
> * {{#tag:}} is implemented.
> * Users decide to combine the two into new creations.
> * MediaWiki upgrade to 1.18 breaks these (Frankensteinian) creations.
> * Bug is filed documenting exposed strip markers with specific nested markup.
> * [decision whether to properly support nested refs] and/or [decision whether
> to properly support mixed syntax]

The point of #tag is to provide access to legacy tag hooks via the newer, more flexible brace syntax. This flexibility allows users to do things which are otherwise difficult, such as nesting ref tags. It's not "Frankensteinian", it's an intended use-case.

As far as I'm concerned, xmlish tags should be deprecated, since backwards compatibility puts constraints on what they can do. Adding even more syntax variations to the grammar for xmlish tags will break compatibility with existing text and make life for WYSIWYG implementors more difficult.

All nested xmlish tags were broken, not just ref tags, so the users who decided to nest ref tags were not at fault for this bug in any way.
Comment 24 Gadget850 2011-10-06 13:23:24 UTC
We should now focus on Bug 20707, which will probably need parser work. I have made further comments there.
Comment 25 Brad Jorsch 2011-10-06 14:24:56 UTC
(In reply to comment #22)
> * <ref> is implemented.
> * {{#tag:}} is implemented.
> * Users decide to combine the two into new creations.
> * MediaWiki upgrade to 1.18 breaks these (Frankensteinian) creations.
> * Bug is filed documenting exposed strip markers with specific nested markup.
> * [decision whether to properly support nested refs] and/or [decision whether
> to properly support mixed syntax]

Is {{#tag:ref|foo{{#tag:ref|bar}}}} a "Frankensteinian creation" or "mixed syntax" in your mind too? Because it failed in exactly the same way as {{#tag:ref|foo<ref>bar</ref>}}.
Comment 26 Mark A. Hershberger 2011-10-15 22:03:24 UTC
tagging bugs for Marcus to look at
Comment 27 Jos Visser 2012-01-07 15:04:20 UTC
I just upgraded to 1.18 and I got a problem like this when I use a refence within a poem tag ( using the poem extension at http://www.mediawiki.org/wiki/Extension:Poem ).

 <poem>"Some text" <ref name="foo"/></poem>

Changing it to 

 {{#tag:poem|"Some text.{{#tag:ref||name="foo"}}"}}

doesn't change a thing, the result still looks like:

 "Some text.UNIQ76f58a4e1b6f37c4-nowiki-00000003-QINU1UNIQ76f58a4e1b6f37c4-nowiki-00000004-QINU"

I'm not nesting any ref tags.
Comment 28 Jos Visser 2012-01-07 15:37:13 UTC
Nevermind. I altered unstripType to do a recursive replacement. 

Then I saw the fix by Tim Starling in Comment 18, which does the same (but in a loop, which is better than recursive). I've overlooked that comment before, and apparently his fix hasn't made it to the stable version yet.. Sorry for the inconvienience.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links