Last modified: 2013-07-04 10:34:15 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T43057, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 41057 - DSR on templates in tables seems to be off
DSR on templates in tables seems to be off
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
General (Other open bugs)
unspecified
All All
: Normal normal
: ---
Assigned To: ssastry
:
Depends on:
Blocks: 39567
  Show dependency treegraph
 
Reported: 2012-10-15 23:14 UTC by Gabriel Wicke
Modified: 2013-07-04 10:34 UTC (History)
2 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Gabriel Wicke 2012-10-15 23:14:22 UTC
DSR expands beyond the tsr for the template element, which causes it to include parts of the table syntax. Test case on the English Wikipedia:

{| class="toccolours collapsible collapsed" style="width:100%;"
|-
! '''Links to related articles'''
|-
| {{BBC Television}}
|-
| {{BBC Radio}} 
|-
| {{BBC News}}
|-
| {{BBC music ensembles}}
|-
| {{European Broadcasting Union Members}}
|-
| {{Department for Culture, Media and Sport}}
|-
| {{Media in the United Kingdom|comporg}}
|-
| {{Telecommunications}}
|}
Comment 1 ssastry 2012-10-15 23:20:40 UTC
Reduced test case:
{|
| {{BBC Radio}}
|}
Comment 2 ssastry 2012-10-16 04:20:05 UTC
This is quite ugly actually.  Details below.

The opening content generated by this template is:

<td style="text-align:left;border-left-width:2px;border-left-style:solid;width:100%;padding:0px;;;" class="navbox-list navbox-oddhlist">

Note the <td ... > tag parsed as text.  Now, if you look at the source of {{BBC Radio}} template, it uses the {{Navbox}} template.  And, if you look at the content of the {{Navbox}} template, you see the source of all problems.

The td-tag above that is parsed as string is actually generated in multiple pieces as string and pieced together which means the tokenizer cannot really parse this as td-tag without processing all the associated templates and generating the strings and concatenating them.

-----------------------------------------
{{#if:{{{group1|}}}|<th scope="row" class="navbox-group {{{groupclass|}}}" <!--
 -->style="{{{basestyle|}}};{{#if:{{{groupwidth|}}}|width:{{{groupwidth}}};}}{{{groupstyle|}}};{{{group1style|}}}"><!--
 -->{{{group1}}}</th><td style="text-align:left;border-left-width:2px;border-left-style:solid;|<td colspan=2 style="}}<!--
 -->{{#if:{{{groupwidth|}}}||width:100%;}}padding:0px;{{{liststyle|}}};{{{oddstyle|}}};{{{list1style|}}}" <!--
 -->class="navbox-list navbox-{{#ifeq:{{{evenodd|}}}|swap|even|{{{evenodd|odd}}}}} {{{listclass|}}}">=
-----------------------------------------

Hmmm .... 

{{Navbox}} is actually a very common template.  So, we need to find a hack for this or fix the Navbox template.
Comment 3 Gabriel Wicke 2012-10-16 04:29:15 UTC
Normally the template's output should not affect dsr calculations. The template tag has the correct tsr available, so I think this is simply a bug in the dsr propagation algorithm.

The navbox template is a known issue, and can be fixed by tweaking the template to use properly nested (at least to some degree..) parser functions.
Comment 4 ssastry 2012-10-16 15:49:09 UTC
I think because of the lost <td> tag, the DOM is thrown off sufficiently so that DSR calculations might not be able to capture all RT-able content.  Consider this example:

{|
|{{BBC Radio}}
|row2
|}

Now, let us look at the token stream that is sent to the tree builder: 
<table>, <td>, <bbc-meta-start>, ...., <bbc-meta-end>, </td>, <td>, row2, </td>, </table>

But, if you look at the DOM that comes out the tree-builder, it looks like:
.... </table>, row2.  So, a closing </td> tag that should have matched with the lost td-tag is instead closing the outer table.  In the bargain, source wikitext information about the leading "|" before row2 and the table closing "|}" is lost because the tree builder discards the mismatched <td>, </td> and </table> tokens in the end.

This rts as:

{|
|{{BBC Radio}} 
row2

Now, let us take an example closer to the wikitext found on the BBC page:
{|
|{{BBC Radio}}
|{{echo|row3}}
|}

The trailing DOM for this looks like this:
...
</table>
<span data-parsoid="{'src':'|{{echo|row2}}','dsr':[18,32]}" about="#mwt2" typeof="mw:Object/Template">row2</span>

And, this roundtrips as:

{|
|{{BBC Radio}}
|{{echo|row2}}

I'll examine this more, but when the DOM semantics change, I am not sure if DSR can always robustly fix this up in all situations given that DSR computation relies on DOM structure.
Comment 5 ssastry 2012-10-16 21:08:37 UTC
For the benefit of anyone else besides me and Gabriel who is monitoring this ticket :), we decided to try and patch the DOM to deal with lost content.  The algorithm will patch the DOM by expanding template scope to patch over problem content so that the entire content (in this case the entire wikitext table) can roundtrip.  If we manage to patch up the DOM, DSR computation will remain unchanged and will mark the entire expanded content RT-able.

We may have a solution for detecting when we should (and can) do this DOM patchup.  Coding and testing in progress.
Comment 6 ssastry 2012-10-18 15:04:27 UTC
A DOM fix here: https://gerrit.wikimedia.org/r/#/c/28455
Comment 7 Andre Klapper 2013-07-04 10:34:15 UTC
[Parsoid component reorg by merging JS/General and General. See bug 50685 for more information. Filter bugmail on this comment. parsoidreorg20130704]

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links