Last modified: 2014-07-24 21:57:23 UTC
http://parsoid.wmflabs.org/_rt/de/Selbstbildnis_%28Leonardo_da_Vinci%29 : gallery caption becomes gallery caption=""
http://parsoid.wmflabs.org/_rt/de/Iraklis_Thessaloniki : bgcolor= becomes bgcolor (which _might_ mean that a later edit turns 'bgcolor' into 'bgcolor=""' on the next parse)
bgcolor="" should either be left alone or be deleted, not rewritten as a bare bgcolor. In a table, bgcolor="" produces no background color, but a bare bgcolor produces a dark red background because it generates the HTML <td bgcolor="bgcolor"> (is this a bug in the default parser?). See https://en.wikipedia.org/w/index.php?title=User:Thryduulf/sandbox&oldid=566470353#Table

This causes problems on the live wiki; see https://en.wikipedia.org/w/index.php?title=Kyle_Busch&diff=566076453&oldid=566076367 (the relevant change is in the lines before the "Line 709" diff block). Accordingly, I've upgraded the severity from "trivial" to "normal".
I've reported that parsing error as bug 52330 against the MediaWiki parser component: the example table in my sandbox was written in the source editor, so Parsoid was not involved, as I understand it.
I guess that bgcolor="" and bgcolor= should round-trip to bgcolor="" rather than just 'bgcolor'. Stripping the attribute completely does not seem to be a good solution in general.
Yes. As noted at bug 52330, a bare bgcolor generates the HTML bgcolor="bgcolor", which renders (at least in Firefox) the same as bgcolor="#b00000" rather than the expected #f9f9f9 that is the default for tables of class "wikitable".
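For what it's worth, the #b00000 result is consistent with the "rules for parsing a legacy color value" in the HTML spec, assuming that is what the browser applies to bgcolor. Below is a simplified sketch of those rules; parseLegacyColor is a name made up for illustration, and named colors such as "red" are deliberately not handled.

```javascript
// Simplified sketch of the HTML spec's legacy color parsing.
// Assumption: named colors (e.g. "red") are NOT handled here.
function parseLegacyColor(input) {
  if (input === "") return null; // empty value: no color is applied
  let s = input.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  if (s.toLowerCase() === "transparent") return null;

  // "#abc" shorthand: each hex digit is doubled.
  if (/^#[0-9a-fA-F]{3}$/.test(s)) {
    return "#" + [...s.slice(1)].map((c) => c + c).join("").toLowerCase();
  }

  // Code points above U+FFFF become "00"; input is capped at 128 characters.
  s = Array.from(s, (c) => (c.codePointAt(0) > 0xffff ? "00" : c)).join("");
  s = s.slice(0, 128);
  if (s.startsWith("#")) s = s.slice(1);

  // Every character that is not a hex digit is replaced by "0".
  s = s.replace(/[^0-9a-fA-F]/g, "0");

  // Pad with "0" until the length is a non-zero multiple of 3.
  while (s.length === 0 || s.length % 3 !== 0) s += "0";

  // Split into red, green and blue components of equal length.
  let len = s.length / 3;
  let parts = [s.slice(0, len), s.slice(len, 2 * len), s.slice(2 * len)];

  // Keep at most the last 8 characters of each component.
  if (len > 8) {
    parts = parts.map((p) => p.slice(len - 8));
    len = 8;
  }
  // Drop leading zeros shared by all three components.
  while (len > 2 && parts.every((p) => p[0] === "0")) {
    parts = parts.map((p) => p.slice(1));
    len -= 1;
  }
  // Keep only the first two characters of each component.
  if (len > 2) parts = parts.map((p) => p.slice(0, 2));

  return "#" + parts.map((p) => p.padStart(2, "0")).join("").toLowerCase();
}

console.log(parseLegacyColor("bgcolor")); // "#b00000"
console.log(parseLegacyColor(""));        // null
```

Here "bgcolor" becomes "b0c0000" once the non-hex characters are zeroed, is padded to "b0c000000", splits into "b0c", "000", "000", and is truncated to "b0", "00", "00". An empty value is rejected outright, which matches bgcolor="" producing no background color.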
This is what a modern browser (HTML5 parsing spec) does:

    document.body.innerHTML = '<div bgcolor>foo</div>';
    "<div bgcolor>foo</div>"
    document.body.innerHTML
    "<div bgcolor="">foo</div>"

We do the same in Parsoid, as we are also using the HTML5 parsing algorithm. So I think bug 52330 is really the issue here.

We should already be round-tripping any kind of attribute perfectly in untouched content. The normalization to bgcolor="" should only happen when something nearby was edited. Can you verify using the visual editor?

PS: When trying http://parsoid.wmflabs.org/_rtselser/dewiki/Selbstbildnis_%28Leonardo_da_Vinci%29 I noticed that there is a diff in ref tags which should not be there. This is reported in bug 60120.
So, Parsoid treats HTML and extension attributes slightly differently. See snippets below:

[subbu@earth lib] echo "A<ref name=>a</ref> B<ref name="">b</ref> C<ref name>c</ref> D<ref name='d'>d</ref>" | node parse --wt2wt
A<ref>a</ref> B<ref>b</ref> C<ref>c</ref> D<ref name="d">d</ref>

[subbu@earth lib] echo "<span title=>a</span><span title="">b</span><span title>c</span><span title="d">d</span>" | node parse --wt2wt
<span title="">a</span><span title="">b</span><span title="">c</span><span title="d">d</span>

For extensions, it drops empty attributes (in whatever form they show up), and for HTML tags, it normalizes empty attributes. Our HTML attribute behavior conforms with what browsers do. Anything to change / fix here for extensions?
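To make the two policies concrete, here is a minimal sketch that models the output shown above. It is not Parsoid's actual code: serializeAttrs, serializeTag and the isExtensionTag flag are hypothetical names, and it assumes the parser has already reduced name, name= and name="" to an attribute with an empty string value.

```javascript
// Hypothetical model of the observed behavior; not Parsoid's implementation.
// attrs is an array of [name, value] pairs; value is "" for name, name= and name="".
function serializeAttrs(attrs, isExtensionTag) {
  return attrs
    // Extension tags: empty attributes are dropped entirely.
    .filter(([, value]) => !(isExtensionTag && value === ""))
    // Everything else is written out as name="value", so empty values become name="".
    .map(([name, value]) => {
      const escaped = value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
      return ` ${name}="${escaped}"`;
    })
    .join("");
}

function serializeTag(tagName, attrs, body, isExtensionTag) {
  return `<${tagName}${serializeAttrs(attrs, isExtensionTag)}>${body}</${tagName}>`;
}

console.log(serializeTag("ref", [["name", ""]], "a", true));    // <ref>a</ref>
console.log(serializeTag("span", [["title", ""]], "a", false)); // <span title="">a</span>
console.log(serializeTag("ref", [["name", "d"]], "d", true));   // <ref name="d">d</ref>
```

In this model the only difference between the two cases is the filter step, so changing the extension behavior would mean deciding whether an empty name should be preserved as name="" or continue to be dropped.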