Last modified: 2012-08-22 17:59:56 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T41181, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 39181 - Self-closing tags without trailing slash do not round-trip correctly
Self-closing tags without trailing slash do not round-trip correctly
Status: RESOLVED FIXED
Product: Parsoid
Classification: Unclassified
serializer (Other open bugs)
unspecified
All All
: Unprioritized normal
: ---
Assigned To: Gabriel Wicke
:
Depends on:
Blocks: 39567
  Show dependency treegraph
 
Reported: 2012-08-08 22:50 UTC by Roan Kattouw
Modified: 2012-08-22 17:59 UTC (History)
3 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Roan Kattouw 2012-08-08 22:50:49 UTC
Go to http://localhost:8000/_rtform/ and enter "<br>"; it'll round-trip to "<br />"
Comment 1 Gabriel Wicke 2012-08-09 00:25:29 UTC
Fix pushed for review in https://gerrit.wikimedia.org/r/#/c/18202/
Comment 2 Mark Holmquist 2012-08-10 20:02:55 UTC
This fix was merged, but there were notes about '<hr>' and '<img>' as well. I tried to get those to work, but something happens to '<img>' tags before they get to the serializer, and I'm not entirely sure where.
Comment 3 Gabriel Wicke 2012-08-10 20:56:54 UTC
'img' tags are not normally allowed in wikitext, and are stripped out by the sanitizer. 'hr' tags should be allowed, just need to be added to the list of empty tags.
Comment 4 Gabriel Wicke 2012-08-10 22:10:23 UTC
Complete list of void elements submitted for review in https://gerrit.wikimedia.org/r/18697. Verified to preserve slash for hr as well.
Comment 5 Mark Holmquist 2012-08-14 17:26:07 UTC
Boom, merged. Gettin' it done.
Comment 6 Roan Kattouw 2012-08-22 00:01:43 UTC
This is not quite fixed yet, because now "<br />" round-trips to "<br/>", see http://parsoid.wmflabs.org/_rt/mw/VisualEditor:Roadmap
Comment 7 Gabriel Wicke 2012-08-22 01:44:18 UTC
We generally don't try to preserve insignificant whitespace inside HTML tags yet, and it is debatable if this would be worth it. The main reason for going to the length we currently do for full-serialization round-tripping is to simplify our own testing (so that we can use a simple text diff). In the longer term, selective serialization based on the editor's change annotations should provide the better bang for the buck. We might well remove some of the elaborate round-trip information we currently collect once that is enabled and a way to systematically ignore some rt differences is found.
Comment 8 ssastry 2012-08-22 15:19:12 UTC
This insignificant whitespace issue is broader than self-closing tags.  Worth closing this ticket and opening a new one to discuss that.

For example, try this:

"<div    style='color:red;'>foo</div>"

The excess white space is lost.  While I am inclined to dismiss this issues as insignificant and as a case where dirty diffs are acceptable, I am also sensing the tension in the air around WMF projects.  So, maybe this is not such an insignificant issue after all.  Using selective serialization, we can for sure address all these newline, whitespace, etc. issues, and it is worth discussing how soon we should enable that.  As Gabriel noted, it will also eliminate a lot of the wikitext state we carry around in the DOM.

But, the serializer also ought to be complete in and of itself, and so this question will rear its head in that context again.

My personal inclination is to ignore this issue because supporting this consistently will mean essentially recognizing non-normalized-whitespace/newline scenarios and carrying diff-state in the DOM.  But, we will probably be doing this only to satisfy a requirement of perfect round-tripping which IMO is unreasonable/unrealistic.

This might be one issue where it is worth bringing in the editor community into the discussion so we can make a more reasonable decision that everying can be happy with.
Comment 9 Gabriel Wicke 2012-08-22 15:32:02 UTC
We need clean diffs, but there are different ways to get them, with different cost/benefit ratios. Tracking all semantically insignificant whitespace in rt information looks like a sure way to add a lot of bloat. Selective serialization of modified DOM parts on the other hand seems to offer a better cost/benefit ratio.

The only downside of selective serialization I see is that it adds a burden on clients to mark modified parts of the DOM.
Comment 10 Gabriel Wicke 2012-08-22 17:24:44 UTC
Closing this bug for now, semantically insignificant whitespace is tracked by bug 39564.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links