Last modified: 2012-08-22 17:59:56 UTC
Go to http://localhost:8000/_rtform/ and enter "<br>"; it'll round-trip to "<br />"
Fix pushed for review in https://gerrit.wikimedia.org/r/#/c/18202/
This fix was merged, but there were notes about '<hr>' and '<img>' as well. I tried to get those to work, but something happens to '<img>' tags before they get to the serializer, and I'm not entirely sure where.
'img' tags are not normally allowed in wikitext, and are stripped out by the sanitizer. 'hr' tags should be allowed, just need to be added to the list of empty tags.
Complete list of void elements submitted for review in https://gerrit.wikimedia.org/r/18697. Verified to preserve slash for hr as well.
Boom, merged. Gettin' it done.
This is not quite fixed yet, because now "<br />" round-trips to "<br/>", see http://parsoid.wmflabs.org/_rt/mw/VisualEditor:Roadmap
We generally don't try to preserve insignificant whitespace inside HTML tags yet, and it is debatable if this would be worth it. The main reason for going to the length we currently do for full-serialization round-tripping is to simplify our own testing (so that we can use a simple text diff). In the longer term, selective serialization based on the editor's change annotations should provide the better bang for the buck. We might well remove some of the elaborate round-trip information we currently collect once that is enabled and a way to systematically ignore some rt differences is found.
This insignificant whitespace issue is broader than self-closing tags. Worth closing this ticket and opening a new one to discuss that. For example, try this: "<div style='color:red;'>foo</div>" The excess white space is lost. While I am inclined to dismiss this issues as insignificant and as a case where dirty diffs are acceptable, I am also sensing the tension in the air around WMF projects. So, maybe this is not such an insignificant issue after all. Using selective serialization, we can for sure address all these newline, whitespace, etc. issues, and it is worth discussing how soon we should enable that. As Gabriel noted, it will also eliminate a lot of the wikitext state we carry around in the DOM. But, the serializer also ought to be complete in and of itself, and so this question will rear its head in that context again. My personal inclination is to ignore this issue because supporting this consistently will mean essentially recognizing non-normalized-whitespace/newline scenarios and carrying diff-state in the DOM. But, we will probably be doing this only to satisfy a requirement of perfect round-tripping which IMO is unreasonable/unrealistic. This might be one issue where it is worth bringing in the editor community into the discussion so we can make a more reasonable decision that everying can be happy with.
We need clean diffs, but there are different ways to get them, with different cost/benefit ratios. Tracking all semantically insignificant whitespace in rt information looks like a sure way to add a lot of bloat. Selective serialization of modified DOM parts on the other hand seems to offer a better cost/benefit ratio. The only downside of selective serialization I see is that it adds a burden on clients to mark modified parts of the DOM.
Closing this bug for now, semantically insignificant whitespace is tracked by bug 39564.