Last modified: 2013-12-03 23:50:28 UTC
We currently include purely syntactic whitespace in the DOM, which makes life harder than necessary for VE and other clients. Instead, we should abstract away purely syntactic whitespace and match the PHP parser's output.

Test cases:
== Foo == should parse to <h2>Foo</h2> instead of <h2> Foo </h2>
* foo should parse to <ul><li>foo</li></ul> instead of <ul><li> foo</li></ul>
Isn't this a more generic problem that is not limited to lists and headings? It seems we should trim whitespace from the first and last child text nodes of all non-pre elements. Otherwise it doesn't really benefit VE, for example, since they would still have to maintain whitespace information and restore it on save. This normalization would then mean that only selser can reserialize content without introducing dirty diffs. If we want the regular serializer to preserve whitespace as well, we have to record the details of the normalized whitespace in data-parsoid.
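To make the proposal concrete, here is a rough sketch of such a pass. It is written in Python with the standard library's xml.dom.minidom purely for illustration and is not Parsoid code. The function names and the leadingWS/trailingWS keys are made up for this example, and storing them as a JSON blob in data-parsoid is only an assumption about how the recording could look. Whether inline elements need different treatment is left open here.

```python
import json
from xml.dom import minidom

# Whitespace is significant inside <pre>, so those subtrees are left alone.
SKIP_TAGS = {"pre"}


def trim_edge_whitespace(node, record=False):
    """Trim whitespace from the first/last text child of each non-pre element.

    With record=True, the removed whitespace is stored on the element so a
    serializer could restore it. The key names are hypothetical.
    """
    if node.nodeType != node.ELEMENT_NODE:
        return
    if node.tagName.lower() in SKIP_TAGS:
        return

    # Handle descendants first; copy the list since we mutate text data.
    for child in list(node.childNodes):
        trim_edge_whitespace(child, record)

    removed = {}

    first = node.firstChild
    if first is not None and first.nodeType == first.TEXT_NODE:
        stripped = first.data.lstrip()
        if stripped != first.data:
            removed["leadingWS"] = first.data[: len(first.data) - len(stripped)]
            first.data = stripped

    last = node.lastChild
    if last is not None and last.nodeType == last.TEXT_NODE:
        stripped = last.data.rstrip()
        if stripped != last.data:
            removed["trailingWS"] = last.data[len(stripped):]
            last.data = stripped

    if record and removed:
        # Assumption: a JSON blob in data-parsoid; real Parsoid may differ.
        node.setAttribute("data-parsoid", json.dumps(removed, sort_keys=True))


def normalize(fragment, record=False):
    """Parse a well-formed single-root fragment, trim it, and serialize it."""
    root = minidom.parseString(fragment).documentElement
    trim_edge_whitespace(root, record)
    return root.toxml()


print(normalize("<h2> Foo </h2>"))          # <h2>Foo</h2>
print(normalize("<ul><li> foo</li></ul>"))  # <ul><li>foo</li></ul>
print(normalize("<pre> x </pre>"))          # <pre> x </pre>
```

With record=True the h2 element additionally carries the removed leading and trailing space, which is the information a non-selser serializer would need to avoid dirty diffs.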
https://gerrit.wikimedia.org/r/#/c/96790/ did some related work in the serializer, but did not change the DOM representation yet.