Last modified: 2014-07-31 10:35:25 UTC
Take this HTML:

    <ul><li>asd
    sdf</li></ul>

Parse to wikitext:

    * asd
    sdf

Parse back to HTML:

    <ul><li>asd</li></ul>
    <p>sdf</p>

I'm not sure what should happen here, but definitely not this. It's rather easy to run into this in VisualEditor – take a paragraph with newlines and convert it to a list item. I ran into it making this edit: https://en.wikipedia.org/w/index.php?title=Polish_nationality_law&diff=prev&oldid=618961884 (I manually replaced the newlines with spaces before saving).
Yes, this is a known issue. Parsoid currently cannot take arbitrary HTML and convert it to wikitext in a way that preserves rendering on the html -> wt -> html path. We've talked about this issue more generally in the past and will address it, including fallback mechanisms where some forms of HTML will have to be serialized as HTML tags rather than native wikitext. I thought we had a tracking bug or a related set of bugs for this, but I cannot find it right now. We should identify any other related breakages that arise from within VE (which doesn't necessarily generate arbitrary HTML) and fix them together in Parsoid. That fix would be simpler than supporting generic HTML -> wt conversion that is preserved in an html2html transformation.
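To make the fallback idea concrete, here is a minimal sketch in Python. It is not Parsoid code: `serialize_list` is a hypothetical helper, and it assumes list items arrive as plain strings. When any item contains a newline, which native `*` syntax cannot carry, it emits literal HTML list tags instead.

```python
def serialize_list(items):
    """Serialize list items to wikitext (hypothetical helper, not Parsoid's API).

    Native "* item" syntax ends the item at the first newline, so items
    that contain one are written out as literal HTML tags instead.
    """
    if any("\n" in item for item in items):
        # Fallback: keep the content intact inside explicit HTML list tags.
        body = "".join(f"<li>{item}</li>" for item in items)
        return f"<ul>{body}</ul>"
    # Safe case: every item fits on one line, so native syntax round-trips.
    return "\n".join(f"* {item}" for item in items)
```

With this sketch, `serialize_list(["asd sdf"])` uses the native `*` syntax, while `serialize_list(["asd\nsdf"])` keeps the item together inside `<ul><li>` tags.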
In cases like this, it would probably be reasonable to just convert newlines to spaces at some point (either in VisualEditor or in Parsoid). Perhaps VisualEditor would be a better place to implement this from the user's perspective, but Parsoid doing what it does now would still be weird :) – maybe we should just fix this in both places?
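A rough sketch of the newline-to-space idea, in Python for illustration only. The function names `collapse_newlines` and `list_item_to_wikitext` are made up here and do not correspond to anything in VisualEditor or Parsoid; the sketch assumes the list item's content is available as a plain string.

```python
import re


def collapse_newlines(text):
    """Replace each newline, plus any whitespace around it, with one space."""
    return re.sub(r"\s*\n\s*", " ", text).strip()


def list_item_to_wikitext(text):
    """Serialize one list item as native wikitext (hypothetical helper)."""
    # Collapsing first keeps the whole item on the "* " line, so the text
    # after a newline is not re-parsed as a separate paragraph.
    return "* " + collapse_newlines(text)
```

For the example from the report, `list_item_to_wikitext("asd\nsdf")` gives `* asd sdf`, which parses back to a single list item.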
This will need a Parsoid fix since other Parsoid users might still give it HTML that won't be preserved in the html -> wt -> html transformation. VE can choose to fix or not independently.