Last modified: 2013-12-05 23:30:22 UTC
migrateTrailingNLs DOM pass was needed at some point to deal with some bugs / inadequacies in our later dom passes (encap/dsr/..). I think those conditions are no longer present now. Worth experimenting with getting rid of that pass altogether and testing.
I verified that this is necessary to make sure DSR boundaries for nodes does not swallow trailing newlines in certain cases where the parser transformations have not done a clean job of closing tags (or when closing tags might be missing). This especially affects selser output with a bunch of bug fixes and selser improvements and simplifications coming up in a new patch.
[15:25] <gwicke> I have a vague idea of what you might mean in https://bugzilla.wikimedia.org/show_bug.cgi?id=55566#c1, but if you have a more concrete example then that might be helpful [15:26] <subbu> gwicke, simple way to do it is turn it off in mediawiki.DOMPostProcessor.js and look at parser test results [15:27] <subbu> basically .. <li>foo\n<li> vs. <li>foo</li>\n ... we simplified selser based on assuming the latter. [15:27] <subbu> similar for other nodes as well
commit 8fa522eb2652dac8b091240037ff076db1088eec Author: Subramanya Sastry <ssastry@wikimedia.org> Date: Tue Oct 29 13:28:12 2013 -0500 Bunch of improvements and tweaks to selser/wt2wt sep handling .... <snip> .... WikitextSerializer (selser) --------------------------- * Stripping leading/trailing separators on original source of a selsered node is unnecessary (and actually buggy). Our DSR computation is tight, that is, DSR ranges for a DOM node does not surround leading or trailing newlines. So, the original source will not have any leading or trailing newlines from separators outside the node itself. Newline separators found in the DOM subtree apply to parent--firstchild and lastchild--parent pairs, and they ought to continue to be present when selser outputs the original source. Since this is being done on a node with valid DSR that has not been modified, not even the wrapper node could have been modified -- the entire subtree is completely unedited. So, HTML --> wikitext edits of wrappers will not go through this code route either. This simplifies the selser code path some more and also gets a whole bunch of new selser tests passing. .... <snip> ....