Last modified: 2013-06-28 02:42:26 UTC
On VE load, do an "originalHtml == serialise(linmod(originalHtml))" check or equivalent, and fail out of VE if it is false; ideally, also let the user submit a failure report for debugging purposes (this will need to be checked with legal).
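The load-time guard described above could look roughly like this. It is a minimal sketch, not VE's actual code: check_roundtrip, LoadResult and the injected to_model/serialise/equivalent callables are all hypothetical stand-ins (to_model plays the role of linmod). The comparison is passed in, so it can be a plain string check or a DOM-based one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoadResult:
    editable: bool
    failure_report: Optional[dict] = None


def check_roundtrip(
    original_html: str,
    to_model: Callable[[str], object],       # hypothetical stand-in for linmod()
    serialise: Callable[[object], str],      # hypothetical model -> HTML step
    equivalent: Callable[[str, str], bool],  # string or DOM-based comparison
) -> LoadResult:
    """Refuse to open the editor when HTML -> model -> HTML is not lossless."""
    roundtripped = serialise(to_model(original_html))
    if equivalent(original_html, roundtripped):
        return LoadResult(editable=True)
    # Keep both versions so they could go into a debugging report,
    # if submitting one gets cleared with legal.
    return LoadResult(
        editable=False,
        failure_report={"original": original_html, "roundtripped": roundtripped},
    )


# Usage: an identity conversion passes; a lossy one (upper-casing) fails.
print(check_roundtrip("<p>x</p>", lambda h: h, lambda m: m, lambda a, b: a == b).editable)  # True
print(check_roundtrip("<p>x</p>", str.upper, lambda m: m, lambda a, b: a == b).editable)   # False
```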
A common issue currently is changes in attribute order and a change of quoting style for JSON attributes. It would be great if VE could preserve the order of attributes. For the quoting, we have some smart quoting code that you could reuse. Alternatively, you could compare the innerHTML of the original DOM with that of the exported DOM, since serialising both sides through innerHTML implicitly normalises JSON attribute quoting to the verbose style.
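The "re-serialise both sides" idea can be illustrated outside the browser. This is a sketch under assumptions: Python's html.parser stands in for reading innerHTML back from a DOM, and Canonicaliser/canonical_html are hypothetical names. Every attribute value is re-emitted double-quoted and entity-escaped, so single-quoted JSON and &quot;-escaped JSON come out identical, while attribute order is deliberately kept as-is.

```python
from html import escape
from html.parser import HTMLParser


class Canonicaliser(HTMLParser):
    """Re-emit parsed HTML with every attribute value double-quoted and escaped.

    Comments, doctypes and processing instructions are dropped in this sketch.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:  # source order, values already unquoted/unescaped
            if value is None:      # bare attribute such as <input disabled>
                parts.append(name)
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def canonical_html(markup: str) -> str:
    parser = Canonicaliser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

With this, `<span data-mw='{"a":1}'>` and `<span data-mw="{&quot;a&quot;:1}">` canonicalise to the same string, whereas swapping two attributes still shows up as a difference.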
We wouldn't be doing string-based comparisons, we'd be doing DOM-based comparisons. That's what Parsoid should be doing too, IMO. If comparisons are DOM-based, quoting style and attribute order don't matter. In general, I don't believe there's a way to read or write attribute order in the DOM.
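To make the DOM-based point concrete, here is a sketch of a structural comparison in which quoting style and attribute order cannot matter: attributes are stored in a dict after parsing, and the quotes are gone by then. TreeBuilder, Element and dom_equal are hypothetical names, and the builder is a toy (no implied end tags), not a spec-compliant HTML5 parser like the one the browser uses.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


@dataclass
class Element:
    tag: str
    attrs: dict  # a dict, so attribute order does not affect equality
    children: list = field(default_factory=list)  # Elements and text strings


class TreeBuilder(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.root = Element("#root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Element(tag, dict(attrs))  # values are already unquoted/unescaped
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        children = self.stack[-1].children
        if children and isinstance(children[-1], str):
            children[-1] += data  # merge adjacent text runs
        else:
            children.append(data)


def parse(markup: str) -> Element:
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def dom_equal(a: str, b: str) -> bool:
    return parse(a) == parse(b)
```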
We are diffing on the DOM on the way in, but while debugging a text diff is useful too. Attribute order preservation is not guaranteed by the DOM4 spec afaik, but it is implemented nevertheless and can be read through the attributes collection using numerical indexing: https://developer.mozilla.org/en-US/docs/DOM/Node.attributes
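For the debugging side, a rough sketch of both ideas: a line-per-tag text diff, and reading attribute names in source order. Here Python's html.parser, which hands attributes over as a list in source order, plays the role that indexing element.attributes plays in the browser. split_tags, debug_diff and attribute_order are hypothetical helpers, and the regex split is only an approximation for eyeballing diffs.

```python
import difflib
import re
from html.parser import HTMLParser


def split_tags(markup: str) -> list[str]:
    # One tag or text run per line, so the diff points at the changed tag.
    # Rough: a ">" inside an attribute value will split the tag early.
    return [piece for piece in re.split(r"(<[^>]+>)", markup) if piece]


def debug_diff(original: str, roundtripped: str) -> str:
    return "\n".join(difflib.unified_diff(
        split_tags(original), split_tags(roundtripped),
        fromfile="original", tofile="roundtripped", lineterm="",
    ))


class _AttributeOrder(HTMLParser):
    """Record attribute names per start tag, in source order."""

    def __init__(self) -> None:
        super().__init__()
        self.order: list[tuple[str, list[str]]] = []

    def handle_starttag(self, tag, attrs):
        self.order.append((tag, [name for name, _ in attrs]))


def attribute_order(markup: str) -> list[tuple[str, list[str]]]:
    parser = _AttributeOrder()
    parser.feed(markup)
    parser.close()
    return parser.order
```

Feeding '<p id="a" class="b">x</p>' and '<p class="b" id="a">x</p>' to debug_diff yields a hunk whose only removed and added lines are the two <p> start tags, which is exactly the attribute-order noise discussed above.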
Roan's offered to do this one on his flight. Marking as such.
Pulling.
Any news on this? Having some automated general sanity checks on the returned wikitext (for example size, check if it starts with a doctype) would also be good, especially now that users don't see a diff by default any more. If any of the automated sanity checks fail, you should probably ask the user to review changes before saving. This should catch most major article corruption issues before they happen.
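The automated checks suggested above could start as small as this sketch. sanity_problems and its max_shrink threshold are hypothetical, and the two checks are just the examples from the comment (size, leading doctype) plus an empty-output check; a doctype at the start is taken here as a likely sign that HTML came back where wikitext was expected.

```python
def sanity_problems(old_wikitext: str, new_wikitext: str,
                    max_shrink: float = 0.5) -> list[str]:
    """Return reasons to ask the user to review the diff before saving.

    max_shrink is a hypothetical threshold: flag the save when the page
    loses more than this fraction of its length.
    """
    problems = []
    if not new_wikitext.strip():
        problems.append("returned wikitext is empty")
    if new_wikitext.lstrip().lower().startswith("<!doctype"):
        # Wikitext should not start with a doctype; likely HTML leaked through.
        problems.append("returned wikitext starts with a doctype")
    if old_wikitext and len(new_wikitext) < len(old_wikitext) * (1 - max_shrink):
        problems.append(
            f"size dropped from {len(old_wikitext)} to {len(new_wikitext)} characters")
    return problems
```

If the returned list is non-empty, the save dialog would show the diff and ask the user to review it rather than saving straight away.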
Change 70106 had a related patch set uploaded by Krinkle: mw.ViewPageTarget: Add sanity check for DOM roundtrip https://gerrit.wikimedia.org/r/70106
Change 70106 merged by jenkins-bot: mw.ViewPageTarget: Add sanity check for DOM roundtrip https://gerrit.wikimedia.org/r/70106
Written and about to be deployed.