Last modified: 2014-09-30 21:54:34 UTC
These seem to be the leading cause of edit corruption right now.

myEventWatcherDiv:
https://ru.wikipedia.org/?diff=64516612
https://ru.wikipedia.org/?diff=64516412
https://pt.wikipedia.org/?diff=39659121
https://pt.wikipedia.org/?diff=39659108

<embed> tags:
https://pt.wikipedia.org/?diff=39696565

<object> tags:
https://fr.wikipedia.org/?diff=105796883
https://fr.wikipedia.org/?diff=105796959
https://fr.wikipedia.org/?diff=105797061

I'm thinking we should put in hacks to remove these kinds of tags, maybe at the point where we serialize the HTML and send it to Parsoid (ve.init.mw.Target#getHTML). If these tags are added immediately upon document creation (we'd need to get our hands on one of these bad plugins to test that), we could also consider trying to work around this in ve.createDocumentFromHtml instead. I suspect, though, that these tags are added asynchronously, and only in cases where we fall back to the iframe trick because DOMParser HTML support is not available.
Here's a novel one: https://sv.wikipedia.org/?diff=27732757
I wonder if we can just do something like this:

    $( newDoc )
        .remove( '[id=myEventWatcherDiv]' ) // Bug 51423
        .remove( 'object[type=cosymantecnisbfw], script[id=NortonInternetSecurityBF]' ) // Bug 63229
        .remove( 'embed[id ^= xunlei_com_thunder_helper_plugin]' ) // Bug 63121
        .remove( 'div[id=sendToInstapaperResults]' ) // Bug 61776
        .remove( 'style[id=_clearly_component__css]' ) // Bug 53252
        .remove( 'script[id=FoxLingoJs]' ) // Bug 52884
        .remove( 'embed[type=application\\/x-datavault]' ) // Bug 52791
        .remove( 'embed[type=application\\/iodbc]' ); // Bug 51521
(In reply to Alex Monk from comment #2)
> I wonder if we can just do something like this
>
> $( newDoc )
> .remove( '[id=myEventWatcherDiv]' ) // Bug 51423
> .remove( 'object[type=cosymantecnisbfw], script[id=NortonInternetSecurityBF]' ) // Bug 63229
> .remove( 'embed[id ^= xunlei_com_thunder_helper_plugin]' ) // Bug 63121
> .remove( 'div[id=sendToInstapaperResults]' ) // Bug 61776
> .remove( 'style[id=_clearly_component__css]' ) // Bug 53252
> .remove( 'script[id=FoxLingoJs]' ) // Bug 52884
> .remove( 'embed[type=application\\/x-datavault]' ) // Bug 52791
> .remove( 'embed[type=application\\/iodbc]' ); // Bug 51521

Yeah, I was thinking about doing something like that. We have no easy way to know in advance what that will fix, but we can try it.
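
One caveat with the snippet quoted above: jQuery's .remove( selector ) filters the already-matched set rather than searching descendants, so calling it on $( newDoc ) would probably not touch the injected elements inside the document. A minimal sketch of the same idea using plain DOM calls follows. The names BLACKLIST_SELECTORS and removeBlacklistedElements are hypothetical and are not taken from the VisualEditor code base or from the merged patch; the selectors are the ones proposed in this thread, with the attribute values quoted instead of escaped.

```javascript
// Hypothetical blacklist built from the selectors proposed in this thread.
const BLACKLIST_SELECTORS = [
	'[id="myEventWatcherDiv"]', // Bug 51423
	'object[type="cosymantecnisbfw"]', // Bug 63229
	'script[id="NortonInternetSecurityBF"]', // Bug 63229
	'embed[id^="xunlei_com_thunder_helper_plugin"]', // Bug 63121
	'div[id="sendToInstapaperResults"]', // Bug 61776
	'style[id="_clearly_component__css"]', // Bug 53252
	'script[id="FoxLingoJs"]', // Bug 52884
	'embed[type="application/x-datavault"]', // Bug 52791
	'embed[type="application/iodbc"]' // Bug 51521
];

/**
 * Remove every descendant of `doc` that matches a blacklisted selector.
 *
 * Hypothetical helper for illustration only. `doc` can be any object that
 * implements querySelectorAll(), such as an HTMLDocument or an Element.
 *
 * @param {Object} doc Document or element to clean up
 * @param {string[]} [selectors] CSS selectors to remove
 * @return {number} Number of nodes that were detached
 */
function removeBlacklistedElements( doc, selectors = BLACKLIST_SELECTORS ) {
	let removed = 0;
	// Join the selectors so the tree is queried once; copy the result to an
	// array so that removing nodes cannot affect the iteration.
	const matches = Array.from( doc.querySelectorAll( selectors.join( ', ' ) ) );
	for ( const node of matches ) {
		// Skip nodes that have no parent to be detached from.
		if ( node.parentNode ) {
			node.parentNode.removeChild( node );
			removed++;
		}
	}
	return removed;
}
```

Under these assumptions, the helper would be called on the document just before it is serialized, which matches the "at the point where we serialize the HTML" idea from the first comment. Whether that catches tags injected asynchronously by a plugin would still need testing with one of the affected plugins.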
Change 163961 had a related patch set uploaded by Alex Monk: Remove certain blacklisted elements when getting HTML from document https://gerrit.wikimedia.org/r/163961
Change 163961 merged by jenkins-bot: Remove certain blacklisted elements when getting HTML from document https://gerrit.wikimedia.org/r/163961
Marking this as fixed. I will keep an eye on the other bugs over the next couple of weeks or so to see whether they are resolved by it.