Last modified: 2012-11-05 20:14:14 UTC
For compatibility with the PHP parser, I would like Parsoid to mimic this completely insane behavior: all whitespace preceding a [[Category:Foo]] link is eaten. For instance, "Foo [[Category:Bar]]Baz" renders as "<p>FooBaz</p>", "Foo\n\n\n\n[[Category:Bar]]Baz" also renders as "<p>FooBaz</p>", and "Foo\n\n\n\n[[Category:Bar]]\nBaz" renders as "<p>Foo\nBaz</p>". Meanwhile, whitespace *after* a category link is processed normally, so "Foo[[Category:Bar]]\n\nBaz" renders as "<p>Foo</p><p>Baz</p>".

I realize this is totally insane behavior, but because Parsoid doesn't currently do this, I'm having problems in VE:
1) these cases render differently in the editor than in the actual article;
2) lists of categories at the end of a page end up as long runs of newline characters in the editor.

I could work around this in the editor to some degree, but it's tricky because only categories exhibit this behavior; magic words don't. "Fixing" it in Parsoid would be nicer. Of course, the stripped whitespace would still have to be kept in round-trip data and restored by the serializer.
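A minimal Python sketch of the requested behavior, under simplifying assumptions: wikitext is treated as a plain string (Parsoid itself works on tokens and a DOM), only the literal `Category:` prefix is recognized, and a `\x00` sentinel stands in for a placeholder element. The helper names `strip_categories`, `visible_text` and `serialize` are hypothetical. Each category link is removed together with all whitespace before it, and the removed source is saved so a serializer can restore it exactly.

```python
import re
from typing import List, Tuple

# Sentinel standing in for a placeholder node (an assumption of this sketch;
# it also assumes the input never contains a literal NUL character).
PLACEHOLDER = "\x00"

# Any run of whitespace (spaces, tabs, newlines) immediately before a category
# link. Simplified: localized or lowercase namespace names are not handled.
CATEGORY_RE = re.compile(r"(\s*)(\[\[Category:[^\]]*\]\])")


def strip_categories(wikitext: str) -> Tuple[str, List[str]]:
    """Replace each category link *and the whitespace before it* with a
    placeholder, recording the original source for round-tripping."""
    saved: List[str] = []

    def repl(m: "re.Match[str]") -> str:
        saved.append(m.group(0))  # leading whitespace + link, verbatim
        return PLACEHOLDER

    return CATEGORY_RE.sub(repl, wikitext), saved


def visible_text(stripped: str) -> str:
    """What the reader sees: placeholders contribute nothing."""
    return stripped.replace(PLACEHOLDER, "")


def serialize(stripped: str, saved: List[str]) -> str:
    """Round-trip: put each saved span back at its placeholder, in order."""
    parts = stripped.split(PLACEHOLDER)
    out = [parts[0]]
    for src, rest in zip(saved, parts[1:]):
        out.append(src)
        out.append(rest)
    return "".join(out)
```

Saving the leading whitespace together with the link keeps the serializer trivial: it only has to reinsert each saved span at its placeholder, while whitespace after the link is never touched.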
That whitespace will still be round-tripped, in a mw:Placeholder object. Apart from that, it is ordinary round-tripping.
https://gerrit.wikimedia.org/r/31591 removed the p-wrapping around blocks of category links, but preserves the whitespace. Mixed content with categories is still wrapped in paragraphs. We don't currently plan to implement the weirder part of the whitespace-eating behavior, as it appears to be a side effect of paragraph/pre avoidance rather than a use case of its own; this kind of content should be rare enough not to matter. More fixes are coming to avoid preformatting of indented category links. These will be tracked in a separate bug, so I'm closing this one as fixed.
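As a rough illustration of the distinction drawn above (blocks made only of category links stay unwrapped with their whitespace intact, while mixed content still gets a paragraph), here is a hypothetical Python helper. It is not the code from the Gerrit change, and it treats a block as a plain string rather than a token stream.

```python
import re

# Simplified category link: literal "Category:" prefix, no nested brackets.
CATEGORY_LINK = r"\[\[Category:[^\]]*\]\]"

# A block consisting solely of one or more category links and whitespace.
CATEGORY_ONLY_RE = re.compile(rf"\s*(?:{CATEGORY_LINK}\s*)+")


def wrap_paragraph(block: str) -> str:
    """Hypothetical p-wrapping decision for a single block of wikitext."""
    if CATEGORY_ONLY_RE.fullmatch(block):
        # Category-only block: no <p>, whitespace returned verbatim.
        return block
    # Mixed content (text plus categories) is still wrapped.
    return f"<p>{block}</p>"
```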