Last modified: 2014-10-23 17:20:49 UTC
See output below on "<small>\n*a\n</small>". The p-wrapper wraps both the start and end tags with <p> and </p>, which then forces the tree builder to fix it all up. The right fix would be for the p-wrapper to handle this better.

[subbu@earth tests] echo "<small>\n*a\n</small>" | node parse --trace html
[HTML] | {"type":"TagTk","name":"body","attribs":[],"dataAttribs":{}}
0-[HTML] | {"type":"TagTk","name":"body","attribs":[],"dataAttribs":{}}
0-[HTML] | {"type":"TagTk","name":"p","attribs":[],"dataAttribs":{"tagId":1}}
0-[HTML] | {"type":"TagTk","name":"small","attribs":[],"dataAttribs":{"tsr":[0,7],"stx":"html","tagId":2}}
0-[HTML] | {"type":"EndTagTk","name":"p","attribs":[],"dataAttribs":{}}
0-[HTML] | {"type":"NlTk","dataAttribs":{"tsr":[7,8]}}
0-[HTML] | {"type":"TagTk","name":"ul","attribs":[],"dataAttribs":{"tsr":[8,8],"tagId":3}}
0-[HTML] | {"type":"TagTk","name":"li","attribs":[],"dataAttribs":{"tsr":[8,9],"tagId":4}}
0-[HTML] | "a"
0-[HTML] | {"type":"EndTagTk","name":"li","attribs":[],"dataAttribs":{}}
0-[HTML] | {"type":"EndTagTk","name":"ul","attribs":[],"dataAttribs":{}}
0-[HTML] | {"type":"NlTk","dataAttribs":{"tsr":[10,11]}}
0-[HTML] | {"type":"TagTk","name":"p","attribs":[],"dataAttribs":{"tagId":5}}
0-[HTML] | {"type":"EndTagTk","name":"small","attribs":[],"dataAttribs":{"tsr":[11,19],"stx":"html"}}
0-[HTML] | {"type":"EndTagTk","name":"p","attribs":[],"dataAttribs":{}}
0-[HTML] | {"type":"NlTk","dataAttribs":{"tsr":[19,20]}}
0-[HTML] | {"type":"EOFTk"}
<!DOCTYPE html>
<html prefix="dc: http://purl.org/dc/terms/ mw: http://mediawiki.org/rdf/"><head prefix="mwr: http://en.wikipedia.org/wiki/Special:Redirect/"><meta property="mw:articleNamespace" content="0"/><meta property="mw:parsoidVersion" content="0"/><link rel="dc:isVersionOf" href="//en.wikipedia.org/wiki/Main_Page"/><title></title><base href="//en.wikipedia.org/wiki/Main_Page"/><link rel="stylesheet" href="//en.wikipedia.org/w/load.php?modules=ext.geshi.language.html4strict|ext.geshi.local|mediawiki.legacy.commonPrint,shared|mediawiki.skinning.elements|mediawiki.skinning.content|mediawiki.skinning.interface|skins.vector.styles|site|mediawiki.skinning.content.parsoid&only=styles&debug=true&skin=vector"/></head><body data-parsoid='{"dsr":[0,20,0,0]}' lang="en" class="mw-content-ltr mw-body-content" dir="ltr"><p data-parsoid='{"dsr":[0,7,0,0]}'><small data-parsoid='{"stx":"html","autoInsertedEnd":true,"dsr":[0,7,7,0]}'></small></p><small data-parsoid='{"stx":"html","autoInsertedEnd":true,"autoInsertedStart":true,"dsr":[7,10,0,0]}'>
<ul data-parsoid='{"dsr":[8,10,0,0]}'><li data-parsoid='{"dsr":[8,10,1,0]}'>a</li></ul></small>
<p data-parsoid='{"dsr":[11,19,0,0]}'><small data-parsoid='{"stx":"html","autoInsertedStart":true,"dsr":[11,19,0,8]}'></small></p>
</body></html>
0-[HTML] | {"type":"TagTk","name":"body","attribs":[],"dataAttribs":{}}
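To make the problematic pattern in the trace easier to spot, here is a minimal sketch that scans a token stream for a <p> wrapper whose only content is a single formatting start or end tag. It is not Parsoid code: the function name `findLoneWrappedFormattingTags`, the `FORMATTING_TAGS` set, and the abridged token objects (only `type` and `name` are kept from the trace) are all assumptions made for illustration.

```javascript
// Hypothetical helper, NOT part of Parsoid: flags <p> ... </p> pairs that
// wrap nothing but one formatting start tag or one formatting end tag.
// The set of formatting tag names below is an assumption for this sketch.
const FORMATTING_TAGS = new Set(['small', 'big', 'b', 'i', 'em', 'strong']);

function findLoneWrappedFormattingTags(tokens) {
  const hits = [];
  for (let i = 0; i + 2 < tokens.length; i++) {
    const open = tokens[i];
    const inner = tokens[i + 1];
    const close = tokens[i + 2];
    const isPOpen = open.type === 'TagTk' && open.name === 'p';
    const isPClose = close.type === 'EndTagTk' && close.name === 'p';
    const isFormatting =
      (inner.type === 'TagTk' || inner.type === 'EndTagTk') &&
      FORMATTING_TAGS.has(inner.name);
    if (isPOpen && isPClose && isFormatting) {
      hits.push({ index: i, type: inner.type, name: inner.name });
    }
  }
  return hits;
}

// Tokens abridged from the trace (attribs and dataAttribs dropped).
const traceTokens = [
  { type: 'TagTk', name: 'body' },
  { type: 'TagTk', name: 'p' },
  { type: 'TagTk', name: 'small' },
  { type: 'EndTagTk', name: 'p' },
  { type: 'NlTk' },
  { type: 'TagTk', name: 'ul' },
  { type: 'TagTk', name: 'li' },
  'a',
  { type: 'EndTagTk', name: 'li' },
  { type: 'EndTagTk', name: 'ul' },
  { type: 'NlTk' },
  { type: 'TagTk', name: 'p' },
  { type: 'EndTagTk', name: 'small' },
  { type: 'EndTagTk', name: 'p' },
  { type: 'NlTk' },
  { type: 'EOFTk' },
];

const loneWrapped = findLoneWrappedFormattingTags(traceTokens);
console.log(loneWrapped);
```

On the abridged tokens this reports two wrappers, one around the <small> start tag and one around the </small> end tag. Those are the two places where the tree builder later has to auto-insert the missing end and start tags.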
So, it looks like the PHP parser + Tidy moves these formatting tags into the list items. Try the following in a sandbox:

<small>
#1
#2
#3
#4
</small>

Notice the big numbers for the list items and the small numbers for the content, which looks odd (and is not quite what was intended).
This is probably simpler to fix once bug 64901 is handled.
Change 155735 had a related patch set uploaded by Cscott: Sync parserTests with core. https://gerrit.wikimedia.org/r/155735
Change 155735 merged by jenkins-bot: Sync parserTests with core. https://gerrit.wikimedia.org/r/155735
Change 162816 had a related patch set uploaded by Subramanya Sastry: WIP Bug 68395: Tweaks to p-wrapping around formatting tags https://gerrit.wikimedia.org/r/162816
Change 162816 merged by jenkins-bot: Bug 68395: Tweaks to p-wrapping around formatting tags https://gerrit.wikimedia.org/r/162816