Last modified: 2014-10-20 01:21:48 UTC
<pre>-formatted text at the end of a multiline (source) paragraph splits the leading text of the last line before the <pre> into a separate paragraph. In other words the last line of the source paragraph... 111 foo 111 222 bar 222 333 xyz <pre>abc</pre> ...is rendered as 222</p><p>333 xyz</p><pre>abc</pre>... instead of 222 333 xyz </p><pre>abc</pre>... For a demo see [[w:User:Omniplex/5569b]] - this was a part of the <pre>-observations in [[mediazilla:5569]], but it's most probably unrelated to this bug.
(In reply to comment #0) > ... For a demo see [[w:User:Omniplex/5569b]] ... should read [[User:Omniplex/5569b]] see also [[rmy:MediaWiki:Monobook.css]]
When parsing '333 xyz <pre>abc</pre>' the parser is in a paragraph and close it because the line contain a <pre>. The bug in Parser.php: if ($this->mLastSection != 'pre') { $paragraphStack = false; $output .= $this->closeParagraph().'<pre>'; $this->mLastSection = 'pre'; } So we have to look at mLastSection extract text in it wich is before '<pre>', output that text, closeParagraph. I will try to patch it :)
above code is not really where the bug occurs :p
The bug is appearing whenever we have a tag in a line. $openmatch / $closematch show the full list of tag. Another example: foo bar toto azoueaz azoeaz super foo<h2>tooto title</h2> Will insert a break line (closeParagraph call before the <h2>): foo bar toto azoueaz azoeaz super foo TOOTO TITLE
I am interested in looking at this.
Created attachment 3719 [details] Split line which contains end of paragraph and start of a block element This seems to work for the simple test case where there is the ending part of a paragraph followed by a single starting block tag. I also need to verify that no regressions have been introduced.
Changing summary so that it better suits the more general problem, updating version (this is still a problem in current SVN), changing URL to an existing testcase.
*** Bug 8037 has been marked as a duplicate of this bug. ***
Created attachment 4739 [details] Small improvement of the previous patch The previous version of the patch searched for the first < on the line, which might not be the block level element we are breaking at. Why not just capture the offset from the previous preg_match that found the element?
*** Bug 14489 has been marked as a duplicate of this bug. ***
Not fixed yet. Nothing since 2008-07-17. 111 foo 111 222 bar 222 333 xyz <pre>abc</pre> Still renders as: <p><br /> 111 foo 111 222 bar 222</p> <p>333 xyz</p> <pre> abc </pre> instead of: <p>111 foo 111 222 bar 222 333 xyz</p> <pre> abc </pre>
I will take another look.
Created attachment 6905 [details] Refresh of previous patch
The last patch was stale, but the basic idea was the right way to fix this bug. I cleaned it up a little and added it in manually. The fix as it is, breaks 3 previously passing test cases. One of them is whitespace only. The other two are structural, but the html still looks ok. I will update the first test case if there are no objections. The latter two involve paragraphs inside a <div>, where there is a linebreak after the <div>. I don't totally understand the test cases. Based on the other div tests, there will be no <p> created if the text immediately follows the <div> tag, but if there is a linebreak there will be a <p> created, *but* only for the first paragraph. It seems like all should be broken into paragraphs or none should be broken into paragraphs. Added current patch and output from relevant test cases if someone could take a look. Thanks!
Created attachment 6906 [details] Adds testcases mentioned in this bug to parserTests.txt
Created attachment 6907 [details] Output from test failure.
Created attachment 6911 [details] fix bug and keep old behavior for paragraphs inside div Added logic to handle div as a special case to keep the old behavior. I don't think that is correct, but the new parse was not correct either. Div processing has some subtle issues that need to be changed/fixed in outside of this bug.
Changing to fixed. If someone with commit access can apply this patch, that would be great.
Isn't fixed until reviewed and committed!
That makes sense. In the future, how do I signal that a patch is ready for review?
The need-review keyword does it. Bugging people also helps ;)
Punting this to the new parser Brion has under development.
George, thank you for your patch! It seems to mostly do what it says, but I am quite worried about block-level elements other than div. Those would also need to be treated specially to avoid introducing regressions. In general, we are currently trying to solve the problems inherent in the current parser architecture more systematically by writing a completely new parser (https://www.mediawiki.org/wiki/Future ). The problem you solved here should be fixed in this new parser, as we are no longer using regexp-based heuristics for block-level formatting.
(In reply to comment #23) Marking this reviewed per Gabriel's comments.
I've updated http://test.wikipedia.org/wiki/Bug_5718 with additional test cases. They show extra weirdness. The three cases: <div>foo</div> and <div>foo </div> and <div> foo</div> are all sent to the browser as: <div>foo</div> Only: <div> bar </div> is rendered as: <div> <p>bar</p> </div> It gets even weirder when the content is multiple lines.