Last modified: 2014-10-20 01:21:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 5718 - Block element written inline splits multiline paragraphs
Block element written inline splits multiline paragraphs
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Parser (Other open bugs)
1.16.x
All All
: Low normal with 4 votes (vote)
: ---
Assigned To: C. Scott Ananian
http://test.wikipedia.org/wiki/Bug_5718
: newparser, patch, patch-reviewed
: 8037 14489 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-04-26 01:36 UTC by omniplex
Modified: 2014-10-20 01:21 UTC (History)
13 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Split line which contains end of paragraph and start of a block element (1.11 KB, patch)
2007-06-05 03:58 UTC, George Dowding
Details
Small improvement of the previous patch (2.39 KB, patch)
2008-03-20 23:58 UTC, Mormegil
Details
Refresh of previous patch (1.87 KB, patch)
2009-12-29 02:17 UTC, George Dowding
Details
Adds testcases mentioned in this bug to parserTests.txt (844 bytes, text/plain)
2009-12-29 02:23 UTC, George Dowding
Details
Output from test failure. (978 bytes, text/plain)
2009-12-29 02:25 UTC, George Dowding
Details
fix bug and keep old behavior for paragraphs inside div (2.31 KB, patch)
2009-12-30 04:19 UTC, George Dowding
Details

Description omniplex 2006-04-26 01:36:50 UTC
<pre>-formatted text at the end of a multiline (source)
paragraph splits the leading text of the last line
before the <pre> into a separate paragraph.

In other words the last line of the source paragraph...
111 foo 111
222 bar 222
333 xyz <pre>abc</pre>
...is rendered as 222</p><p>333 xyz</p><pre>abc</pre>...
instead of 222 333 xyz </p><pre>abc</pre>...

For a demo see [[w:User:Omniplex/5569b]] - this was a 
part of the <pre>-observations in [[mediazilla:5569]],
but it's most probably unrelated to this bug.
Comment 1 lɛʁi לערי ריינהארט 2006-04-26 12:49:09 UTC
(In reply to comment #0)
> ... For a demo see [[w:User:Omniplex/5569b]] ...

should read [[User:Omniplex/5569b]]

see also [[rmy:MediaWiki:Monobook.css]]
Comment 2 Antoine "hashar" Musso (WMF) 2006-04-28 18:28:37 UTC
When parsing '333 xyz <pre>abc</pre>' the parser is in a paragraph
and close it because the line contain a <pre>.

The bug in Parser.php:

if ($this->mLastSection != 'pre') {
   $paragraphStack = false;
   $output .= $this->closeParagraph().'<pre>';
   $this->mLastSection = 'pre';
}

So we have to look at mLastSection extract text in it wich is before '<pre>',
output that text, closeParagraph.

I will try to patch it :)

Comment 3 Antoine "hashar" Musso (WMF) 2006-04-28 18:47:38 UTC
above code is not really where the bug occurs :p
Comment 4 Antoine "hashar" Musso (WMF) 2006-04-28 18:52:42 UTC
The bug is appearing whenever we have a tag in a line.
$openmatch / $closematch show the full list of tag.

Another example:

foo bar toto
azoueaz azoeaz
super foo<h2>tooto title</h2>

Will insert a break line (closeParagraph call before the <h2>):

foo bar toto azoueaz azoeaz
super foo
TOOTO TITLE
Comment 5 George Dowding 2007-05-28 12:45:13 UTC
I am interested in looking at this.
Comment 6 George Dowding 2007-06-05 03:58:30 UTC
Created attachment 3719 [details]
Split line which contains end of paragraph and start of a block element

This seems to work for the simple test case where there is the ending part of a paragraph followed by a single starting block tag.  I also need to verify that no regressions have been introduced.
Comment 7 Mormegil 2008-03-20 23:29:30 UTC
Changing summary so that it better suits the more general problem, updating version (this is still a problem in current SVN), changing URL to an existing testcase.
Comment 8 Mormegil 2008-03-20 23:29:55 UTC
*** Bug 8037 has been marked as a duplicate of this bug. ***
Comment 9 Mormegil 2008-03-20 23:58:54 UTC
Created attachment 4739 [details]
Small improvement of the previous patch

The previous version of the patch searched for the first < on the line, which might not be the block level element we are breaking at.

Why not just capture the offset from the previous preg_match that found the element?
Comment 10 Mormegil 2008-07-17 08:36:11 UTC
*** Bug 14489 has been marked as a duplicate of this bug. ***
Comment 11 Bryan Baron 2009-12-07 14:35:04 UTC
Not fixed yet. Nothing since 2008-07-17.

111 foo 111
222 bar 222
333 xyz <pre>abc</pre>

Still renders as:

<p><br /> 
111 foo 111 222 bar 222</p> 
<p>333 xyz</p> 
<pre> 
abc
</pre> 

instead of:

<p>111 foo 111 222 bar 222 333 xyz</p>
<pre>
abc
</pre>
Comment 12 George Dowding 2009-12-09 00:54:48 UTC
I will take another look.
Comment 13 George Dowding 2009-12-29 02:17:46 UTC
Created attachment 6905 [details]
Refresh of previous patch
Comment 14 George Dowding 2009-12-29 02:21:11 UTC
The last patch was stale, but the basic idea was the right way to fix this bug.
 I cleaned it up a little and added it in manually.

The fix as it is, breaks 3 previously passing test cases. One of them is
whitespace only. The other two are structural, but the html still looks ok.

I will update the first test case if there are no objections.

The latter two involve paragraphs inside a <div>, where there is a linebreak
after the <div>.  I don't totally understand the test cases.  Based on the
other div tests, there will be no <p> created if the text immediately follows
the <div> tag, but if there is a linebreak there will be a <p> created, *but*
only for the first paragraph.  It seems like all should be broken into
paragraphs or none should be broken into paragraphs.


Added current patch and output from relevant test cases if someone could take a
look.

Thanks!
Comment 15 George Dowding 2009-12-29 02:23:09 UTC
Created attachment 6906 [details]
Adds testcases mentioned in this bug to parserTests.txt
Comment 16 George Dowding 2009-12-29 02:25:24 UTC
Created attachment 6907 [details]
Output from test failure.
Comment 17 George Dowding 2009-12-30 04:19:13 UTC
Created attachment 6911 [details]
fix bug and keep old behavior for paragraphs inside div

Added logic to handle div as a special case to keep the old behavior.  I don't think that is correct, but the new parse was not correct either.  Div processing has some subtle issues that need to be changed/fixed in outside of this bug.
Comment 18 George Dowding 2009-12-30 04:29:33 UTC
Changing to fixed.  If someone with commit access can apply this patch, that would be great.
Comment 19 Bryan Baron 2009-12-30 05:03:42 UTC
Isn't fixed until reviewed and committed!
Comment 20 George Dowding 2009-12-30 16:09:23 UTC
That makes sense.  In the future, how do I signal that a patch is ready for review?
Comment 21 Platonides 2009-12-30 16:13:40 UTC
The need-review keyword does it. Bugging people also helps ;)
Comment 22 Mark A. Hershberger 2011-04-12 16:06:45 UTC
Punting this to the new parser Brion has under development.
Comment 23 Gabriel Wicke 2011-11-10 15:30:59 UTC
George, thank you for your patch! It seems to mostly do what it says, but I am quite worried about block-level elements other than div. Those would also need to be treated specially to avoid introducing regressions.

In general, we are currently trying to solve the problems inherent in the current parser architecture more systematically by writing a completely new parser (https://www.mediawiki.org/wiki/Future ). The problem you solved here should be fixed in this new parser, as we are no longer using regexp-based heuristics for block-level formatting.
Comment 24 Sumana Harihareswara 2012-01-23 19:45:22 UTC
(In reply to comment #23)
Marking this reviewed per Gabriel's comments.
Comment 25 S. McCandlish 2014-10-20 01:21:48 UTC
I've updated http://test.wikipedia.org/wiki/Bug_5718 with additional test cases.  They show extra weirdness.

The three cases:

<div>foo</div>

and

<div>foo
</div>

and

<div>
foo</div>

are all sent to the browser as:

<div>foo</div>

Only:

<div>
bar
</div>

is rendered as:

<div>
<p>bar</p>
</div>

It gets even weirder when the content is multiple lines.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links