Last modified: 2014-10-20 01:21:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T7718, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 5718 - Block element written inline splits multiline paragraphs


Summary:	Block element written inline splits multiline paragraphs

Status:	REOPENED

Product:	MediaWiki
Classification:	Unclassified
Component:	Parser (Other open bugs)
Version:	1.16.x
Hardware:	All All

Importance:	Low normal with 4 votes (vote)
Target Milestone:	---
Assigned To:	C. Scott Ananian

URL:	http://test.wikipedia.org/wiki/Bug_5718
Whiteboard:
Keywords:	newparser, patch, patch-reviewed

Duplicates:	8037 14489 (view as bug list)
Depends on:
Blocks:
	Show dependency tree / graph

Reported:	2006-04-26 01:36 UTC by omniplex
Modified:	2014-10-20 01:21 UTC (History)
CC List:	13 users (show)

See Also:	6200
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Attachments
Split line which contains end of paragraph and start of a block element (1.11 KB, patch) 2007-06-05 03:58 UTC, George Dowding	Details
Small improvement of the previous patch (2.39 KB, patch) 2008-03-20 23:58 UTC, Mormegil	Details
Refresh of previous patch (1.87 KB, patch) 2009-12-29 02:17 UTC, George Dowding	Details
Adds testcases mentioned in this bug to parserTests.txt (844 bytes, text/plain) 2009-12-29 02:23 UTC, George Dowding	Details
Output from test failure. (978 bytes, text/plain) 2009-12-29 02:25 UTC, George Dowding	Details
fix bug and keep old behavior for paragraphs inside div (2.31 KB, patch) 2009-12-30 04:19 UTC, George Dowding	Details
Show Obsolete (3) Add an attachment (proposed patch, testcase, etc.)

Description omniplex 2006-04-26 01:36:50 UTC

<pre>-formatted text at the end of a multiline (source)
paragraph splits the leading text of the last line
before the <pre> into a separate paragraph.

In other words the last line of the source paragraph...
111 foo 111
222 bar 222
333 xyz <pre>abc</pre>
...is rendered as 222</p><p>333 xyz</p><pre>abc</pre>...
instead of 222 333 xyz </p><pre>abc</pre>...

For a demo see [[w:User:Omniplex/5569b]] - this was a 
part of the <pre>-observations in [[mediazilla:5569]],
but it's most probably unrelated to this bug.

Comment 1 lɛʁi לערי ריינהארט 2006-04-26 12:49:09 UTC

(In reply to comment #0)
> ... For a demo see [[w:User:Omniplex/5569b]] ...

should read [[User:Omniplex/5569b]]

see also [[rmy:MediaWiki:Monobook.css]]

Comment 2 Antoine "hashar" Musso (WMF) 2006-04-28 18:28:37 UTC

When parsing '333 xyz <pre>abc</pre>' the parser is in a paragraph
and close it because the line contain a <pre>.

The bug in Parser.php:

if ($this->mLastSection != 'pre') {
   $paragraphStack = false;
   $output .= $this->closeParagraph().'<pre>';
   $this->mLastSection = 'pre';
}

So we have to look at mLastSection extract text in it wich is before '<pre>',
output that text, closeParagraph.

I will try to patch it :)

Comment 3 Antoine "hashar" Musso (WMF) 2006-04-28 18:47:38 UTC

above code is not really where the bug occurs :p

Comment 4 Antoine "hashar" Musso (WMF) 2006-04-28 18:52:42 UTC

The bug is appearing whenever we have a tag in a line.
$openmatch / $closematch show the full list of tag.

Another example:

foo bar toto
azoueaz azoeaz
super foo<h2>tooto title</h2>

Will insert a break line (closeParagraph call before the <h2>):

foo bar toto azoueaz azoeaz
super foo
TOOTO TITLE

Comment 5 George Dowding 2007-05-28 12:45:13 UTC

I am interested in looking at this.

Comment 6 George Dowding 2007-06-05 03:58:30 UTC

Created attachment 3719 [details]
Split line which contains end of paragraph and start of a block element

This seems to work for the simple test case where there is the ending part of a paragraph followed by a single starting block tag.  I also need to verify that no regressions have been introduced.

Comment 7 Mormegil 2008-03-20 23:29:30 UTC

Changing summary so that it better suits the more general problem, updating version (this is still a problem in current SVN), changing URL to an existing testcase.

Comment 8 Mormegil 2008-03-20 23:29:55 UTC

*** Bug 8037 has been marked as a duplicate of this bug. ***

Comment 9 Mormegil 2008-03-20 23:58:54 UTC

Created attachment 4739 [details]
Small improvement of the previous patch

The previous version of the patch searched for the first < on the line, which might not be the block level element we are breaking at.

Why not just capture the offset from the previous preg_match that found the element?

Comment 10 Mormegil 2008-07-17 08:36:11 UTC

*** Bug 14489 has been marked as a duplicate of this bug. ***

Comment 11 Bryan Baron 2009-12-07 14:35:04 UTC

Not fixed yet. Nothing since 2008-07-17.

111 foo 111
222 bar 222
333 xyz <pre>abc</pre>

Still renders as:

<p><br /> 
111 foo 111 222 bar 222</p> 
<p>333 xyz</p> 
<pre> 
abc
</pre> 

instead of:

<p>111 foo 111 222 bar 222 333 xyz</p>
<pre>
abc
</pre>

Comment 12 George Dowding 2009-12-09 00:54:48 UTC

I will take another look.

Comment 13 George Dowding 2009-12-29 02:17:46 UTC

Created attachment 6905 [details]
Refresh of previous patch

Comment 14 George Dowding 2009-12-29 02:21:11 UTC

The last patch was stale, but the basic idea was the right way to fix this bug.
 I cleaned it up a little and added it in manually.

The fix as it is, breaks 3 previously passing test cases. One of them is
whitespace only. The other two are structural, but the html still looks ok.

I will update the first test case if there are no objections.

The latter two involve paragraphs inside a <div>, where there is a linebreak
after the <div>.  I don't totally understand the test cases.  Based on the
other div tests, there will be no <p> created if the text immediately follows
the <div> tag, but if there is a linebreak there will be a <p> created, *but*
only for the first paragraph.  It seems like all should be broken into
paragraphs or none should be broken into paragraphs.


Added current patch and output from relevant test cases if someone could take a
look.

Thanks!

Comment 15 George Dowding 2009-12-29 02:23:09 UTC

Created attachment 6906 [details]
Adds testcases mentioned in this bug to parserTests.txt

Comment 16 George Dowding 2009-12-29 02:25:24 UTC

Created attachment 6907 [details]
Output from test failure.

Comment 17 George Dowding 2009-12-30 04:19:13 UTC

Created attachment 6911 [details]
fix bug and keep old behavior for paragraphs inside div

Added logic to handle div as a special case to keep the old behavior.  I don't think that is correct, but the new parse was not correct either.  Div processing has some subtle issues that need to be changed/fixed in outside of this bug.

Comment 18 George Dowding 2009-12-30 04:29:33 UTC

Changing to fixed.  If someone with commit access can apply this patch, that would be great.

Comment 19 Bryan Baron 2009-12-30 05:03:42 UTC

Isn't fixed until reviewed and committed!

Comment 20 George Dowding 2009-12-30 16:09:23 UTC

That makes sense.  In the future, how do I signal that a patch is ready for review?

Comment 21 Platonides 2009-12-30 16:13:40 UTC

The need-review keyword does it. Bugging people also helps ;)

Comment 22 Mark A. Hershberger 2011-04-12 16:06:45 UTC

Punting this to the new parser Brion has under development.

Comment 23 Gabriel Wicke 2011-11-10 15:30:59 UTC

George, thank you for your patch! It seems to mostly do what it says, but I am quite worried about block-level elements other than div. Those would also need to be treated specially to avoid introducing regressions.

In general, we are currently trying to solve the problems inherent in the current parser architecture more systematically by writing a completely new parser (https://www.mediawiki.org/wiki/Future ). The problem you solved here should be fixed in this new parser, as we are no longer using regexp-based heuristics for block-level formatting.

Comment 24 Sumana Harihareswara 2012-01-23 19:45:22 UTC

(In reply to comment #23)
Marking this reviewed per Gabriel's comments.

Comment 25 S. McCandlish 2014-10-20 01:21:48 UTC

I've updated http://test.wikipedia.org/wiki/Bug_5718 with additional test cases.  They show extra weirdness.

The three cases:

<div>foo</div>

and

<div>foo
</div>

and

<div>
foo</div>

are all sent to the browser as:

<div>foo</div>

Only:

<div>
bar
</div>

is rendered as:

<div>
<p>bar</p>
</div>

It gets even weirder when the content is multiple lines.

Wikimedia Bugzilla is closed!

Search

Personal tools

Navigation

Links