Last modified: 2014-11-17 09:21:16 UTC
See the wikitext of the URL. I asked about it on [[WP:VPT]], and Rob implied this had been discussed before, but I couldn't find anything either here or there, and he didn't clarify when asked, so I'm filing this as a bug.
Definitely a bug. Blockquotes shouldn't interfere with wikitext parsing, just as divs don't.
Clarification of incorrect behavior: linebreaks are completely ignored when placed directly in any of <blockquote>, <table>, <ul>, <ol>, <li>, <p>, <tr>, <h1/2/3/4/5/6>. They're treated correctly when placed in <div>, <td>, <th>, or any inline element. The problem appears to be somewhere in Parser.php lines 2076 to 2096; commenting these (and respective closing braces, etc.) out fixes the problem in my test case, although God only knows what else it does.
Created attachment 2227 [details] One-line proposed patch againt r16063 This patch appears to fix the problem: it basically just removes the "not in a block" condition from the code that outputs paragraphs. The only question is if that could break something. I notice nothing odd with the patched version in a local copy of [[Paris]], which is a fairly typical example of a long enwiki article in terms of formatting.
*** Bug 7060 has been marked as a duplicate of this bug. ***
Line breaks *must* be ignored in <table>, <ul>, <ol>, and <tr> tags, because paragraphs or plain text aren't supposed to appear directly within them. <H>eadings should only contain inline content, so line breaks ought to be ignored in them too. <Blockquote> elements should definitely behave just like <div>s. <Li> tags can have block or inline content too, but changing their current behaviour might break existing pages -- some research would be required.
(In reply to comment #5) > Line breaks *must* be ignored in <table>, <ul>, <ol>, and <tr> tags, because paragraphs or plain text aren't supposed > to appear directly within them. <H>eadings should only contain inline content, so line breaks ought to be ignored in them > too. Ah, right. That's probably the reason for the current behavior. > <Blockquote> elements should definitely behave just like <div>s. <Li> tags can have block or inline content too, but > changing their current behaviour might break existing pages -- some research would be required. Why do you suspect that for <li> and not for <blockquote>?
The main difference is that list items are almost always entered using wikitext * or # bullets. Blockquotes can only be generated by typing HTML code. Come to think of it, line breaks can't really be entered in lists generated using # or *, since they signal the end of a list item. Real HTML <li>[...]</li> tags are very rarely used, and then only by HTML-savvy editors who want to apply a class or style attribute, or by editors who cut-and-paste some existing HTML code. Since block or inline content is allowed in list items, this would be the way to allow correct and more complex formatting of the contents of list items, but at the cost of complex and unusual markup in the edit field, incompatible with wikitext lists.
Created attachment 2260 [details] Alternative patch (against 16144), changes only <blockquote> It doesn't seem like <li>s are parsed this way for lists, but it seems they *are* parsed for the TOC, so <li> will have to remain on the list for the time being. As you say, it's not a terrible loss.
*** Bug 7262 has been marked as a duplicate of this bug. ***
Thank you for the patch. It works like a charm. I hope this is generally adopted into MediaWiki.
That's reviewed, then, I suppose.
(In reply to comment #11) > That's reviewed, then, I suppose. The fact it works doesn't mean it doesn't break other things. (I don't know if it does.)
Is this a duplicate of Bug 1857?
*** Bug 1857 has been marked as a duplicate of this bug. ***
When will the patch go live on the English Wikipedia?
Well, first of all it has to be applied, which it hasn't been (otherwise this would be RESOLVED FIXED with a revision number noted in the last comment). I did try applying it, but it gave me XHTML validity errors and on inspection it totally screwed up tag nesting. I'll probably get around to poking it again at some point in the indefinite future, if that helps.
็How about fixing by using function nl2br? or using ereg function to get to word from <blockquote> then use explode function to turn all word into array by \n and then echo it out with <p> and </p> per array index.
Is Bug 6419 related?
I'm quite sure it's not.
Is there a way to work around this in quotation templates? Putting a table or div or something inside the blockquote?
Yes, <blockquote><div> seems to do it. Good idea, now we can use semantic markup without the requirement of littering <p>'s everywhere manually.
(In reply to comment #21) > Yes, <blockquote><div> seems to do it. Hmmm.. Doesn't work for me. http://en.wikipedia.org/wiki/User:Omegatron/Sandbox What, specifically, did you find to work?
Aha. This works: <blockquote><div> paragraph 1 paragraph 2</div></blockquote> but this doesn't: <blockquote><div>paragraph 1 paragraph 2</div></blockquote>
It looks to be sensitive to newlines before/after </?div>, but this works: <blockquote><div> Foo bar </div></blockquote> This doesn't: <blockquote><div>Foo bar</div></blockquote>
Yeah. For the record, it apparently needs a newline in both places. This: <blockquote><div> paragraph 1 paragraph 2 paragraph 3</div></blockquote> results in: <blockquote> <div> <p>paragraph 1</p> <p>paragraph 2</p> paragraph 3</div> </blockquote> while this: <blockquote><div> paragraph 1 paragraph 2 paragraph 3 </div></blockquote> results in: <blockquote> <div> <p>paragraph 1</p> <p>paragraph 2</p> <p>paragraph 3</p> </div> </blockquote>
*** Bug 11282 has been marked as a duplicate of this bug. ***
Another bug in the default blockquote behaviour: an HTML ins or del inside a blockquote causes a paragraph break to be placed in front of it. The workaround is to nest a div inside the blockquote.
Added parser tests in r38624. I have no idea where the special handling is that makes blockquotes different from divs: even if I convert everything in Parser.php that contains the string "blockquote" or "div" to treat them identically, they're still treated differently. Maybe something in Sanitizer.php or something.
Technically, blockquote can only contain block elements (unlike div which can also contain inline elements and text), so the bug happens with invalid markup only. No one takes seriously the block-inline part of the (X)HTML spec, though. The effect of the newlines is a more general bug: bug 9207.
(In reply to comment #29) > Technically, blockquote can only contain block elements (unlike div which can > also contain inline elements and text), so the bug happens with invalid markup > only. Aha, I had missed that. So what's correct behavior in this case anyway, then? Wrapping inline elements and text in paragraphs, probably? > No one takes seriously the block-inline part of the (X)HTML spec, though. Well, maybe no one *should*, but we do claim to produce valid XHTML (barring bugs). So I guess we have to. > The effect of the newlines is a more general bug: bug 9207. That seems more like an opposite bug, doesn't it? Why do you think that bug is more general?
(In reply to comment #29) > Technically, blockquote can only contain block elements (unlike div which can > also contain inline elements and text), so the bug happens with invalid markup > only. When we say "newlines" we mean "paragraphs", since a newline in wikicode is converted to paragraph tags when rendered to HTML. Paragraph tags are certainly valid inside blockquotes.
See Bug #15491; the ins/del issue has been broken out into a separate bug, since it does not appear to be related to this one.
Another classic bug. :) Bumping for next bug day...
So is anyone actually working on these classic bugs?
Nope. Feel free to submit a patch.
Anyone interested in this bug should probably also take a look at Bug #15491.
Created attachment 7603 [details] Patch fixing this bug (6200) and also bug 15491 Hi! We (CustIS, custis.ru) have also discovered this problem, so I've written a simple patch for <blockquote> parser handling (paragraph end regexp in Parser::doBlockLevels). This patch also fixes bug 15491 (if I understood its point correctly: "<blockquote>a <ins>b</ins> c</blockquote>" does not insert a newline before b anymore), and it also fixes the behaviour of <center> (paragraphs are allowed within <center>).
When's the next "Bug Day"?
(In reply to comment #38) > When's the next "Bug Day"? Never. The idea never got going.
Comment on attachment 7603 [details] Patch fixing this bug (6200) and also bug 15491 Your patch causes XML well-formedness errors -- specifically, it causes the tests "Regression with preformatted in <center>" and "Bug 8293: Use of center tag ruins paragraph formatting" to fail. Specifically, the HTML output of """ <center> Blah </center> """ contains "<p></center>", and the output of """ <center> foo </center> bar baz """ contains """ <p>foo </center> """ So this patch isn't good, it causes regressions. To run parser tests on your installation so you can check future patches, run this command from the location where you installed MediaWiki: php maintenance/parserTests.php--quiet (I've added two new parser tests in r70167 to improve checks for <blockquote>.)
Created attachment 8126 [details] Completely rewritten doBlockLevels for MediaWiki 1.16 Hmm... You're right. But there are so many problems with doBlockLevels that I think the correct way to fix all them is to rewrite it, and I've even done it now! :) (for 1.16, but is there any difference to trunk?) Patch is attached. It even passes all parser tests with some modifications similar to "</p>\n<table>" --> "</p><table>", which, I suppose, isn't any real difference. The patch not only fixes newline issues with <blockquote> etc, but also replaces <p>...</p> to <div class="paragraph">...</div> when ... contains <div>'s, which, I think, could be very controversial, but I see no other solution that fixes the following problem: (example) """ A [[Image:B.jpg|right]] C A [[Image:B.jpg|right]] C """ is not split into two paragraphs as each of them contains <div class="floatright">. (other possible solution is to not use <div> in thumbnail code and replace it with some inline element with display:block style, but this is even uglier and breaks HTML semantics)
Created attachment 8221 [details] Completely rewritten doBlockLevels for MediaWiki 1.16 + minor fix
Created attachment 8235 [details] Completely rewritten doBlockLevels for MediaWiki 1.16 Oops, there is another little fix. :)
Punting this to the new parser Brion has under development.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Created attachment 8626 [details] New version of doBlockLevels (something fixed...) Updated doBlockLevels
+need-review to signal to developers that this patch needs reviewing
Vitalyi, thank you for your patch! As you probably know, the current parser is very limited in what it can do, so we are trying to develop a new parser that fixes these kinds of issues more systematically. We try to incorporate your ideas in the new parser, but don't currently have the resources to thoroughly test your patch with real content. Did you already test your patch against the parser tests?
Yes, I did it when I've developed it. All tests passed after some fixing. We use this parser on all our wikis - some of them are corporate (intranet) ones, some of them are public. Although parser tests pass, my implementation is still different from current MW parser, and it still can sometimes lead to an unwanted behaviour. For example, in my implementation <div>s (and, for example, <center> also) inside raw <html> blocks may interfere with <div>s outside it, because <html> may be wrapped into a "paragraph", implemented using a <div>. This is easily fixable by extracting all block elements (div, center) from <html>, so doBlockLevels can see them and wrap accordingly. Also, we had a funny bug with an unclosed <span> element in a pictogram template - some of the markup was mishandled and the image was duplicated on the page :) I didn't had time to test this deeply, just fixed it by adding a closing </span>. I can try to test this if it will be useful :)
Your patch does not appear to be made against SVN head. Please update it so that I can attempt to apply it.
Created attachment 9467 [details] Rewritten doBlockLevels for r103314 Yeah, it was made against 1.16. Updated it against r103314. The new patch doesn't include the part for cite extension, I'll post it separately. There are some changes in parser tests. Most of them are related to the space between closing and opening tags (</p>\n<p> --> </p><p>). Some are related to the subject of this bug. Plus there is one test that I've intentionally didn't change - <html><script>...</script></html> which will now wrap into <div class=paragraph> not <p>.
Created attachment 9468 [details] Rewritten doBlockLevels for r103314 Oops, fixed one more parser test :)
Created attachment 9469 [details] Patch for Cite parser tests Mostly related to space between closing and opening tags, but there's also one test with gallery in which real code was changed (with File:Foobar).
In the code, that's actually '<div class="paragraph">', with a quoted value, right?
Yes, but unquoted in the case of true wgRawHtml.
Created attachment 9526 [details] Updated patch (code-style + one fix), for r103914 New version of my patch, with correct code style, and one bug fixed - preformatting with a space at the line start didn't work right after a list element (if there was no blank line). + Discovered another difference to stock parser: a hack in the stock parser allows it to preformat lines in table cells. But because of this hack, for example, the following text: <table> <tr> <td> pre </td> </tr> </table> Parses to the non-expected: <table> <tr> <td> <pre>pre </pre> </td> </tr> </table> After applying my patch, preformatting inside table cells is now totally disabled... There are no tests about it - is it considered to be a feature? + Where can I read about the new parser which is now in development? How will it work? Will it continue to support these hacks? :-)
Failure to support explicitly marked-up <pre>...</pre> content as preformatted simply because it's inside a table cell would clearly be a bug. That exact layout situation is common in template documentation.
Hi Vitalyi, thank you for your update- I hope to find time to review your code and test it against real-world wiki source this week. The parser tests do not cover everything, so doing A-B testing on a wiki dump is still the most reliable test. > After applying my patch, preformatting inside table cells is now totally > disabled... There are no tests about it - is it considered to be a feature? Preformatting in table cells should be supported, and ignoring the indentation for markup-only lines appears to be a sensible choice to me. > + Where can I read about the new parser which is now in development? How will > it work? Will it continue to support these hacks? :-) I am currently working on a parser based on a PEG-based tokenizer for a standard HTML5 parser backend. The idea is to map both html tags and wiki syntax to the same tokens, and let the HTML parser construct the DOM tree from the resulting token soup. Some wiki-specific rules are enforced by post-processing the token stream before feeding those to the HTML parser. Documentation for this effort can be found at http://www.mediawiki.org/wiki/Future/Parser_plan and my user page (http://www.mediawiki.org/wiki/User:GWicke). The (currently Javascript) source is available in the VisualEditor extension in SVN trunk.
(In reply to comment #57) > Failure to support explicitly marked-up <pre>...</pre> content as preformatted > simply because it's inside a table cell would clearly be a bug. That exact > layout situation is common in template documentation. Could you point us to examples? I'd like to get this added to the parser tests if possible.
Vitalyi, I added the test case you described in r103933. If you came across more differences or issues not covered by parser tests (perhaps the one mentioned in comment 49?), then it would be nice to add parser tests for those as well. This benefits both the old and new parser(s).
> Could you point us to examples? I'd like to get this added to the parser tests > if possible. No, I was responding to Vitalyi here: >> After applying my patch, preformatting inside table cells is now totally >> disabled... There are no tests about it - is it considered to be a feature? It couldn't possibly be a feature, since pre inside td is entirely routine markup.
(In reply to comment #58) > > + Where can I read about the new parser which is now in development? How will > > it work? Will it continue to support these hacks? :-) > > I am currently working on a parser based on a PEG-based tokenizer for a > standard HTML5 parser backend. The idea is to map both html tags and wiki > syntax to the same tokens, and let the HTML parser construct the DOM tree from > the resulting token soup. Some wiki-specific rules are enforced by > post-processing the token stream before feeding those to the HTML parser. > > Documentation for this effort can be found at > http://www.mediawiki.org/wiki/Future/Parser_plan and my user page > (http://www.mediawiki.org/wiki/User:GWicke). The (currently Javascript) source > is available in the VisualEditor extension in SVN trunk. Vitaliy, I also suggest you subscribe to the parser/visual editor mailing list: http://lists.wikimedia.org/pipermail/wikitext-l/
Created attachment 9660 [details] New version of patch, for r105865 Updated the patch. There is one fix: it does not wrap <html> contents into paragraphs (<div class=paragraph>) anymore, like the original parser. But, it still has the issue with preformatting inside table cells.
Vitaliy: Thanks for your patch. After you fix the issue with preformatting inside table cells, might you be interested in using developer access to directly suggest it into our Git source control system? https://www.mediawiki.org/wiki/Developer_access https://www.mediawiki.org/wiki/Git/Workflow#How_to_submit_a_patch
Mark, do you have a moment to check on whether this patch is still potentially useful? Thanks.
Change 77961 had a related patch set uploaded by Cscott: Make linebreaks in <blockquote> behave like <div> (bug 6200). https://gerrit.wikimedia.org/r/77961
*** Bug 51086 has been marked as a duplicate of this bug. ***
Change 77961 merged by jenkins-bot: Make line breaks in <blockquote> behave like <div> (bug 6200). https://gerrit.wikimedia.org/r/77961
Note that indented text inside <blockquote> should *not* be treated like <div>; see bug 52763.
Change 78967 had a related patch set uploaded by Subramanya Sastry: Revert "Make line breaks in <blockquote> behave like <div> (bug 6200)." See bug 52763 https://gerrit.wikimedia.org/r/78967
Change 78967 merged by jenkins-bot: Revert "Make line breaks in <blockquote> behave like <div> (bug 6200)." See bug 52763 https://gerrit.wikimedia.org/r/78967
From bug 52763. --------- Change Iecab69460c6aac36acfe2d9440dc5d3590de8b57 for bug 6200 was somewhat prematurely +2'ed and merged. Subbu has pointed out that another difference between <blockquote> and <div> is that indented content inside <blockquote> is *not* considered to start a <pre>. This bug seeks to revert that part of the <blockquote> behavior change, since a quick grep through enwiki-20130708 finds that indented text inside <blockquote> is quite common. --------- So, this is trickier than what it seems to be. It is not enough to make <blockquote> behave like <div>. Indent-pre in <blockquote> should be turned off (especially because a lot of existing pages that use indents in block-quote will break without it). Or, a convincing case has to be made why supporting indent-pre in <blockquote> is okay to do and then fix existing pages that use indents in blockquote.
Change 78972 had a related patch set uploaded by Cscott: Make line breaks in <blockquote> behave like <div> (bug 6200). https://gerrit.wikimedia.org/r/78972
I'm merging bug 52763 and this one; also merging the previous patch for 6200 and the patch I wrote for 52763. Please look at the parser tests carefully to observe the behavior change. Primarily, we are now creating <p> elements inside <blockquote> -- which I believe to be correct (and necessary to fix bug 15491), but caution is warranted.
*** Bug 52763 has been marked as a duplicate of this bug. ***
Change 78972 merged by jenkins-bot: Make line breaks in <blockquote> behave like <div> (bug 6200). https://gerrit.wikimedia.org/r/78972
(In reply to comment #76) > Change 78972 merged by jenkins-bot: > Make line breaks in <blockquote> behave like <div> (bug 6200). > > https://gerrit.wikimedia.org/r/78972 Patch got merged, can this bug report be closed as RESOVLED FIXED?
I've removed the workarounds from the Quote and Bq templates on en.wiki.
Looks like there are still some issues: https://en.wikipedia.org/wiki/User:Kaldari/Bug6200
FWIW, the new behavior does seem to match the behavior of div tags. However the behavior is somewhat wonky. Specifically, whether or not there is a newline preceding the closing div or blockquote tag determines whether or not the final chunk of text is put in a paragraph tag. There may be a good reason why this is the case, but I don't know what it would be.
There's also a second issue: If the blockquote only contains 2 chunks of text separated by 2 linebreaks, the linebreaks are completely ignored. This also matches the behavior of div tags, but in neither case does it make sense.
(In reply to comment #79) > Looks like there are still some issues: > https://en.wikipedia.org/wiki/User:Kaldari/Bug6200 Isn’t this just the generic Bug 5718 (Block element written inline splits multiline paragraphs)?
(In reply to comment #79) > Looks like there are still some issues: > https://en.wikipedia.org/wiki/User:Kaldari/Bug6200 This is existing paragraph wrapping behavior when block tags (opening/closing) are present on the same line as plain text. https://git.wikimedia.org/blob/mediawiki%2Fcore.git/bc97a03f06bfb4bfd33cd2ec2a127c2f607e34e2/tests%2Fparser%2FparserTests.txt#L1200 This is somewhat odd behavior, of course.
Yeah, it's a side effect of the old parser handling tags on a line-by-line basis. But I think it's too established a behavior to change now, and Parsoid is backwards-compatible with this behavior. If someone wants to spend some quality time with a wikitext dump we could probably determine exactly how much wikitext depends on the current "odd" behavior. In any case, since the behavior is now the same as for <div>, I think we can resolve this bug as fixed.