Last modified: 2011-12-14 17:37:17 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T20765, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 18765 - Bold/italic markup handled differently depending on leading whitespace
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.15.x
Hardware: All All
Importance: Low enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://test.wikipedia.org/wiki/Bug_18765
Keywords: newparser, patch, patch-need-review
Depends on:
Blocks:
Reported: 2009-05-11 12:26 UTC by Mark Clements (HappyDog)
Modified: 2011-12-14 17:37 UTC (History)
3 users (show)



Attachments
Parser change (6.81 KB, patch), 2009-09-27 21:16 UTC, Platonides
Parser change (7.05 KB, patch), 2009-09-28 14:09 UTC, Platonides
Full grabbing in regex (1.55 KB, patch), 2009-09-28 14:58 UTC, Platonides

Description Mark Clements (HappyDog) 2009-05-11 12:26:02 UTC
The following markup gives different results from normal when inside table markup:

'''Look at ''this edit'''s more complicated bold/italic markup!'''

In normal text, you get:

'<i>Look at</i> this edit<b>s more complicated bold/italic markup!</b>

Within a table, you get:

<b>Look at <i>this edit'</i>s more complicated bold/italic markup!</b>

To me, the latter is the intended output and, from prior knowledge of the parser, what I would expect.  However, the important point is that the two are currently rendered differently, when they should not be!
Comment 1 Mark Clements (HappyDog) 2009-05-11 12:29:03 UTC
See URL for test-case.
Comment 2 Steve Sanbeg 2009-05-12 18:52:32 UTC
Bold/italic markup uses a fairly complex heuristic to determine how the quotes match.  It doesn't depend on tables, but it is (apparently) sensitive to whitespace.  Your test-case shows the difference one extra space in the line can make.  I don't think the table is relevant, other than in how it causes the whitespace to render.
Comment 3 Brion Vibber 2009-05-14 23:48:43 UTC
Whitespace may affect things in order to ensure proper handling of the 's and l' sort of cases... but start-of-line and whitespace probably should look the same there.

Needs to be checked against the other test cases...
Comment 4 Mark Clements (HappyDog) 2009-05-15 00:18:19 UTC
Interestingly, I thought the parser used to format this kind of example in the way described for the case with whitespace at the start, rather than the case without. However, it now seems to use the non-whitespace formatting as standard, with the whitespace version appearing only in the described edge case.  This is what I was alluding to in the last paragraph of my original post.

Is there a possibility that this behaviour has changed in a parser update (which could have some serious implications), or is my memory just faulty?
Comment 5 Platonides 2009-09-25 12:32:01 UTC
It has worked this way since MediaWiki 1.3.

MediaWiki 1.2 produces the same HTML for both cases, but in a third way:
<strong>Look at <em>this edit</em></strong><em>s complicated bold/italic markup!</em>
Comment 6 Mark Clements (HappyDog) 2009-09-25 13:41:10 UTC
OK - just a bit of faulty wiring then... damn this broken brain of mine! :-)
Comment 7 Platonides 2009-09-27 21:16:45 UTC
Created attachment 6589 [details]
Parser change

MediaWiki handles unbalanced quotes by looking at the lengths of the different words and making a guess.

The test case showed several issues:
-MediaWiki treated the beginning of the line as a multiletter word.
-Markup such as <span> or | is treated as "words".
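
As a rough illustration of the heuristic described above, here is a minimal Python sketch (not the actual PHP parser code) of how the bold run to be demoted is chosen when both the bold and the italic counts on a line are odd. The function name pick_bold_to_demote is hypothetical. The priority order (single-letter word, then multiletter word, then space) is an assumption chosen to match the behavior reported in this bug, and runs of four or of more than five apostrophes are not handled here.

```python
import re


def pick_bold_to_demote(line):
    """Return (tokens, index) where index is the bold run turned into
    an apostrophe plus italics, or None if the line is already balanced."""
    # Odd indexes hold apostrophe runs, even indexes hold the text between them.
    tokens = re.split(r"(''+)", line)
    runs = tokens[1::2]
    num_bold = sum(1 for r in runs if len(r) in (3, 5))
    num_italic = sum(1 for r in runs if len(r) in (2, 5))
    if not (num_bold % 2 == 1 and num_italic % 2 == 1):
        return tokens, None

    first_single = first_multi = first_space = None
    for i in range(1, len(tokens), 2):
        if len(tokens[i]) != 3:
            continue
        prev = tokens[i - 1]
        last, before_last = prev[-1:], prev[-2:-1]
        if last == " ":
            if first_space is None:
                first_space = i
        elif before_last == " ":
            if first_single is None:
                first_single = i
        else:
            # An empty prefix (start of line) also lands here, so the
            # beginning of the line counts as a multiletter word.
            if first_multi is None:
                first_multi = i

    # Assumed priority: single-letter word, multiletter word, space.
    chosen = next(
        (c for c in (first_single, first_multi, first_space) if c is not None),
        None,
    )
    if chosen is not None:
        tokens[chosen - 1] += "'"   # one apostrophe becomes literal text
        tokens[chosen] = "''"       # the remaining two become italics
    return tokens, chosen


line = "'''Look at ''this edit'''s more complicated bold/italic markup!'''"
print(pick_bold_to_demote(line)[1])
print(pick_bold_to_demote(" " + line)[1])
```

Without leading whitespace the sketch picks the run at the very start of the line, which corresponds to the "normal text" output in the description. With one leading space it picks the run after "this edit" instead, which corresponds to the output seen inside the table.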

There's also the parser assumption that words are separated by spaces, which is not true for all languages.

The patch fixes just the first issue (plus parsertest and releasenotes).

Many usages now work, but 
<span>'''Look at ''this edit'''s complicated bold/italic markup!'''</span> 
and
{|
|'''Look at ''this edit'''s complicated bold/italic markup!'''
|}

still fail, since the parser treats <span> and | as text instead of markup. I don't think it's worth trying to teach it otherwise.


The behavior of the parsertest "Mixing markup for italics and bold" changed, since it began the line with bold quotes.
I modified the rule "If there are more than 5 apostrophes in a row, assume they're all text except for the last 5." to make the 6 apostrophes produce the original <b>bold</b><b>bold<i>bolditalics</i></b>. It still emits literal single quotes in order to match open italics and bold, but the general behavior seems closer to what a human would expect. See the new 'Six quotes' parsertest for all the cases.
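
For reference, the unmodified rule quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted sentence, not the PHP implementation and not the modified rule from the patch (which is not spelled out in this comment); the function name normalize_long_runs is hypothetical.

```python
import re


def normalize_long_runs(line):
    """Apply the quoted rule: in a run of more than 5 apostrophes,
    everything except the last 5 is treated as literal text."""
    # Odd indexes hold apostrophe runs, even indexes hold the text between them.
    tokens = re.split(r"(''+)", line)
    for i in range(1, len(tokens), 2):
        extra = len(tokens[i]) - 5
        if extra > 0:
            tokens[i - 1] += "'" * extra  # surplus apostrophes become text
            tokens[i] = "'''''"           # the last five stay as markup
    return tokens


print(normalize_long_runs("''''''bold"))
```

Under this rule six apostrophes at the start of a line become one literal apostrophe followed by bold-italic markup, which is the case the 'Six quotes' parsertest is concerned with.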
Comment 8 Platonides 2009-09-28 14:09:12 UTC
Created attachment 6595 [details]
Parser change

Fix the heuristic for the case with six quotes.
Added another parsertest for that.
Comment 9 Platonides 2009-09-28 14:58:40 UTC
Created attachment 6596 [details]
Full grabbing in regex

Cumulative patch that moves the quote-grabbing logic from PHP code into the regex.
It doesn't change the parser behavior, just the implementation.

The regex is faster than the PHP code, but the path that gains the most speed is an uncommon one, and the regex is more complex. Needs benchmarking.
Comment 10 Platonides 2010-01-14 16:20:25 UTC
(In reply to comment #8)
> Fix the heuristic for the case with six quotes.
> Added another parsertest for that.

Committed in r61052

Comment 11 p858snake 2011-04-30 00:09:16 UTC
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
