Last modified: 2011-12-14 17:37:17 UTC
The following markup renders differently inside table markup than in normal text: '''Look at ''this edit'''s more complicated bold/italic markup!''' In normal text, you get: '<i>Look at</i> this edit<b>s more complicated bold/italic markup!</b> Within a table, you get: <b>Look at <i>this edit'</i>s more complicated bold/italic markup!</b> To me, the latter is the intended output and, from prior knowledge of the parser, what I would expect. However, the important point is that the two are currently rendered differently when they should not be!
See URL for test-case.
Bold/italic markup uses a fairly complex heuristic to determine how the quotes match. It's not dependent on tables, but it is (apparently) sensitive to whitespace. Your test-case shows the difference one extra space in the line can make. I don't think the table is relevant, other than in how it causes the whitespace to render.
Whitespace may affect things in order to ensure proper handling of the 's and l' sort of cases... but start-of-line and whitespace probably should look the same there. Needs to be checked against the other test cases...
Interestingly, I thought the parser used to format this kind of example in the manner described for when there is whitespace at the start, rather than the example without. However, it now seems to use the non-whitespace formatting as standard, with the whitespace version only appearing in the described edge case. This is what I was alluding to in the last paragraph of my original post. Is there a possibility that this behaviour has changed in a parser update (which could have some serious implications), or is my memory just faulty?
It has worked this way since MediaWiki 1.3. MediaWiki 1.2 produces the same HTML for both cases, but in a third way: <strong>Look at <em>this edit</em></strong><em>s complicated bold/italic markup!</em>
OK - just a bit of faulty wiring then... damn this broken brain of mine! :-)
Created attachment 6589 [details] Parser change
MediaWiki handles unbalanced quotes by looking at the lengths of the adjacent words and making a guess. The test case exposed several issues:
- MediaWiki treated the beginning of the line as a multiletter word.
- Markup such as <span> or | is treated as "words".
There's also the parser's assumption that words are separated by spaces, which is not true for all languages.
The patch fixes just the first issue (plus parsertest and release notes). Many usages now work, but
<span>'''Look at ''this edit'''s complicated bold/italic markup!'''</span>
and
{|
|'''Look at ''this edit'''s complicated bold/italic markup!'''
|}
still fail, since the parser thinks <span> and | are text instead of markup. I don't think it's worth trying to teach it that.
The behavior of the parsertest "Mixing markup for italics and bold" changed, since it began the line with bold quotes. I modified the rule "If there are more than 5 apostrophes in a row, assume they're all text except for the last 5." so that 6 apostrophes produce the original <b>bold</b><b>bold<i>bolditalics</i></b>. It still outputs single quotes to match open italic and bold, but the general behavior seems closer to what a human would expect. See the new 'Six quotes' parsertest for all the cases.
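For readers following along, the word-length guess discussed in this comment can be illustrated roughly as below. This is a simplified Python sketch written for this thread, not the actual PHP parser code or the attached patch: the function name pick_bold_to_demote and its return convention are hypothetical, and the normalisation of runs of 4 or more than 5 apostrophes is deliberately left out. It assumes the pre-patch behaviour in which an empty string before the quotes (start of line) falls through to the "multiletter word" case.

```python
import re

def pick_bold_to_demote(line):
    """Return the token index of the ''' run that the heuristic would
    reinterpret as an apostrophe followed by '' (italics), or None.

    Hypothetical helper for illustration only.
    """
    # Odd indices hold apostrophe runs, even indices hold the text between them.
    tokens = re.split(r"(''+)", line)
    runs = range(1, len(tokens), 2)

    n_italic = sum(1 for i in runs if len(tokens[i]) in (2, 5))
    n_bold = sum(1 for i in runs if len(tokens[i]) in (3, 5))
    # Only guess when both counts are odd, i.e. the quotes are unbalanced.
    if n_italic % 2 == 0 or n_bold % 2 == 0:
        return None

    first_single = first_multi = first_space = None
    for i in runs:
        if len(tokens[i]) != 3:
            continue
        before = tokens[i - 1]
        last = before[-1:]          # "" when the run starts the line
        second_last = before[-2:-1]
        if last == " ":
            if first_space is None:
                first_space = i
        elif second_last == " ":
            if first_single is None:
                first_single = i
        else:
            # Start of line lands here: "" is neither a space nor
            # preceded by one, so it counts as a multiletter word.
            if first_multi is None:
                first_multi = i

    # Preference order: single-letter word, multiletter word, space.
    for choice in (first_single, first_multi, first_space):
        if choice is not None:
            return choice
    return None
```

Under these assumptions, the test-case line without leading whitespace returns index 1 (the opening bold is demoted, giving the "normal text" rendering), while the same line with one leading space returns index 5 (the ''' after "this edit" is demoted, giving the table rendering).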
Created attachment 6595 [details] Parser change Fix the heuristic for the case with six quotes. Added another parsertest for that.
Created attachment 6596 [details] Full grabbing in regex Cumulative patch to move the quote-grabbing logic from the PHP code into the regex. It doesn't change the parser behavior, just the implementation. The regex is faster than the PHP code, but the path that gains the most is an uncommon one, and the regex is more complex. Needs benchmarking.
(In reply to comment #8)
> Fix the heuristic for the case with six quotes.
> Added another parsertest for that.
Committed in r61052
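As a rough illustration of what "grabbing in the regex" could look like, here is a small Python sketch. The pattern is hypothetical and is not the one in the attachment. It assumes the rule quoted earlier in this bug (with more than 5 apostrophes, all but the last 5 are text) and also assumes that a run of 4 is read as one literal apostrophe followed by bold. The six-quote behaviour changed by the earlier patch is not modelled.

```python
import re

# Hypothetical pattern: match a run of 2, 3 or 5 apostrophes only when no
# further apostrophe follows, so any surplus leading apostrophes stay in the
# preceding text chunk instead of being trimmed afterwards in a loop.
QUOTE_RUN = re.compile(r"('{2,3}(?!')|'{5}(?!'))")

def split_quotes(line):
    """Split a line into alternating text chunks and quote-markup runs."""
    return QUOTE_RUN.split(line)
```

With this pattern, split_quotes("''''a") yields a text chunk holding one apostrophe, then a bold run, then "a". The trade-off mentioned in the comment is visible even in this toy version: the splitting loop gets shorter, but the pattern is harder to read than a plain `''+`.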
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*