Last modified: 2014-09-23 23:53:32 UTC
Nesting definition lists, as in ";; x :: y", produces awful HTML.
Moreover, the parser outputs different HTML for two definition lists with the same structure.
The only difference between these lists is that one of them is single-line and the other is multi-line.
A simple example:
;; x :: y
<dl><dt> x </dt><dd><dl><dt></dt><dd> y
IMHO, single-line definition list parsing is not quite right. The empty <dt></dt> should come before
'<dt> x </dt>', as in the multi-line variant.
I've discussed the problem on #mediawiki. TimStarling suggested treating the second
semicolon as a literal semicolon. This can be achieved by adding a new line:
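// New line: wrap every semicolon after the first in <nowiki> so it is literal text.
// '(;+)' captures the whole run; '(;)+' would capture only the last ';' and drop the rest.
$oLine = preg_replace( '/;(;+)/', ';<nowiki>$1</nowiki>', $oLine );
// Existing line, shown for context
$preOpenMatch = preg_match('/<pre/i', $oLine );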
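A self-contained PHP sketch of the escaping step, for trying it outside Parser.php. The helper name escapeExtraSemicolons() is hypothetical and this is not the attached patch itself; it only wraps the same kind of preg_replace() call. Note the capture group: with '/;(;)+/', $1 holds only the last repetition, so ';;;' would come out as ';<nowiki>;</nowiki>' and lose a semicolon, whereas '/;(;+)/' keeps the whole run.

```php
<?php
// Hypothetical helper illustrating the proposed change: every semicolon
// after the first one in a run is wrapped in <nowiki> so the parser
// treats it as literal text instead of another nested list level.
function escapeExtraSemicolons(string $line): string
{
    // '(;+)' captures the whole run after the first ';'. The pattern
    // '/;(;)+/' would capture only the last repetition and drop semicolons.
    return preg_replace('/;(;+)/', ';<nowiki>$1</nowiki>', $line);
}

echo escapeExtraSemicolons(';; x :: y'), "\n";
// prints: ;<nowiki>;</nowiki> x :: y
echo escapeExtraSemicolons(';;; x'), "\n";
// prints: ;<nowiki>;;</nowiki> x
```

Like the one-line change, the pattern is not anchored to the start of the line, so ';;' anywhere in the line (for example in running text) gets escaped as well; restricting it to the list prefix would be a separate decision.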
PS. If there is more than one colon on the line (as in the first example), all colons
starting from the 2nd will also be treated as literals, because "; x :: y" acts in the same
Created attachment 2053
Improved Parser.php - treats 2nd semicolon as literal
Solves the problem of nested definition lists described in bug #6569.
Please include patches as diffs, not as entirely new files. To apply your
changes, the devs would have to guess at what version you were working from,
diff them themselves, and only then would they be able to apply the diff to the
Created attachment 4063
Patch that applies the above change
patch for r25328
Single-line definition list handling currently appears to be wildly inconsistent:
My personal preference would be to treat a '; x : y' pair as a syntactic unit, so that
*; bla : blub
*; bla :: blub
This would make it different from
which imo should result in
to stay consistent with general nested-list handling. This is also how lists are interpreted in the prototype PEG parser and HTML serializer we are currently working on: http://www.mediawiki.org/wiki/Future/Parser_plan.
From an IRC conversation with Gabriel just now: the patch might be technically fine, but it appears to be inconsistent with general nested-list behaviour, and Gabriel thinks it makes more sense to treat '; bla : blub' as a unit. So the patch needs more discussion on the wikitext-l list (https://lists.wikimedia.org/mailman/listinfo/wikitext-l). This patch may be obviated by the new parser being developed (https://www.mediawiki.org/wiki/Future).
Adding the newparser keyword so we keep this issue in mind for it.
Additional information from http://lists.wikimedia.org/pipermail/wikitext-l/2011-November/000483.html. Nested definition lists are rare enough to allow us to decide on a new standard without breaking too many pages:
> Can we deconstruct the current parser's processing steps and build a set
> of rules that must be followed?
I think the commonly-used structures are quite clearly defined, but the
behaviour of these strange permutations is quite unspecified. The parser
output for the case reported in the bug has already changed in the meantime.
> I think we need to get a dump of English Wikipedia and start using a
> simple PEG parser to scan through it looking for patterns and figuring
> out how often certain things are used - if ever.
I just ran an en-wiki article dump through a zcat/tee/grep pipeline:
pattern               count       example
^                     548498738   (total number of lines)
^;[^:]+:              153997      ; bla : blub
^[;:*#]+;[^:]+:       3817        *; bla : blub
^[:;*#]*;[^:]*::      41          most probably ;::
^[;:*#]*;[^:]+::      17          ;; bla :: blub
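For anyone wanting to rerun the survey on a smaller sample, here is a rough PHP equivalent of counting lines per pattern. It assumes the patterns were run as extended regular expressions (the exact grep flags are not stated) and that the dump has already been split into lines; countPatterns() is a hypothetical helper, not part of MediaWiki.

```php
<?php
// Hypothetical sketch of the per-pattern line counts: for each pattern,
// count the lines that match it. Assumes the input is already split into
// lines; the original run used zcat/tee/grep on the compressed dump.
function countPatterns(iterable $lines, array $patterns): array
{
    $counts = array_fill_keys($patterns, 0);
    foreach ($lines as $line) {
        foreach ($patterns as $pattern) {
            // '~' is used as the PCRE delimiter so the patterns from the
            // table can be reused unchanged (none of them contains '~').
            if (preg_match('~' . $pattern . '~', $line) === 1) {
                $counts[$pattern]++;
            }
        }
    }
    return $counts;
}

$patterns = ['^;[^:]+:', '^[;:*#]+;[^:]+:', '^[:;*#]*;[^:]*::', '^[;:*#]*;[^:]+::'];
$sample = ['; bla : blub', '*; bla : blub', ';; bla :: blub', ';::', 'plain text'];
print_r(countPatterns($sample, $patterns));
```

As with separate grep runs, a line is counted once for every pattern it matches: ';; bla :: blub' matches all four, so the rows overlap and should not be summed.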
Nested definition lists are not exactly common. Lines starting with ';;'
often appear as comments in code listings. The most common other
application appears to be indentation and emphasis. Any change in the
produced structure that keeps indentation and bolding should thus avoid
(In reply to comment #7)
Dan, I'm marking this patch reviewed per Gabriel's comments; it would be great if you could reply, revise, and resubmit. Thanks!
*** Bug 11894 has been marked as a duplicate of this bug. ***
We added several parser tests documenting Parsoid's behavior in parserTests.txt, but disabled them for the PHP parser for now. Please test the patch against those. The expected output might need whitespace adjustment to match the PHP parser output. The Parsoid parser test runner renormalizes whitespace, so the tests should still pass after those changes.
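To compare PHP parser output against the Parsoid expectations by hand, a whitespace-insensitive comparison helps. The sketch below is a hypothetical helper and an assumption about what "renormalizes whitespace" means here; it is not Parsoid's actual normalization. It drops whitespace between tags and collapses remaining runs to one space.

```php
<?php
// Hypothetical whitespace normalization for comparing parser output.
// NOT Parsoid's actual algorithm: it simply removes whitespace between
// tags and collapses every other whitespace run to a single space.
function normalizeHtmlWhitespace(string $html): string
{
    $html = preg_replace('/>\s+</', '><', $html); // whitespace between tags
    $html = preg_replace('/\s+/', ' ', $html);    // remaining whitespace runs
    return trim($html);
}

$phpOutput = "<dl>\n<dt> x </dt>\n<dd> y\n</dd></dl>";
$expected = "<dl><dt> x </dt><dd> y </dd></dl>";
var_dump(normalizeHtmlWhitespace($phpOutput) === normalizeHtmlWhitespace($expected));
```

This is deliberately crude: dropping whitespace between tags also hides meaningful spaces such as the one in '<b>a</b> <i>b</i>', so it is only suitable for quick checks of block-level structure like nested <dl> lists.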