Last modified: 2014-09-23 23:53:32 UTC

Bug 6569 - Avoid nested definition lists
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Low normal with 1 vote
Target Milestone: ---
Assigned To: Gabriel Wicke
Keywords: need-parsertest, newparser, patch, patch-reviewed
Duplicates: 11894
Depends on:
Blocks:
 
Reported: 2006-07-06 12:37 UTC by Shtriter Andrew
Modified: 2014-09-23 23:53 UTC (History)
CC List: 4 users



Attachments
Improved Parser.php - treats 2nd semicolon as literal (144.61 KB, text/plain)
2006-07-06 12:45 UTC, Shtriter Andrew
Patch that applies the above change (562 bytes, patch)
2007-08-31 12:22 UTC, Dan Collins

Description Shtriter Andrew 2006-07-06 12:37:41 UTC
Nesting definition lists, as in ";; x :: y", produces awful HTML.
Moreover, the parser outputs different HTML for two definition lists with the same structure.
The only difference between the two lists is that one is written on a single line and the
other is not.
A simple example:
 ;; x :: y
 
 
 ;; x
 :: y

Output:
 <dl><dt> x&nbsp;</dt><dd><dl><dt></dt><dd> y
 </dd></dl>
 </dd></dl>
 <p><br>
 </p>
 <dl><dt></dt><dl><dt> x
 </dt><dd> y
 </dd></dl>

IMHO, single-line dl parsing is not quite right. The empty <dt></dt> should come before
'<dt> x&nbsp;</dt>', as in the multi-line variant.

I've discussed the problem on #mediawiki. TimStarling suggested treating the second
semicolon as a literal semicolon. This can be achieved by adding a new line:
			
 $oLine = preg_replace( '/;(;)+/', ';<nowiki>$1</nowiki>', $oLine );
			
after

 $preOpenMatch = preg_match('/<pre/i', $oLine );

PS. If there is more than one colon on the line (as in the first example), all colons
from the second onward will also be treated as literals, because "; x :: y" behaves the
same way.
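
To see what the proposed substitution does to a line, here is a minimal sketch. It is written in Python rather than PHP purely for illustration; escape_extra_semicolons and escape_all_extra_semicolons are hypothetical names, and the second function is a variant that is not part of the attached patch. In both regex engines a repeated group keeps only its last iteration, so (;)+ captures a single semicolon.

```python
import re

def escape_extra_semicolons(line):
    """Mirror of the proposed PHP call:
    preg_replace('/;(;)+/', ';<nowiki>$1</nowiki>', $oLine)."""
    return re.sub(r';(;)+', r';<nowiki>\1</nowiki>', line)

def escape_all_extra_semicolons(line):
    """Hypothetical variant (not in the patch): capture the whole run."""
    return re.sub(r';(;+)', r';<nowiki>\1</nowiki>', line)

print(escape_extra_semicolons(';; x :: y'))
print(escape_extra_semicolons(';;; x'))
print(escape_all_extra_semicolons(';;; x'))
```

With three or more leading semicolons the original pattern emits only one escaped semicolon, because the group is overwritten on each repetition; the variant keeps the whole run. Neither pattern is anchored to the start of the line, so a ';;' later in the line is rewritten as well.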
Comment 1 Shtriter Andrew 2006-07-06 12:45:40 UTC
Created attachment 2053 [details]
Improved Parser.php - treats 2nd semicolon as literal

Solves the problem of nested definition lists as described in bug #6569.
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-07-17 04:46:58 UTC
Please include patches as diffs, not as entirely new files.  To apply your
changes, the devs would have to guess at what version you were working from,
diff them themselves, and only then would they be able to apply the diff to the
current version.
Comment 3 Dan Collins 2007-08-31 12:22:13 UTC
Created attachment 4063 [details]
Patch that applies the above change

patch for r25328
Comment 4 Gabriel Wicke 2011-11-10 14:14:47 UTC
Handling of single-line definition lists currently appears to be wildly inconsistent:

http://www.mediawiki.org/wiki/User:GWicke/Definitionlists

My personal preference would be to treat a '; x : y' pair as a syntactic unit, so that

*; bla : blub

produces

<ul>
<li>
<dl>
<dt>bla&#160;</dt>
<dd>blub</dd>
</dl>
</li>
</ul>

and 

*; bla :: blub

results in

<ul>
<li>
<dl>
<dt>bla&#160;</dt>
<dd>: blub</dd>
</dl>
</li>
</ul>

This would make it different from

*; bla
:: blub

which imo should result in

<ul>
<li>
<dl><dt>bla&#160;</dt></dl>
</li>
</ul>
<dl>
<dd><dl><dd>blub</dd></dl>
</dl>

to stay consistent with general nested-list handling. This is also how lists are interpreted in the prototype PEG parser and HTML serializer we are currently working on: http://www.mediawiki.org/wiki/Future/Parser_plan
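
The "syntactic unit" idea can be sketched roughly as follows. This is only an illustration in Python, not the PEG grammar of the prototype parser: split_definition_unit and the UNIT pattern are hypothetical, and the sketch ignores the &#160; that the outputs above show after the term.

```python
import re

# List markers ending in ';', then the term up to the first ':',
# then everything after that colon as the definition.
UNIT = re.compile(r'^([*#:;]*;)([^:]*):(.*)$')

def split_definition_unit(line):
    """Return (prefix, term, definition), or None if the line has no
    single-line '; term : definition' pair."""
    m = UNIT.match(line)
    if m is None:
        return None
    prefix, term, definition = m.groups()
    return prefix, term.strip(), definition.strip()

print(split_definition_unit('*; bla : blub'))
print(split_definition_unit('*; bla :: blub'))
print(split_definition_unit('*; bla'))
```

Only the first colon ends the term, so in '*; bla :: blub' the second colon stays in the definition as literal text. A line without a colon is not a unit and would go through the ordinary nested-list handling.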
Comment 5 Sumana Harihareswara 2011-11-10 14:24:46 UTC
From an IRC conversation with Gabriel just now: the patch might be technically fine, but it appears to be inconsistent with general nested-list behaviour, and Gabriel thinks it makes more sense to treat ; bla : blub as a unit.  So the patch needs more discussion on https://lists.wikimedia.org/mailman/listinfo/wikitext-l .  It could be that this patch is obviated by the new parser being developed ( https://www.mediawiki.org/wiki/Future ).
Comment 6 Gabriel Wicke 2011-11-10 15:33:45 UTC
Adding the newparser keyword so we keep this issue in mind for it.
Comment 7 Gabriel Wicke 2011-11-14 13:45:23 UTC
Additional information from http://lists.wikimedia.org/pipermail/wikitext-l/2011-November/000483.html. Nested definition lists are rare enough to allow us to decide on a new standard without breaking too many pages:

> Can we deconstruct the current parser's processing steps and build a set
> of rules that must be followed?

I think the commonly-used structures are quite clearly defined, but the
behaviour of these strange permutations is quite unspecified. The parser
output for the case reported in the bug already changed in the meantime.

> I think we need to get a dump of English Wikipedia and start using a
> simple PEG parser to scan through it looking for patterns and figuring
> out how often certain things are used - if ever.

I just ran an en-wiki article dump through a zcat/tee/grep pipeline:

pattern              count      example
------------------------------------------------------------------
^                    548498738  (total number of lines)
^;                   681495
^;[^:]+:             153997     ; bla : blub
^[;:*#]+;[^:]+:      3817       *; bla : blub
^;;                  2332
^[:;*#]*;[^:]*::     41         most probably ;::
^[;:*#]*;[^:]+::     17         ;; bla :: blub

Nested definition lists are not exactly common. Lines starting with ';;'
often appear as comments in code listings. The most common other
application appears to be indentation and emphasis. Any change in the
produced structure that keeps indentation and bolding should thus avoid
breaking pages.
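
For anyone who wants to repeat the count on a smaller sample, here is a rough Python equivalent of such a pipeline. It is a sketch, not the zcat/tee/grep commands actually used: count_patterns and the sample lines are made up for illustration, and only the patterns themselves are taken from the table.

```python
import re

# Patterns copied from the table; everything else is illustrative.
PATTERNS = [
    r'^',
    r'^;',
    r'^;[^:]+:',
    r'^[;:*#]+;[^:]+:',
    r'^;;',
    r'^[:;*#]*;[^:]*::',
    r'^[;:*#]*;[^:]+::',
]

def count_patterns(lines):
    """Return {pattern: number of lines matching it at line start}."""
    compiled = [(p, re.compile(p)) for p in PATTERNS]
    counts = {p: 0 for p in PATTERNS}
    for line in lines:
        for pattern, regex in compiled:
            if regex.match(line):
                counts[pattern] += 1
    return counts

sample = ['; bla : blub', '*; bla : blub', ';; bla :: blub', ';::', 'plain text']
for pattern, count in count_patterns(sample).items():
    print(f'{pattern}\t{count}')
```

The patterns overlap, so one line can be counted in several rows; ';; bla :: blub', for example, matches every pattern in the list.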
Comment 8 Sumana Harihareswara 2012-01-23 19:55:06 UTC
(In reply to comment #7)
Dan, I'm marking this patch reviewed per Gabriel's comments; it would be great if you could reply, revise, and resubmit.  Thanks!
Comment 9 Gabriel Wicke 2012-06-27 14:34:18 UTC
*** Bug 11894 has been marked as a duplicate of this bug. ***
Comment 10 Gabriel Wicke 2012-06-27 14:40:09 UTC
We added several parser tests documenting Parsoid's behavior to parserTests.txt, but disabled them for the PHP parser for now. Please test the patch against those. The expected output might need whitespace adjustments to match the PHP parser output. The Parsoid parser test runner renormalizes whitespace, so the tests should still pass there after those changes.
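
As a rough idea of what comparing output modulo whitespace can look like, here is a small Python sketch. It is an assumption for illustration only and not the normalization the Parsoid test runner actually performs; normalize_html_whitespace is a hypothetical helper.

```python
import re

def normalize_html_whitespace(html):
    """Hypothetical normalizer: drop whitespace around tags,
    then collapse any remaining whitespace runs to one space."""
    html = re.sub(r'\s*(<[^>]+>)\s*', r'\1', html)
    return re.sub(r'\s+', ' ', html).strip()

php_style = '<dl><dt> x\n</dt><dd> y\n</dd></dl>'
compact = '<dl><dt>x</dt><dd>y</dd></dl>'
print(normalize_html_whitespace(php_style) == normalize_html_whitespace(compact))
```

A normalizer this aggressive also removes spaces that matter around inline tags, so it is only suitable for comparing block-level structure such as these lists.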
