Last modified: 2014-11-04 22:52:51 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6521, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 4521 - Colon (:) & semicolon (;) shouldn't output as HTML definition list when used for indentation, boldfacing
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: 1.6.x
Hardware: All All
Importance: Low minor with 14 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://en.wikipedia.org/wiki/User:Jor...
Keywords: newparser
Duplicates: 4522
Depends on:
Blocks: semantic-html html5
Reported: 2006-01-07 18:24 UTC by Joris Gillis
Modified: 2014-11-04 22:52 UTC (History)
CC List: 14 users

Description Joris Gillis 2006-01-07 18:24:05 UTC
Wikipedians often use lines starting with one or more colons to indent a line:
e.g.
: this line is indented
:: double indented

I recently noticed that the generated HTML contains a definition list (<dl/>).
Obviously this is not a good means of indenting text (see http://www.w3.org/TR/html401/struct/lists.html#h-10.3).
Apparently, the MediaWiki syntax for definition lists (;term : def) is being abused for visual effects.

I suggest that <div style="margin-left: 2em"></div> be used instead of <dl><dd></dd></dl> whenever
a colon is found without a preceding semicolon.


keywords: indentation colon semicolon dl dt dd definition list abuse parser
Comment 1 Brion Vibber 2006-01-07 19:03:58 UTC
*** Bug 4522 has been marked as a duplicate of this bug. ***
Comment 2 Joris Gillis 2006-01-07 21:25:03 UTC
Example: http://en.wikipedia.org/wiki/User:Joris_Gillis/SF
Comment 3 Antoine "hashar" Musso (WMF) 2006-01-17 19:26:03 UTC
well we can either:
- add a new syntax token like '>'
- accept using <dd> for indentation


> this is indented text of some sort
> also used in email for quoting!
Comment 4 Brion Vibber 2006-01-17 19:39:48 UTC
Lowering priority; the system works fine, it's just arguably not "semantically 
correct".
Comment 5 Joris Gillis 2006-01-17 19:59:47 UTC
I oppose the suggested new syntax. The ':' syntax is far too widespread.
Nor will I accept <dd> for indentation.

I think there's another option: just let the wiki-syntax interpreter logically separate
two cases: A) when ':' is used together with ';' and B) when ':' serves indentation purposes.
Comment 6 lɛʁi לערי ריינהארט 2006-01-19 12:38:38 UTC
Hallo!

Please remember
Bug 2020: BiDi related issues to ";", ":", "#", and "*", monobook skin etc.

Only some of the indentation variants can be used on RTL projects
([[meta:BiDi_workgroup]]) by modifying MediaWiki:Monobook.css manually.

Please note that many of the RTL or BiDi projects do not have sysops at the moment
([[meta:BiDi_workgroup/To-do#BiDi_projects_without_sysop]]).


If it is possible to use another implementation / coding of the page
output, this would be a great help. The page source / the wiki syntax should
be preserved.

It might not be easy to find the solution immediately because of various bugs in
different browsers. This has to be checked in addition to compliance with one or
another standard / specification.

*if this does not relate here please open another bug*
I am not familiar with which MediaWiki PHP file defines the Monobook.css classes. In
my opinion both the LTR and the RTL classes should be available by default,
because BiDi projects such as those in Kurdish, Ladino etc. would need to use both
"environments" / would contain pages with both LTR and RTL sections. This
also applies to commons:, meta: and to any wiki.

best regards reinhardt [[user:gangleri]]

P.S. I am not sure if bug 2020 depends on this bug. If this is the case
please add it to "blocks".
Comment 7 Michael Zajac 2006-02-01 22:12:45 UTC
It should also be mentioned that colons are widely misused to format block quotations in articles.  

If there were ever some major rehabilitation of wikitext syntax, definition lists should be reformatted to display 
without any indentation (bold text sufficing to reflect the intended use), and new syntax introduced for block quotes 
and nested discussion.  

Alternately, definition lists could be made to display differently in articles and discussion pages.
Comment 8 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-05-30 06:05:51 UTC
I would suggest that

:foo

translate to 

<div style="left: 2em;">
  foo
</div>

(obviously the exact distance is negotiable).  "::foo" would obviously just be
the same thing, only nested:

<div style="left: 2em;">
  <div style="left: 2em;">
    foo
  </div>
</div>

:foo
::bar

should be

<div style="left: 2em;">
  foo
  <div style="left: 2em;">
    bar
  </div>
</div>

And

::foo
::bar

should be

<div style="left: 2em;">
    <p>foo</p>
    <p>bar</p>
</div>

From observation, this last is pretty universally what people want when they
enter two things at the same indentation level; they don't want a simple line
break (which is what UAs would probably render for two successive <div>s, and
what they certainly render with the present <dl>), they want a paragraph break.
Comment 9 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-05-30 06:07:45 UTC
Sorry for the double-post, just to say: I'm not sure if that "left" should
actually be "margin-left", but whichever, you get the point.
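
For illustration only, here is a minimal Python sketch of the nesting rule proposed in comment 8. It is not MediaWiki parser code: the function name render_colon_indents and the step parameter are hypothetical, it assumes "margin-left" is the CSS property intended, it does not escape HTML, and it leaves out the paragraph-wrapping rule suggested for consecutive lines at the same level.

```python
def render_colon_indents(wikitext: str, step: str = "2em") -> str:
    """Render lines that start with colons as nested <div> wrappers.

    Hypothetical helper: one <div> per colon level, as proposed in comment 8.
    """
    out = []
    depth = 0  # number of <div>s currently open

    for line in wikitext.splitlines():
        stripped = line.lstrip(":")
        level = len(line) - len(stripped)  # count of leading colons

        # Open or close wrappers until the nesting matches this line's level.
        while depth < level:
            out.append("  " * depth + f'<div style="margin-left: {step};">')
            depth += 1
        while depth > level:
            depth -= 1
            out.append("  " * depth + "</div>")

        text = stripped.strip()
        if text:
            out.append("  " * depth + text)

    # Close whatever is still open at the end of the input.
    while depth > 0:
        depth -= 1
        out.append("  " * depth + "</div>")
    return "\n".join(out)


print(render_colon_indents(":foo\n::bar"))
```

With the input ":foo" followed by "::bar", this prints "foo" inside one wrapper and "bar" inside a second wrapper nested in the first, matching the third example in comment 8.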
Comment 10 Michael Zajac 2006-05-31 16:35:07 UTC
A couple of potential problems with rendering colons as DIVs:

1. There are many articles where colons are actually used as definition lists, using 
semicolons and colons.

2. The HTML should somehow reflect the nested structure of a discussion page, for the 
sake of non-visual or text-only browsers.  Nested definition lists is actually not so bad at 
this (the lack of definition terms is odd but valid, but unfortunately wikitext usually 
screws it all up by closing and reopening the list after each item) -- alternatively, 
discussion could be seen as an unordered (ordered?) list of statements.
Comment 11 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-01 02:18:30 UTC
(In reply to comment #10)
> 1. There are many articles where colons are actually used as definition lists,
> using semicolons and colons.

Certainly, any solution will have to handle that.

> 2. The HTML should somehow reflect the nested structure of a discussion page,
> for the sake of non-visual or text-only browsers.  Nested definition lists is
> actually not so bad at this (the lack of definition terms is odd but valid, but
> unfortunately wikitext usually screws it all up by closing and reopening the
> list after each item) -- alternatively, discussion could be seen as an
> unordered (ordered?) list of statements.

Well, nested <div>s represent it as nested just as well as nested <dl>s do,
don't they?  Compare (* used for indentation purposes only, it wouldn't be in
the actual HTML):

<div style="left: 2em;">
**foo
**<div style="left: 2em;">
****bar
**</div>
**baz
</div>

<dl>
**<dd>foo</dd>
**<dl>
****<dd>bar</dd>
**</dl>
**<dd>baz</dd>
</dl>
Comment 12 Michael Zajac 2006-06-01 05:40:53 UTC
Well, div elements are semantically void, and are used for grouping other elements only.  Definition lists imply a 
semantic meaning and relationship between their parts.
Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-01 05:59:19 UTC
(In reply to comment #12)
> Well, div elements are semantically void, and are used for grouping other
> elements only.  Definition lists imply a semantic meaning and relationship
> between their parts.

Yes, but a semantic meaning completely different from the actual meaning
intended.  That's significantly worse than no meaning at all.  Alternatives:

* <ul> would work.  However, it would be redundant with * unless we use
"list-style-type: none;", which has near-zero support at present as far as I can
tell.  Or else we could use "list-style-image:" with a blank image; is that
property widely supported?

* We could use <blockquote>.  This is possibly the most sensible, really.

Ultimately, it should be remembered that users *aren't* entering the indentation
with any universal semantic meaning; they're entering them for the indentation.
 So semantically speaking, presentational markup is probably the best thing to
translate any wikimarkup into.  (Remember when '' used to be <em>, and '''
<strong>?)
Comment 14 Joris Gillis 2006-06-01 18:06:57 UTC
I totally support the blockquotes.
Since they are block-level containers, they can be nested.

<blockquote>
	<p>foo</p>
	<blockquote>
		<p>bar</p>
		<blockquote>
			<p>baz</p>
		</blockquote>
	</blockquote>
</blockquote>

Comment 15 omniplex 2006-06-30 13:40:28 UTC
The behaviour of colon / semicolon should stay as is.
Div-style hacks won't work without CSS, and each colon
a new paragraph makes no sense, for that effect I can
simply add an empty line:

: indented line
: indented line (same para)

: new indented paragraph. Cheers.
Comment 16 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-30 18:21:34 UTC
(In reply to comment #15)
> The behaviour of colon / semicolon should stay as is.
> Div-style hacks won't work without CSS

Which is a problem for all fifty people who don't use CSS . . . how does Lynx
render <dd> anyway?

> and each colon
> a new paragraph makes no sense, for that effect I can
> simply add an empty line:

Which works until you're working on a second-level list, and the underlying list
is a UL or OL.  Try the following:

#Number
#:Comment

#:Comment
#Another number

*Bullet
*:Comment

*:Comment
*Another bullet

They don't work too well.  Most people, I've observed, just make do with a
linebreak when it's necessary, whereas I pedantically insert <p>s manually to
break my comments into paragraphs.  But that issue is minor, I suppose.
Comment 17 Michael Zajac 2006-06-30 21:47:09 UTC
>> Div-style hacks won't work without CSS

> Which is a problem for all fifty people who don't use CSS

Divs are semantically meaningless.  Using divs + css to create so-called 
structure on a page is just inadequate, 1999-style markup, unworthy of 
open software Wikimedia or a forward-looking project like Wikipedia.

For some perspective, see "level 4" at both:

  Levels of CSS knowledge
  <http://friendlybit.com/css/levels-of-css-knowledge/>

  Levels of HTML knowledge
  <http://www.456bereastreet.com/archive/200605/levels_of_html_knowledge/>
Comment 18 Aryeh Gregor (not reading bugmail, please e-mail directly) 2006-06-30 21:56:55 UTC
(In reply to comment #17)
> Divs are semantically meaningless.  Using divs + css to create so-called 
> structure on a page is just inadequate, 1999-style markup, unworthy of 
> open software Wikimedia or a forward-looking project like Wikipedia.

I'm well aware of the benefits of semantic markup.  The problem is, 95% of our
editors are not, and they're most of the ones who are using things like :.  If
you use <dd> or <blockquote> to indent it, you're saying that the item is a
definition or a quote, when nine-tenths of the time it isn't.  It's usually a
comment in a discussion, or otherwise just something that someone wants to indent.

To put it another way: users are entering content presentationally, not
semantically, and adding probably-false semantic meaning to their solely
presentational input is much worse from a semantics perspective than admitting
that, in fact, there are no semantics to the input.  Genuine semantics is good;
calling all indentation "blockquote" so that you don't have to use the dreaded
meaningless div is pointless from a semantic perspective.  <div class="indent">
is much more sensible and honest, because that's what it was entered as.
Comment 19 Michael Zajac 2006-06-30 22:06:15 UTC
Yes, but this bug and follow-up comments are an attempt to improve the situation in some way, not just throw 
up our hands and give up.  A definition list may not be perfect for discussions, but at least it expresses the 
nested structure.  Switching to divs would be discarding even this and turning the page to text soup.

It would be nice if wikitext markup for blockquotes existed.  We may be stuck with definitions and discussion 
threads being conflated, but maybe we can come up with some bright ideas to fix that.
Comment 20 Omegatron 2006-12-19 19:52:53 UTC
If I remember correctly, the HTML spec for definition lists is pretty loose,
allowing them to be used for plays, for instance.  Using them for talk pages is
not semantically wrong.  

Using them for indentation is.
Comment 21 Michael Zajac 2006-12-20 00:18:18 UTC
Dialogues is given as an example of other applications of definition lists in 
the specification, but this still implies a term-definition relationship 
between the parts.  For example:

<dl>
<dt>John</dt><!-- a label for John's statements -->
<dd>Hi, how are you?</dd><!-- defines what John said -->
</dl>

Nested discussion in Wikipedia is a list of definition lists containing 
definitions, but no defined terms, so I don't think it really makes semantic 
sense as a definition list in the same way as a script.  However, the nested 
lists do imply a hierarchy, and the default indented formatting of 
definitions does imply the same hierarchy in almost any text-only or 
graphical web browser (don't know how well it works in screen readers).  

Changing this formatting to divs would eliminate both the semantic and 
visual relationship completely.  This would make things worse, and not 
acceptable, even if CSS was added to make it look the same in graphical 
browsers.

Technically, the DTD allows a DL to contain only one or more terms, or 
only one or more definitions, so there is no problem with validation.

So although using colons for threaded discussion is semantically odd, it 
does work visually and semantically.  The more I think about it, the more 
comfortable I am with it.  Since it is easy to type and is firmly entrenched 
in Wikipedia talk pages, I suggest closing this bug.

----

Tangentially-related issue:

However, the indented display of definition lists is actually rather 
unsuitable for definition lists in articles—it breaks up the left-hand vertical 
line of text and looks sloppy.  In countless instances, editors enter 
wikitext like the following instead, where a definition list would be a 
perfect semantic fit:

'''Term'''<br />
Definition

So, when both Bug 6200 (Linebreaks are mishandled in <blockquote> and 
<li>) and Bug 4827 (blockquote support in wikitext) are fixed, so that 
there is no longer any incentive to abuse definition lists for block 
quotation formatting in articles, then the common style sheet ought to be 
updated to not indent definitions, in articles only.

----

References:

Introduction to lists
http://www.w3.org/TR/html401/struct/lists.html#h-10.1
"Definition lists, created using the DL element, generally consist of a series 
of term/definition pairs (although definition lists may have other 
applications)."

Definition lists: the DL, DT, and DD elements
http://www.w3.org/TR/html401/struct/lists.html#h-10.3
"Definition lists vary only slightly from other types of lists in that list items 
consist of two parts: a term and a description."

In the DTD for XHTML 1.0 transitional
http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-transitional.dtd_dl
"<!ELEMENT dl (dt|dd)+>"
Comment 22 Omegatron 2006-12-20 00:49:59 UTC
(In reply to comment #21)
> Dialogues is given as an example of other applications of definition lists in 
> the specification, but this still implies a term-definition relationship 
> between the parts.

They also give an example of a recipe, where the DTs function almost like
headers, and the DD contains another list or paragraph.  I think that when they
say "although definition lists may have other applications", they mean it quite
liberally.  I don't think using them for threaded discussions is semantically
wrong, regardless of the fact that there are no DTs.

> Since it is easy to type and is firmly entrenched 
> in Wikipedia talk pages, I suggest closing this bug.

That would imply that the bug is only about using definition lists for threaded
discussions, but the title of the bug is about indentation, which *is* still a
problem in articles.  Things like indenting math formulas and the like.

> However, the indented display of definition lists is actually rather 
> unsuitable for definition lists in articles—it breaks up the left-hand vertical 
> line of text and looks sloppy.

I think it looks good.  :-)  I convert things like the example you gave to
definition lists whenever I see them.
 
> So, when both Bug 6200 (Linebreaks are mishandled in <blockquote> and 
> <li>) and Bug 4827 (blockquote support in wikitext) are fixed, so that 
> there is no longer any incentive to abuse definition lists for block 
> quotation formatting in articles, then the common style sheet ought to be 
> updated to not indent definitions, in articles only.

I disagree.  We should figure out what people are semantically trying to do when
they use DDs for indentation, and provide markup and CSS to provide the same
effect the correct way.  They use it for blockquotes, so we have the
<blockquote> tag instead.  They use it for indentation of disambig links, so we
should build the indentation into the dablink class and remove the DD from the
template, etc.
Comment 23 Jools Wills 2008-08-20 12:47:04 UTC
This one is really bugging me. I want to use definition lists for their real purpose and have them styled so they are inline and with slightly different default margins etc

Title: Definition
Title: Definition

but this of course messes up everywhere that : has been used for indentation. Having it so : :: ::: is parsed differently would be great. I would be quite happy for it to use <div class="indent"> or so

of course I can wrap my inline definition lists in a div and style it like that, but it seems like overcoming the problem in the wrong way

And it just seems wrong to use definition lists for indentation anyway.

Is it possible to parse

;title:definition
and
;title
:definition

differently from

:indent
::indent

?
Comment 24 Aryeh Gregor (not reading bugmail, please e-mail directly) 2008-08-20 14:47:31 UTC
In principle, sure.
Comment 25 S. McCandlish 2008-09-05 20:36:10 UTC
In response to an older comment, it isn't "arguably" semantically incorrect, it IS semantically incorrect. I don't care which of the proposed solutions is implemented, as long as its use for simple indentation renders as CSS not definition lists by the time it hits the user agent.  Web markup semantics are important for accessibility reasons, among others.
Comment 26 Brion Vibber 2008-12-30 02:07:12 UTC
Changing summary to reflect the direction of attack we would actually follow.
Comment 27 S. McCandlish 2008-12-31 20:45:17 UTC
Generally I like the way this is heading.  My 2 cents:

1) General-purpose indentation should be done with divs.
2) Blockquotes should be used for quotations, not general indentation.
3) Definition lists should be used when the editor is intentionally demonstrating a relationship between the two parts (term + definition, character + dialogue, etc.).
4) Ul/ol lists should not be used for things that are not actually lists.

And I concur strongly that the present behavior of using dl/dt/dd lists for presentational indentation IS semantically invalid (not "arguably"), and that this is important to fix for accessibility and other reasons.
Comment 28 S. McCandlish 2010-06-12 21:55:52 UTC
FYI: While fixing this bug would be very helpful, I have to point out that there are quite a few other definition list problems in MediaWiki, as shown by some simple test cases:

http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style_(glossaries)/DD_bug_test_cases

I'll probably file this as another bug (or more - there may be more than one issue here) after further investigation, and notify this Cc loop of the new bug number. But I think this may be something to do with MW's weird handling of lists more generally.

PS: For anyone pulling their hair out with vagaries of ";" and ":" markup, just use the real HTML tags, and the problems melt away. But don't mix-and-match.
Comment 29 S. McCandlish 2010-07-25 21:10:28 UTC
Updating bug title to reflect related problem.  Colon is output as a definition list definition (dd element), and semicolon, often abused for boldfacing and creating pseudo-headings, outputs a definition list term (dt element).  Both of these should be replaced with CSS, at least if they are not in an actual definition list.  There are three ways to handle this:

1) Stop connecting this wikimarkup in any way to definition lists (which would have to be HTML-coded manually, like blockquotes and various other things that MediaWiki doesn't have special wikimarkup for).

2) Have the parser test the conditions of the markup, such that if the material is formatted like:

 ;A1
 :A2
 ;B1
 :B2

it is treated as a definition list, but if it has blank lines between any of these, or a ; without one or more :'s or vice versa, or otherwise doesn't fit this pattern, treat it as CSS-styled, non-list text.

3) Always treat this markup as CSS-style regular prose, unless it is inside an explicit HTML dl element, in which case always treat it as a definition list (regardless of whitespacing and regardless of missing definitions or terms).

4) Or some combination of these.  I'm marginally against option 1, and feel that 3 should usually apply (always apply in the case of explicit dl markup), but can't see anything wrong with MW doing some very limited guesswork as in option 2.
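
As a rough illustration of the guesswork in option 2, here is a minimal Python sketch. It is an assumption-laden toy, not the MediaWiki parser: classify_runs and the labels "definition-list", "indent", "bold" and "text" are hypothetical names, the check is made per run of consecutive lines rather than per term, and any colon after a leading ";" is treated as the inline ";term : def" form (so a URL on a ";" line would be misread).

```python
from itertools import groupby


def classify_runs(wikitext):
    """Split wikitext into runs of consecutive lines and label each run.

    Hypothetical labels:
      "definition-list"  run contains both ';' terms and ':' definitions
      "indent"           run contains only ':' lines
      "bold"             run contains only ';' lines
      "text"             anything else (including blank lines)
    """
    def is_list_line(line):
        return line.startswith((";", ":"))

    result = []
    # A blank line is not a list line, so it ends the current run.
    for is_list, run in groupby(wikitext.splitlines(), key=is_list_line):
        lines = list(run)
        if not is_list:
            result.append(("text", lines))
            continue

        has_term = any(line.startswith(";") for line in lines)
        # A definition is a ':' line, or the inline ';term : def' form.
        has_def = any(
            line.startswith(":") or (line.startswith(";") and ":" in line[1:])
            for line in lines
        )

        if has_term and has_def:
            kind = "definition-list"
        elif has_term:
            kind = "bold"
        else:
            kind = "indent"
        result.append((kind, lines))
    return result


sample = ";A1\n:A2\n;B1\n:B2\n\n:reply\n::nested reply\n\n;Pseudo-heading"
for kind, lines in classify_runs(sample):
    print(kind, lines)
```

Only runs labelled "definition-list" would keep the dl/dt/dd output; the "indent" and "bold" runs are the ones that would be rendered as CSS-styled, non-list text.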
Comment 30 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-26 13:13:25 UTC
I'd prefer (2).  There are plenty of times I've used this for an actual definition list, and almost all of the abuses I've seen are cases where ";" isn't used at all, which is very easy to detect automatically.  We should just treat ":" without accompanying ";" as <div class="indent"> or something, that's enough to fix the large majority of the cases.
Comment 32 cogden1970 2011-05-10 23:40:50 UTC
I just want to register my strong agreement with #2. I think there is a need for the association list markup, and that it should not be difficult to separate out sole : lines, or ; lines without one or more corresponding :s, for special treatment as <div>s.

Will this be addressed in Brion Vibber's parser rewrite?
Comment 33 Krinkle 2011-05-10 23:43:31 UTC
(In reply to comment #31)
> That would work for me too, provided that the corresponding case of ";" without
> accompanying ":" be treated as a div with class="indent boldface" or whatever,
> so that both abuses of def. list markup are fixed.  PS: If you find that, when
> you are actually using ";" and ":" for def. lists, that you can't get the
> layout you want, try using explicit dl, dt, dd markup, and the problems go away
> (see [[WP:MOSGLOSS]] for details).
> 
>

(I'm quoting this message by S. McCandlish, posted on 2010-07-28 21:45:46 UTC because his original reply contained appended spam, which is the reason I hid the reply from view)
Comment 34 Sumana Harihareswara 2011-11-08 21:45:37 UTC
> Will this be addressed in Brion Vibber's parser rewrite?

I'm adding the newparser keyword to bring it to the parser rewriters' attention.
Comment 35 Helder 2012-07-19 03:32:06 UTC
On enwiki's village pump[1] it was mentioned that LQT would solve the issue, at least for talk pages (but since it is a dead project[2], well...)

[1] https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(technical)/Archive_73#Talk_page_comments_on_same_indent_level_are_not_vertically_separated_as_much_as_other_comments.E2.80.94bug.3F

[2] https://www.mediawiki.org/wiki/Talk:LiquidThreads_3.0#It.27s_dead.21
Comment 36 Helder 2012-09-10 18:52:16 UTC
The rendering of :'s was improved at least inside of <poem> tags to fix bug 31146 (see Gerrit change #13539).

Wouldn't it be possible to do something similar here?
Comment 37 Emil Jerabek 2013-04-16 10:26:15 UTC
As was pointed out on http://en.wikipedia.org/wiki/Help_talk:Wiki_markup#semicolon_issue.3F , the <dl> markup currently produced by unpaired ; or : fails validation since the switch to HTML5. Shouldn't the importance of the bug be raised?
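
To make the validation difference concrete, here is a small Python sketch comparing the two content models. It assumes the XHTML 1.0 rule "(dt|dd)+" quoted in comment 21 and my reading of the HTML5 rule for dl (zero or more groups, each one or more dt followed by one or more dd), and it ignores script-supporting elements. The function name dl_conformance is hypothetical.

```python
import re


def dl_conformance(children):
    """Check a <dl>'s child tag names against two content models.

    `children` is a sequence of "dt"/"dd" strings in document order.
    Returns a dict with one boolean per (simplified) content model.
    """
    for tag in children:
        if tag not in ("dt", "dd"):
            raise ValueError(f"unexpected child element: {tag!r}")

    # Encode the children as a string: 't' for dt, 'd' for dd.
    encoded = "".join("t" if tag == "dt" else "d" for tag in children)

    return {
        # XHTML 1.0 DTD: <!ELEMENT dl (dt|dd)+>
        "xhtml1": re.fullmatch(r"[td]+", encoded) is not None,
        # HTML5 (simplified): groups of one or more dt, then one or more dd
        "html5": re.fullmatch(r"(?:t+d+)*", encoded) is not None,
    }


print(dl_conformance(["dd"]))        # what a lone ':' line produces
print(dl_conformance(["dt"]))        # what a lone ';' line produces
print(dl_conformance(["dt", "dd"]))  # a paired term and definition
```

Under these assumptions, a dl holding only a dd (a lone ":") or only a dt (a lone ";") satisfies the old DTD but not the HTML5 model, which is the failure described above.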
Comment 38 Gadget850 2013-04-16 13:12:44 UTC
Since this is now an HTML5 issue, I have added it to bug 19719.
Comment 39 Sam Martin 2013-04-16 16:21:18 UTC
As long as the software translates colons to definitions, the syntax should absolutely not be used in articles except when a definition list is what is actually desired. But since _everyone_ uses these or *lists for threads on Talk pages, could the HTML conversion be changed specifically for Talk pages?
