Last modified: 2014-09-04 02:27:20 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T8104, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 6104 - Wrap each wiki page section contents in a container


Summary:	Wrap each wiki page section contents in a container

Status:	REOPENED

Product:	MediaWiki
Classification:	Unclassified
Component:	Interface (Other open bugs)
Version:	unspecified
Hardware:	All All

Importance:	Low enhancement with 3 votes (vote)
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:

Duplicates:	61615 70198 (view as bug list)
Depends on:
Blocks:	semantic-html css 22771

Reported:	2006-05-26 23:23 UTC by Andrew Dunbar
Modified:	2014-09-04 02:27 UTC (History)
CC List:	9 users (show)

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Andrew Dunbar 2006-05-26 23:23:25 UTC

On a wiki page a "section" is a HTML heading element (H2, 
H3, etc) followed by text using any type of formatting, 
up to the next heading element. It generally consists of 
an "editsection" DIV if they are enabled, followed by a P 
element containing an anchor presumably for the TOC etc, 
then the content - which may contain subsections.

Now if each section was wrapped from start to finish in 
an HTML DIV it would make it much easier to implement 
such things as section-folding or giving various kinds of 
sections special colours. To do this now involves parsing 
the entire "bodyContents" and rebuilding it with such 
DIVs inserted - a slow and uncertain process.

Even better would be an outer DIV including the H tag and 
an inner DIV beginning after the H tag.

Another enhancement would be to auto-generate an ID for 
each section and subsection based on the contents of the 
owning H tag, perhaps including its parents in the case 
of subsections.
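
As an illustration only, the proposed structure (an outer DIV including the heading, an inner DIV for the content, and an ID built from the heading and its parents) could be sketched as below. It assumes the page has already been split cleanly into (level, title, body) triples, which is the hard part. The names `wrap_sections` and `slugify`, the class names, and the `sec-` ID prefix are hypothetical and not part of MediaWiki.

```python
import re
from html import escape


def slugify(title):
    """Turn heading text into an ID fragment (hypothetical scheme)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")


def wrap_sections(tokens):
    """tokens: list of (level, title, body_html) in document order.

    Returns HTML where each section has an outer div (heading + content)
    and an inner div (content only, including any subsections).
    """
    out = []
    stack = []  # (level, slug) for every section that is still open
    for level, title, body in tokens:
        # A new heading closes all open sections of the same or deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
            out.append("</div></div>")
        slug = slugify(title)
        # The ID includes the slugs of the parent sections.
        sec_id = "-".join([s for _, s in stack] + [slug])
        out.append(f'<div class="section section-h{level}" id="sec-{sec_id}">')
        out.append(f"<h{level}>{escape(title)}</h{level}>")
        out.append('<div class="section-body">')
        out.append(body)
        stack.append((level, slug))
    # Close whatever is still open at the end of the page.
    while stack:
        stack.pop()
        out.append("</div></div>")
    return "\n".join(out)
```

With this in place, section folding or per-section colours would only need a CSS or script selector such as `.section-h3`, with no re-parsing of the page body.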

Comment 1 Brion Vibber 2006-05-26 23:28:13 UTC

This isn't practical with the current system; sections may be split up 
across table cells, etc.

Comment 2 Andrew Dunbar 2006-06-04 20:55:50 UTC

I'm reopening just to ask if a simple solution can be implemented in the short
term which does not attempt to surmount the problem Brion mentions above.
Specifically, the English Wiktionary and probably all Wiktionaries shouldn't
cause that problem but could make immediate use of section CSS.

On other wikis such a temporary solution should of course not be enabled.

Comment 3 Andrew Dunbar 2006-06-04 20:57:38 UTC

See also Bug 4741: Semantic HTML for section anchors

Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-02-14 03:46:12 UTC

(In reply to comment #2)
> I'm reopening just to ask if a simple solution can be implemented in the short
> term which does not attempt to surmount the problem Brion mentions above.
> Specifically, the English Wiktionary and probably all Wiktionaries shouldn't
> cause that problem but could make immediate use of section CSS.

Some kind of check would still be needed to make sure this is possible. 
doHeadings() just uses a bunch of regex passes; it's not aware of contextual
stuff like whether there are table cells or whatnot nearby.  Then again, maybe
Tidy and/or Sanitizer would be clever enough to fix any resulting screwups
acceptably.  I suppose that would depend upon the exact styles and so on that
people would try to give it.  If we had a *proper* parser, of course, we could
presumably use a <tbody> instead of a div inside tables and properly nest it so
as to maintain validity, but that's not happening soon.
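
A rough sketch of the kind of check meant here, not how doHeadings() works: scan the text once and record which HTML-style elements are still open at each heading line. Everything below is an assumption for illustration; `headings_with_context` is a hypothetical helper, and wikitext table syntax (`{|` ... `|}`) is not handled at all.

```python
import re

# Either an HTML-style tag, or a wikitext heading line such as "== Title ==".
TOKEN = re.compile(
    r"<(/?)(\w+)[^>]*>|^(={1,6})[ \t]*(.+?)[ \t]*\3[ \t]*$",
    re.MULTILINE,
)
VOID = {"br", "hr", "img"}  # elements that never have a closing tag


def headings_with_context(wikitext):
    """Return [(title, [names of open elements])] for every heading line."""
    open_tags, result = [], []
    for m in TOKEN.finditer(wikitext):
        closing, tag, equals, title = m.groups()
        if equals:
            result.append((title, list(open_tags)))
        elif tag.lower() in VOID:
            continue
        elif closing:
            name = tag.lower()
            if name in open_tags:
                # Pop up to and including the matching opening tag.
                while open_tags.pop() != name:
                    pass
        else:
            open_tags.append(tag.lower())
    return result
```

A heading whose context list is empty could be wrapped in a div safely; one reported inside `table`, `tr` or `th` could not.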

Comment 5 Andrew Dunbar 2007-02-25 02:32:14 UTC

I have a proof of concept of this running at http://wiktionarydev.leuksman.com
(Hit random - it's not on the main page)

So far it's modified core code though it only touches one function in
Parser.php. It doesn't seem to be compatible with the normal TOC so I've disabled it.

Comment 6 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-07-05 20:48:29 UTC

Copying a relevant post I was going to make to wikitech-l but decided not to because it was off-topic:

Done naively, this breaks XHTML validity if the header is wrapped in any tag at all, and it seems very difficult to fix it for perfectly reasonable, nontrivial, legal cases like

This is the Declaration of Independence.
<div class="cited-document">
== Section 1 ==
We don't like the British.

== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>
This is a very compelling and historical document.

Observe that there, the trailing text is not intended to be part of Section 2 even though MediaWiki would consider it as such at present. Possibly you could construct some algorithm that would figure this out, but it's not particularly easy to do, especially if the tag structure is not as reasonable as this (use your imagination!). Are we going to start rewriting the document structure when an algorithm doesn't think it makes any sense, even if it's valid XHTML? Further issues arise with tables, where <div> wrappers are illegal and you have to hope that you can fit a <tbody> around what you want.

I think section wrappers *could* be extremely useful, but when you get down to it, their utility is limited even in the abstract by the fact that not all text is required to be part of any section, except in the technical sense. It would take some fairly drastic overhauling of how we look at and deal with sections for section wrappers to be practicable.

One approach to solving cases like that would be to simply parse each section independently of all the others, and run Tidy and so on on each section separately. That would make scenarios like the above impossible. This would be totally unacceptable for Wikipedia, but if what you say is true, it might be reasonable for the main namespace of Wiktionary. There might be a way of marking a section as not needing a section div for some reason, too (cf. bug 6575).
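
To make the per-section idea concrete, here is a minimal sketch that splits wikitext at heading lines and reports whether each chunk's HTML-style tags balance on their own. It is an assumption-laden illustration, not MediaWiki or Tidy behaviour: `split_sections`, `is_self_contained` and `BalanceChecker` are hypothetical names, and only Python's standard `html.parser` is used.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"(={1,6})[ \t]*.+?[ \t]*\1[ \t]*$")


class BalanceChecker(HTMLParser):
    """Tracks open elements and flags closing tags that do not match."""

    VOID = {"br", "hr", "img"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.ok = False


def is_self_contained(fragment):
    """True if every tag opened in the fragment is also closed in it."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.ok and not checker.stack


def split_sections(wikitext):
    """Split at heading lines; the first chunk is any text before a heading."""
    sections, current = [], []
    for line in wikitext.splitlines():
        if HEADING.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    sections.append("\n".join(current))
    return sections
```

For the Declaration example above, only the "Section 1" chunk is self-contained: the lead text opens the div and the "Section 2" chunk closes it.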

Comment 7 Michael Zajac 2007-07-05 22:20:57 UTC

That's a good example of the kind of problem that can make this a sticky issue to resolve.

But it also shows why it ought to be resolved and why it blocks bug 10467 (Use semantic XHTML).  With the current wikitext parser, the example code implies an incorrect semantic interpretation for the document.  The HTML specification says "A heading element briefly describes the topic of the section it introduces", so "Section 2" is a heading introducing both the second part of the Declaration and the article copy after it.  The author clearly did not intend this.

In (X)HTML 4, every heading implies a section which ends at the next heading of the same or higher level.  This bug proposes making that exact hierarchy explicit.  If we accept this, then I think there is a relatively simple solution.

The multi-section div element entered in wikitext explicitly creates a new section within the surrounding text (i.e. one level lower than the previous section heading).  Any section headings within that section imply enclosed sections, so they should be bumped down a further level in the hierarchy, and the last one closed before the closing /div tag. Following sections should resume the normal flow until the end of the document.

So the sample wikitext above implies the following structure, which ought to be rendered in the page's XHTML.  I've assumed the original had preceding and following sections, to show what could happen (they are unaffected).

  == Preceding section ==
  This is the Declaration of Independence

    === Editor-entered div/section === <div class="cited-document">

      ==== Section 1 ====
      We don't like the British.
      </div><!-- Section 1 ends -->

      ==== Section 2 ====
      Therefore we're not going to be your colony anymore.
      </div><!-- Section 2 ends: implied closure made explicit by the renderer -->

    </div><!-- editor-entered div/section closure  -->

  This is a very compelling and historical document.
  </div><!-- Preceding section ends -->

  == Following section ==
  American Revolutionary War follows.
  </div><!-- Following section ends -->


Unfortunately, it is impossible to duplicate this structure explicitly in wikitext only, since there is no way to end a section before the next equal or higher section (as happens at the end of Section 2 here).

Questions: 

* Do the automatically-generated sections get a heading or not?  If so, how is the text generated?
* Can this be logically extended to cover nested divs?  Or should the div hierarchy remain flat, with following div tags automatically closing previous ones?
* What happens if div tags are not balanced?  Can authors enter only a closing </div> tag to end a subordinate section?
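
One possible reading of the level-bumping rule described in this comment, as a sketch: an editor-entered div becomes an anonymous section one level below the section it sits in, headings inside it are shifted below that, and the closing tag returns to the enclosing section. The function name `implied_outline` is hypothetical, div tags are assumed to sit on their own lines, and a stray `</div>` is simply ignored, which is only one possible answer to the last question above.

```python
import re

HEADING = re.compile(r"(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$")


def implied_outline(lines):
    """Return [(level, title)] with title None for anonymous div sections."""
    base = 0       # how far headings are shifted by enclosing divs
    current = 1    # level of the enclosing section; 1 is the page itself
    saved = []     # (base, current) pairs to restore at each </div>
    outline = []
    for line in lines:
        s = line.strip()
        if s.lower().startswith("<div"):
            saved.append((base, current))
            base = current      # headings inside are shifted by this much
            current += 1        # the div itself is an anonymous section
            outline.append((current, None))
        elif s.lower() == "</div>":
            if saved:           # ignore unbalanced closing tags
                base, current = saved.pop()
        else:
            m = HEADING.match(s)
            if m:
                current = base + len(m.group(1))
                outline.append((current, m.group(2)))
    return outline
```

Run on the sample above, this gives level 2 for "Preceding section", level 3 for the anonymous div section, level 4 for "Section 1" and "Section 2", and level 2 again for "Following section". Note that the computed levels can exceed 6 for deeply nested input.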

Comment 8 Michael Zajac 2007-07-05 22:22:19 UTC

Another option: such a div element could be considered mis-nested, and ignored by the wikitext renderer.

Comment 9 Aryeh Gregor (not reading bugmail, please e-mail directly) 2007-07-05 23:02:27 UTC

(In reply to comment #7)
> In (X)HTML 4, every heading implies a section which ends at the next heading of
> the same or higher level.

Not really.  My above example is reason enough to discard that.  Even if you add a heading for the whole Declaration, HTML provides no way to indicate that the ended <div> terminates the section.  It only says that user agents should be able to construct a table of contents automatically, which they can, and in fact MediaWiki does exactly that.  To use another counterexample, the final heading tag in the source of http://www.w3.org/ is the one entitled "Systems", yet it precedes the completely unrelated footer, which has no heading tag.

(Incidentally, the last draft of XHTML 2.0 that I looked at had some kind of tag to explicitly delimit sections, <section> or something.)

> The multi-section div element entered in wikitext explicitly creates a new
> section within the surrounding text (i.e. one level lower than the previous
> section heading).

Sure, but what about this template-generated table?

<table>
<tr><th colspan="2">
== Widget sales for 2006 ==
<a href="...">edit this template</a>
</th></tr>
<tr><th>Month</th><th>Number</th></tr>
...
</table>

This has the same form as the div example, but its semantics are different.  That is, the heading is cordoned off from the section by a parent element, but it does *not* logically cover only its following siblings (in this case only the <a> element), it covers the entire table, which includes cousin nodes and even parents.  How do you plan to automatically differentiate these cases?  You'd need explicit, user-entered section delimiters for this to work reliably.

Comment 10 Andrew Dunbar 2007-08-06 11:23:09 UTC

I've got a basic version of this working in JavaScript here: http://en.wiktionary.org/wiki/User:Hippietrail/addstructure.js

It is designed for and tested only on the English Wiktionary so far but is not installed there for all users.

It may however be of interest to anybody following this feature request.

Comment 11 Michael Zajac 2008-11-15 21:18:26 UTC

See also Bug 16190: Relate section anchors to section headings in HTML, describing an alternative which may be simpler to implement and provides some benefits.  Bug 4741: Use id's for section anchors instead of <a name=...> is similar to this one.

Comment 12 Michael Zajac 2011-01-05 18:28:47 UTC

If HTML5 were to be supported, then a better solution would be to use a <section> element. 

See also bug 23932 - “Enable, whitelist, and incorporate semantic HTML5 elements: article, aside, figcaption, figure, footer, header, hgroup, mark, nav, section, time.”

Comment 13 Bartosz Dziewoński 2014-09-03 12:02:35 UTC

*** Bug 61615 has been marked as a duplicate of this bug. ***

Comment 14 Bartosz Dziewoński 2014-09-03 12:03:31 UTC

*** Bug 70198 has been marked as a duplicate of this bug. ***

Comment 15 Derk-Jan Hartman 2014-09-03 12:32:24 UTC

Since we do some stuff in this area with mobile and Parsoid/VE these days, I wonder: what if we do this only for H2s? Is there any way we can measure how many pages we would break?

Parsoid has done metrics on similar problems, right? Perhaps through that route we could explore it?

I do know that this:
<div class="cited-document">
== Section 1 ==
We don't like the British.

== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>

is often used on user pages, so those would likely all break.

Comment 16 Isarra 2014-09-03 15:17:32 UTC

Have it activate for h1s and h2s unless they're embedded in something (another div that doesn't span the entire page, a table, etc), perhaps?

So each h1 and following content would get its own div, which would include the divs for h2s and their following content.

If you have an in-wikitext <div> around two h2s and content, this could either just ignore those, or put the first h2 div around both and just ignore the second h2... or perhaps put both h2+content divs inside the parent div.


Whatever the solution, this would be very useful or even needed on several projects. wikiHow comes to mind, considering how all the content on a howto is broken up into sections in just such a way.
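
A minimal sketch of the first option (wrap h1 and h2 sections, and ignore headings embedded in a div or table), assuming a simple line-by-line pass and balanced markup. The function `wrap_top_level` and the class names are hypothetical, and wikitext table syntax is not detected.

```python
import re

# Matches only "= H1 =" and "== H2 ==" lines, not deeper headings.
HEADING = re.compile(r"(={1,2})[ \t]*([^=].*?)[ \t]*\1[ \t]*$")
OPEN = re.compile(r"<(div|table)\b[^>]*>", re.I)
CLOSE = re.compile(r"</(div|table)>", re.I)


def wrap_top_level(wikitext):
    """Insert section divs around h1/h2 sections that are not embedded."""
    out = []
    open_levels = []  # heading levels of the wrapper divs still open
    depth = 0         # how many editor-entered divs/tables we are inside
    for line in wikitext.splitlines():
        m = HEADING.match(line.strip())
        if m and depth == 0:
            level = len(m.group(1))
            # Close wrappers of the same or deeper level first.
            while open_levels and open_levels[-1] >= level:
                open_levels.pop()
                out.append("</div>")
            out.append(f'<div class="section-h{level}">')
            open_levels.append(level)
        out.append(line)
        depth += len(OPEN.findall(line)) - len(CLOSE.findall(line))
        depth = max(depth, 0)
    out.extend("</div>" for _ in open_levels)
    return "\n".join(out)
```

An h2 wrapper opened after an h1 stays inside the h1 wrapper, so each h1 div includes the divs for its h2s, as described above.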

Comment 17 Andrew Dunbar 2014-09-04 02:27:20 UTC

Eight years ahead of my time, apparently (-;
Glad to see others finally noticing some need for this!
