Last modified: 2014-11-17 10:34:48 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T26529, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 24529 - Incrementally remove support for HTML elements removed from or deprecated in HTML5
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Low enhancement with 3 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: html

Reported: 2010-07-24 16:46 UTC by S. McCandlish
Modified: 2014-11-17 10:34 UTC
CC List: 11 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---



Description S. McCandlish 2010-07-24 16:46:49 UTC
Support for the HTML elements tt, s, strike and u should be removed. Initially, they should be deprecated in the documentation of the major MediaWiki wikis (Wikipedia, Wiktionary). Second, they should be replaced on the fly with font-styled spans by the MediaWiki engine before reaching the user agent. Third, they should eventually not be supported at all, after wider adoption of HTML5.

Previous discussion on the topic (with new material added at end), centralized here:

--- Bug #671 Comment #27 from SMcCandlish <smccandlish@gmail.com> 2008-09-19 21:34:22 UTC ---
We should also be aware that the tt, s, strike, and u elements are going to be removed from XHTML 2 and HTML 5. The time is probably NOW to start weaning people off of them, though of course they shouldn't simply be deleted from MediaWiki support just yet. It would be good if these were replaced on the fly with styled <span>s, though (meanwhile, <i>/'' and <b>/''' should be left alone, as HTML 5 redefines them more narrowly and they will continue to be used).

--- Bug #671 Comment #41 from SMcCandlish <smccandlish@gmail.com> 2010-07-22 01:29:44 UTC   ---
[We] will ultimately need to ... get rid of support for the tt element entirely, which doesn't exist in HTML 5. Here's a good discussion of this issue (more broadly than wiki), and Googling about it turns up more:

http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2009-April/000233.html

Salient quote:

Ian Hickson, Wednesday, 29 April 2009 6:44 AM:
> On Tue, 28 Apr 2009, Jim Garrison wrote:
>> I am trying to figure out the best way to replace the tt element as I
>> migrate to HTML5.
> 
> Are you using tt to mark up computer code, variables, sample computer 
> output, user input, for emphasis, to give a span of text in an alternate 
> voice or mood, a span of text to be stylistically offset from the normal 
> prose without conveying any extra importance, or something else?

This question must be asked every time a tt element is replaced (manually or
via AWB or whatever), and they WILL need to ultimately be replaced over the next
couple of years.

--- Bug #671 Comment #43 here from Aryeh Gregor <Simetrical+wikibugs@gmail.com> 2010-07-22 17:24:10 UTC (In reply to comment #41) --- 
> [The tt element] exists in HTML5.  It's just classified as obsolete
> presentational markup, and is not valid

Noted; thanks. HTML5 keeps changing and I stopped trying to track it all quite some time ago.  Being mentioned in the standard as *invalid* presentational markup is effectively the same thing as being "not in" the standard, however.  And it doesn't change my point about the tt element: The last thing we want is for WP and other wikis' content, whether served by those wikis or repurposed elsewhere, to fail validation out of laziness and cruft.  MW's tt still has to go, at least in the long run.

> We cannot remove
> support for [tt] from MediaWiki without a migration path
> to convert all existing markup somehow.  But this is a totally separate bug.

No argument from me on either observation. My point is that unless this *is* opened as a bug, it's highly unlikely that any such migration path will be devised (although it would be a near-trivial one anyway: a simple bot could convert these into styled spans, ignoring instances inside pre and nowiki, and angle brackets coded as numeric or named character entity references). A new bug for this one should not be set to RESOLVED LATER, or no one will do anything to start migrating away from the dead markup.

I would suggest that tt be removed from the documentation as "supported" and noted as deprecated, with all support for it eventually being removed. For several versions (maybe several years) it should be allowed in wikicode (i.e., in the editing window and in saved wikicode that editors see in the editing window), but transmogrified on the fly into a monospaced span before it is served to the user agent. After HTML5 is more fully accepted, tt should just disappear.
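The bot conversion described above (styled spans for tt, skipping pre and nowiki, leaving entity-coded angle brackets alone) could be sketched in Python roughly as follows. This is a hypothetical illustration, not an existing MediaWiki or AWB tool: `convert_tt`, the regular expressions, and the choice of protected regions are assumptions. It deliberately converts only bare `<tt>...</tt>` pairs and leaves any tt that carries attributes, or that straddles a protected region, for a human to review, since each case may need a different semantic replacement.

```python
import re

# Assumption: only <pre> and <nowiki> blocks are protected (self-closing
# <nowiki/> is not a block); real wikitext has further verbatim regions.
PROTECTED = re.compile(r"<(pre|nowiki)\b[^>/]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Only attribute-less pairs are converted; entity-coded "&lt;tt&gt;" never matches.
TT_PAIR = re.compile(r"<tt>(.*?)</tt>", re.IGNORECASE | re.DOTALL)


def _convert_chunk(chunk: str) -> str:
    """Replace bare <tt>...</tt> pairs in a chunk outside protected regions."""
    return TT_PAIR.sub(
        lambda m: '<span style="font-family:monospace">' + m.group(1) + "</span>",
        chunk,
    )


def convert_tt(wikitext: str) -> str:
    """Hypothetical bot pass: convert tt everywhere except protected regions."""
    out = []
    pos = 0
    for m in PROTECTED.finditer(wikitext):
        out.append(_convert_chunk(wikitext[pos:m.start()]))
        out.append(m.group(0))  # copy the protected region verbatim
        pos = m.end()
    out.append(_convert_chunk(wikitext[pos:]))
    return "".join(out)
```

For example, `convert_tt("Call <tt>foo()</tt>, not <nowiki><tt>raw</tt></nowiki>")` rewrites the first tt and leaves the one inside nowiki untouched.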
Comment 1 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-25 17:36:28 UTC
I have no objection in principle to migrating away from using these somehow, so I agree that this bug should not be closed.  However, there has to be some migration plan that does not force Wikipedia users to do unnecessarily large amounts of work, and someone has to code it up.  These are obstacles that I suspect are prohibitive for the foreseeable future.  It's conceivable that we could do some auto-translation of <tt> to <span style="font-family:monospace"> and so forth, but that still leaves the invalid markup in the page source, so it's less than ideal.

Note that it's not just elements here, but attributes too.  For example, cellspacing="" and cellpadding="" are obsolete and invalid in HTML5 as well.  The full list is here:

http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#non-conforming-features

We don't allow most of those anyway, but there are an awful lot we do currently permit.  Some of them (particularly table attributes) cannot be easily, reliably, and automatically converted to use CSS.
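Detecting the obsolete table attributes is much easier than converting them. As a rough, hypothetical sketch (not part of MediaWiki), the function below flags `cellspacing`, `cellpadding` and `bgcolor` on wikitext table-start lines (`{|`). The attribute set is a small assumed subset of the WHATWG list, and attributes on row and cell lines are ignored.

```python
import re
from typing import List, Tuple

# Assumed subset of obsolete presentational <table> attributes (see the
# WHATWG "non-conforming features" list for the full set).
OBSOLETE_TABLE_ATTRS = {"cellspacing", "cellpadding", "bgcolor"}
ATTR_NAME = re.compile(r"([A-Za-z][A-Za-z0-9-]*)\s*=")


def find_obsolete_table_attrs(wikitext: str) -> List[Tuple[int, str]]:
    """Return (line number, attribute) pairs found on '{|' table-start lines."""
    hits = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        stripped = line.lstrip()
        if not stripped.startswith("{|"):
            continue
        for name in ATTR_NAME.findall(stripped[2:]):
            if name.lower() in OBSOLETE_TABLE_ATTRS:
                hits.append((lineno, name.lower()))
    return hits
```

It reports rather than rewrites because the CSS equivalents are not a one-to-one swap: `cellpadding`, for instance, corresponds to padding on each cell rather than a property of the table element itself.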
Comment 2 S. McCandlish 2010-07-25 20:04:21 UTC
The tt element and other simple cases can be fixed in the wikicode with AWB and other scripting tools and bots, after being fixed in engine to not actually reach the user agent as a tt element, but a span with monospaced font.  On installations other than Wikipedia, they'll need to write their own (or adapt WP's) tools, or fix it manually.

I'm not sure that allowing the tt element in the wikicode is a huge deal. We also allow br elements without a closing / in them, we allow the p element without a closing /p tag, etc., and it all gets fixed on the fly before it hits the browser. I think that tt should be removed from the editor-facing documentation, so that new instances of it are not added to the wikicode, willy-nilly, forever. The help pages on editing should direct users to a {{monospace}} template or something (which would use the span and font-family).

For HTML5-verboten attributes... Yeah, that'll be big fun.  I don't have any particular ideas with regard to table stuff especially.
Comment 3 Aryeh Gregor (not reading bugmail, please e-mail directly) 2010-07-25 20:17:24 UTC
(In reply to comment #2)
> The tt element and other simple cases can be fixed in the wikicode with AWB and
> other scripting tools and bots, after being fixed in engine to not actually
> reach the user agent as a tt element, but a span with monospaced font.  On
> installations other than Wikipedia, they'll need to write their own (or adapt
> WP's) tools, or fix it manually.

If people step up who are willing and able to fix all the breakage, I don't mind disabling support in the software.  But not before all existing uses are removed, and people commit to fixing any stragglers.

> I'm not sure that allowing the tt element in the wikicode is a huge deal. We
> also allow br elements without a closing / in them

Which is allowed in HTML5.

> we allow the p element without a closing /p tag

Which is allowed in HTML5.

> I think that tt should be removed from the editor-facing
> documentation, so that new instances of it are not added to the wikicode,
> willynilly, forever. The help pages on editing should direct users to a
> {{monospace}} template or something (which would use the span and font-family).

I agree, but of course, this isn't the right place to ask.  It's a wiki, change it.  :)  (Or get consensus, whatever.)

> For HTML5-verboten attributes... Yeah, that'll be big fun.  I don't have any
> particular ideas with regard to table stuff especially.

That's much harder, yeah.  Of course, it wouldn't be a big deal if people would just not use presentational tables, but good luck with that one . . .
Comment 4 The Evil IP address 2010-08-24 15:49:23 UTC
Why bother people with removing them from the wiki text? They're often much easier to write for the casual editor. Just compare

<s>#A struck-out vote. ~~~~</s>

to

<span style="text-decoration: line-through;">#A struck-out vote. ~~~~</span>

Or <center> is a very easy way to center text, <u> is helpful for underlining text, and so on. We shouldn't force users to learn HTML and CSS for their regular editing. These elements should remain supported in the wikitext, and the software should convert them to proper HTML5.

The real concern here should be moving the deprecated HTML elements and attributes out of the software-generated output. I recently requested that some <font> tags in messages at Translatewiki be replaced, and Siebrand then did it, but I'm pretty sure there's more such stuff. I can look for the MediaWiki messages with old CSS, if you want. There may be some in core, and probably many more in the extensions.
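The render-time conversion proposed in this comment, keeping the old tags in wikitext but emitting HTML5-conforming markup, might look roughly like the sketch below. It is hypothetical and is not how MediaWiki's parser or Sanitizer actually works: the `modernize` function, the CSS chosen for each element, and the decision to skip `<font>` are all assumptions, and it only merges into double-quoted `style` attributes.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed mapping: obsolete element -> (replacement element, CSS to add).
# <font> is left out because its color/size/face attributes need their own mapping.
REPLACEMENTS: Dict[str, Tuple[str, Optional[str]]] = {
    "tt": ("span", "font-family:monospace"),
    "big": ("span", "font-size:larger"),
    "center": ("div", "text-align:center"),
    "strike": ("s", None),  # <s> is conforming in HTML5 (see Comment 6)
}
TAG = re.compile(r"<(/?)(tt|big|center|strike)(\s[^>]*)?>", re.IGNORECASE)
STYLE = re.compile(r'(?<![\w-])style\s*=\s*"([^"]*)"', re.IGNORECASE)


def _open_tag(name: str, css: Optional[str], attrs: str) -> str:
    """Build the replacement start tag, keeping the original attributes."""
    if css is None:
        return f"<{name}{attrs}>"
    m = STYLE.search(attrs)
    if m:  # prepend our declaration to an existing double-quoted style
        attrs = attrs[: m.start(1)] + css + ";" + attrs[m.start(1):]
        return f"<{name}{attrs}>"
    return f'<{name} style="{css}"{attrs}>'


def modernize(html: str) -> str:
    """Rewrite obsolete start/end tags; everything else passes through."""
    def repl(m: re.Match) -> str:
        closing, old, attrs = m.group(1), m.group(2).lower(), m.group(3) or ""
        new, css = REPLACEMENTS[old]
        return f"</{new}>" if closing else _open_tag(new, css, attrs)

    return TAG.sub(repl, html)
```

Because each old element always maps to the same new element, nesting such as `<big><big>...</big></big>` stays balanced. As Comment 1 points out, though, a transform like this still leaves the invalid markup in the page source.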
Comment 5 S. McCandlish 2011-08-16 08:01:59 UTC
These concerns are not mutually exclusive; they are part and parcel of the same thing. The reasons to stop supporting obsolete <tt>-style markup in wikitext are numerous. First, and most obviously, wikimarkup is not HTML. We allow some basic [X]HTML for experienced, geeky users, but there's no guarantee we'd do that forever, and supporting BAD markup of this sort is pointless. Second, it encourages sloppy coding everywhere. Wikipedia is one of the most popular websites in the world (the most popular or among the top three, depending on the stats source; I tend to think that Facebook and GMail have it beat), so what we do actually has influence. We are lending extra "life after death" to dead code. Third, code from Wikimedia projects like WP and Wiktionary can be reused anywhere by anyone, and we have no control over how that is done. Just pasting stuff is surely pretty common, so bad code from WP is getting out "into the wild". Not a huge concern, but if we're going to allow HTML at all, we should at least only support valid markup. Fourth, editor convenience is better served by templates: no one really expects users to enter stuff like <span style="text-decoration: line-through;">...</span> when something like {{strike|...}} would do it for them. And <s>, a.k.a. <strike>, is a bad example anyway; it's pure presentation with no semantic meaning, hence its obsolescence. The <del>...</del> markup is still valid (I'll eventually ensure that {{del|...}} works at en.wp, too). Fifth, we shouldn't force users to learn INCORRECT HTML and CSS for their regular editing, which is presently precisely the case. Implement the correct markup, remove the bad, and give non-HTMLish users templates.
Comment 6 Gadget850 2013-05-12 21:54:27 UTC
The latest HTML spec obsoletes these elements that are allowed by sanitizer.php:
<big>
<center>
<font>
<rb>
<strike>
<tt>

These elements are supported by HTML5:
<s>
<u>
Comment 7 S. McCandlish 2013-06-09 04:19:52 UTC
The <small> element is missing from the list of obsoletes posted by Gadget850 in Comment #6, 2013-05-12 21:54:27 UTC.
Comment 8 Daniel Friesen 2013-06-09 04:40:50 UTC
(In reply to comment #7)
> The <small> element is missing from the list of obsoletes posted by Gadget850
> in Comment #6, 2013-05-12 21:54:27 UTC.

That's because <small> is still valid, was never removed, and hence is not obsolete:
http://www.whatwg.org/html/text-level-semantics.html#the-small-element
Comment 9 Gadget850 2013-06-09 12:40:03 UTC
<small> keeps coming up in discussions as being obsolete; does anyone know why? Perhaps a draft spec? The only change is that it now has a semantic definition.
Comment 10 Michael Zajac 2013-06-09 14:54:37 UTC
Small was obsoleted as a presentation element, and has since been reprieved.
Comment 11 C. Scott Ananian 2014-06-24 18:13:26 UTC
The Parsoid project is sponsoring a GSoC student to write "linttrap", a wikitext linter which can (hopefully) aid in the semi-automatic conversion of deprecated markup.
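linttrap itself is not shown here. As a much smaller, hypothetical illustration of the same idea, the sketch below reports each start tag of an obsolete element by line number, using the list from Comment 6 minus `<rb>` (per Comment 19), so that an editor or bot operator could review the hits one by one. The names `lint` and `summarize` are assumptions, and unlike a real linter this does not skip nowiki or pre regions.

```python
import re
from collections import Counter
from typing import List, Tuple

# Obsolete elements allowed by the sanitizer, per Comments 6 and 19.
OBSOLETE = ("big", "center", "font", "strike", "tt")
START_TAG = re.compile(r"<(%s)(?=[\s/>])" % "|".join(OBSOLETE), re.IGNORECASE)


def lint(wikitext: str) -> List[Tuple[int, str]]:
    """Return (line number, element) for every obsolete start tag."""
    findings = []
    for lineno, line in enumerate(wikitext.splitlines(), start=1):
        for m in START_TAG.finditer(line):
            findings.append((lineno, m.group(1).lower()))
    return findings


def summarize(findings: List[Tuple[int, str]]) -> Counter:
    """Count findings per element, e.g. to prioritise cleanup work."""
    return Counter(tag for _, tag in findings)
```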
Comment 12 Technical 13 2014-06-25 10:52:08 UTC
May need to pick up the pace on this, mobile browsers are starting to drop these elements as can be seen in a simulated screenshot of what I see on my BlackBerry phone: [[:commons:File:Bad elements.png]].  I've been going through offering replacement signatures to those with the tags that have been removed and started cleaning up interface messages, templates, and help/project pages on enwp, but there are nearly 100K pages overall with these codes that render parts of pages invisible.  I'll keep plugging away at it, but any help I can get would be greatly appreciated.
Comment 13 C. Scott Ananian 2014-06-25 15:53:27 UTC
@Technical 13 -- it looks like this is actually a font issue with your phone.
Comment 14 Technical 13 2014-06-25 16:08:52 UTC
(In reply to C. Scott Ananian from comment #13)
> @Technical 13 -- it looks like this is actually a font issue with your phone.

Then it is just a coincidence that it affects all of the deprecated elements including <font>, <acronym>, <center>, <big>, <strike>, and <tt> (that I can see) and nothing else?
Comment 15 C. Scott Ananian 2014-06-25 16:18:33 UTC
Your posted image only demonstrates problems with <font> and <tt>.  The full list of deprecated elements is at http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#non-conforming-features -- your phone has problems rendering all of these?
Comment 16 Technical 13 2014-06-25 16:36:21 UTC
(In reply to C. Scott Ananian from comment #15)
> Your posted image only demonstrates problems with <font> and <tt>.  The full
> list of deprecated elements is at
> http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#non-conforming-features
> -- your phone has problems rendering all of these?

The ones I listed in comment #14, yes.
Comment 17 Jesús Martínez Novo (Ciencia Al Poder) 2014-06-25 19:02:12 UTC
(In reply to Technical 13 from comment #12)

That's really odd. When a browser drops support for a tag, it should treat it like any unknown tag: display the contents of the tag (not the tag markup itself), not hide them!

I'm also inclined to think it's a font problem.
Comment 18 Technical 13 2014-06-25 19:15:35 UTC
(In reply to Jesús Martínez Novo (Ciencia Al Poder) from comment #17)
> (In reply to Technical 13 from comment #12)
> 
> That's really odd. When a browser drops support for a tag, just as any
> unknown tag, it should display the contents of the tag (not the tag markup
> itself), but not hiding its contents!
> 
> I'm also inclined to think it's a font problem.

When I view the source of the page, the elements aren't even there.
Comment 19 Gadget850 2014-06-25 23:33:07 UTC
<rb> is no longer obsolete.
http://www.w3.org/TR/2014/CR-html5-20140429/text-level-semantics.html#the-rb-element

(In reply to Gadget850 from comment #6)
> The latest HTML spec obsoletes these elements that are allowed by
> sanitizer.php:
> <big>
> <center>
> <font>
> <rb>
> <strike>
> <tt>
> 
> These elements are supported by HTML5:
> <s>
> <u>
Comment 20 Gadget850 2014-06-25 23:35:31 UTC
<acronym> is not whitelisted, so the markup will always show.

(In reply to Technical 13 from comment #14)
> (In reply to C. Scott Ananian from comment #13)
> > @Technical 13 -- it looks like this is actually a font issue with your phone.
> 
> Then it is just a coincidence that it affects all of the deprecated elements
> including <font>, <acronym>, <center>, <big>, <strike>, and <tt> (that I can
> see) and nothing else?
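The behaviour described in Comment 20, where a tag that is not on the sanitizer's whitelist is shown as literal text rather than interpreted, can be illustrated with a hypothetical allowlist filter. This is not MediaWiki's Sanitizer: the `ALLOWED` set and the `sanitize` function are assumptions, and real sanitizing also checks attributes and nesting.

```python
import html
import re

# Small assumed sample of whitelisted elements; <acronym> is deliberately absent.
ALLOWED = {"b", "big", "center", "code", "font", "i", "s", "small", "span", "strike", "tt", "u"}
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def sanitize(text: str) -> str:
    """Keep allowlisted tags; escape any other tag so it displays literally."""
    def repl(m: re.Match) -> str:
        if m.group(1).lower() in ALLOWED:
            return m.group(0)
        return html.escape(m.group(0), quote=False)

    return TAG.sub(repl, text)
```

Output such as `&lt;acronym title="x"&gt;` is rendered by the browser as the visible text `<acronym title="x">`, which is why such markup "will always show".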
