Last modified: 2014-10-09 19:02:11 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T29478, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 27478 - Enable $wgHtml5 on Wikimedia wikis
Status: RESOLVED FIXED
Product: Wikimedia
Classification: Unclassified
Component: Site requests
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 5 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Keywords: platformeng, shell
Depends on: 30525 36495
Blocks: html5 34475 38471 5398 31527
Reported: 2011-02-16 23:06 UTC by Aryeh Gregor (not reading bugmail, please e-mail directly)
Modified: 2014-10-09 19:02 UTC (History)
30 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Description Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-16 23:06:02 UTC
$wgHtml5 is usable in 1.17 -- get rid of $wgHtml5 = false in the config files.  It was only there because the old 1.16 snapshot used by Wikimedia didn't produce well-formed XML.  Even the 1.16 release works fine.

The only possible negative side-effect would be that some pages might not be well-formed XML due to bugs, e.g., if there are named entities that have crept in since r68803.  These should be fixable easily on a case-by-case basis.  $wgHtml5 = true has been the default since 1.16, so we want to test it on the main site and fix any resulting bugs, if there are any.
Comment 1 Mark A. Hershberger 2011-02-17 16:43:46 UTC
How could we find fixable items?  Would loading a dump and checking for pages that weren't well-formed work?

I'm interested b/c this tickles my inner XML nerd.
Comment 2 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-18 00:01:07 UTC
The problem wouldn't be in pages, it would be in code.  The code should always still output well-formed XML, but there might be bugs.  Well-formedness errors from anything run through the parser should be a nonissue, but special pages or such that output stuff directly might sneak in some entities, in which case that page would break.

(Of course, you've always been able to get the parser to output non-well-formed output, even in non-HTML5 mode.  Some of the expected fails in parser tests exhibit this.)
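
To make the failure mode concrete: an XML parser working without a DTD knows only the five predefined entities, so a named HTML entity such as &nbsp; is a well-formedness error, while the numeric reference &#160; is fine. The following is a minimal Python sketch that uses the standard library's parser as a stand-in for a strict XML consumer. The sample fragments are invented for illustration and are not actual MediaWiki output.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if a DTD-less XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments standing in for output from a special page.
named = "<p>Next&nbsp;page</p>"      # named HTML entity, undefined in plain XML
numeric = "<p>Next&#160;page</p>"    # numeric character reference
predefined = "<p>a &amp; b</p>"      # one of XML's five predefined entities

print(is_well_formed(named))       # False
print(is_well_formed(numeric))     # True
print(is_well_formed(predefined))  # True
```

A check of this kind only says whether a given string parses; as noted above, the entities come from code paths, so finding them still means exercising the pages that emit raw HTML.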
Comment 3 Niklas Laxström 2011-02-18 08:08:15 UTC
I'm sure lots of entities have crept in, since nobody has been enforcing that (I've made few comments in code review).
Comment 4 Mark A. Hershberger 2011-02-18 15:49:23 UTC
(In reply to comment #3)
> I'm sure lots of entities have crept in, since nobody has been enforcing that
> (I've made few comments in code review).

Could you point to an example of this in code review so that I know what to look for?
Comment 5 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-18 18:59:16 UTC
(In reply to comment #3)
> I'm sure lots of entities have crept in, since nobody has been enforcing that
> (I've made few comments in code review).

Should be fixed by and large in r82413.  Some corner cases will surely come up anyway in deployment, because of things like messages being output as raw HTML and admins putting entities in them, but it shouldn't be too hard to fix.  Worst that happens is some screen-scrapers temporarily break.  I originally discussed this with Brion and Tim and they agreed the extra long-term pain for screen-scrapers was an okay risk (maybe even a good thing!).
Comment 6 Roan Kattouw 2011-02-23 17:10:30 UTC
Enabled
Comment 7 Sam Reed (reedy) 2011-02-24 13:23:58 UTC
<logmsgbot> !log demon synchronized php-1.17/wmf-config/InitialiseSettings.php  'Turning HTML5 back off for now. Reports of breakage on zhwiki in Internet Explorer on XP. Also people are complaining about userscripts breaking, but its probably screen scraping (which people shouldn't be doing anyway and we've been saying for years)'
Comment 8 Sam Reed (reedy) 2011-02-24 13:24:43 UTC
See bug 27677 for zhwiki
Comment 9 Chad H. 2011-02-24 13:36:17 UTC
Turns out zhwiki issue is unrelated, oh well.
Comment 10 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-24 15:01:49 UTC
Since the zhwiki issue is unrelated, can this be turned back on?  If we care about regex screen-scrapers for some reason, then relay the exact errors people are getting so I can take a look and see what I can do.
Comment 11 AlexSm 2011-02-24 16:09:22 UTC
You explained exactly the same error with scraping in 2009:
[[Wikipedia:Village pump (technical)/Archive 67#Twinkle stalling]]

Also bug 27672 was filed yesterday.
Comment 12 Brad Jorsch 2011-02-24 17:38:31 UTC
Several problems on enwiki were caused by the difference in Sanitize::escapeId between HTML4 and HTML5 modes.

<ref name="foo"> tries to generate a link in the page something like [[#cite_note-foo|[1]]]. In HTML4 mode <ref name="foo [[bar]]"> generates [[#cite_note-foo_.5B.5Bbar.5D.5D-0|[1]]] which functions correctly, but in HTML5 mode it generates [[#cite_note-foo_[[bar]]-0|[1]]] which of course breaks horribly.

Also, in HTML4 mode <ref name="foo">, <ref name="_foo_">, and <ref name="''foo''"> are all distinct. In HTML5 mode these are all considered equivalent.
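
The two behaviours described above can be approximated in a few lines. This is a Python sketch, not MediaWiki's PHP: the function names are hypothetical, the HTML4 variant assumes the id is percent-encoded and then has "%" swapped for ".", and the HTML5 variant assumes runs of whitespace, underscores, quotes and a few other characters are collapsed to a single underscore and then trimmed. Details such as the handling of ids that do not start with a letter are left out.

```python
import re
from urllib.parse import quote


def escape_id_html4(name: str) -> str:
    """HTML4-style id: percent-encode, then use '.' in place of '%'."""
    encoded = quote(name.replace(" ", "_"), safe="")
    return encoded.replace("%3A", ":").replace("%", ".")


def escape_id_html5(name: str) -> str:
    """HTML5-style id: collapse runs of unsafe characters to '_', trim '_'."""
    collapsed = re.sub(r"[ \t\n\r\f_'\"&#%]+", "_", name)
    return collapsed.strip("_") or "_"


for ref_name in ["foo", "_foo_", "''foo''", "foo [[bar]]"]:
    print(repr(ref_name), escape_id_html4(ref_name), escape_id_html5(ref_name))
```

Under these assumptions the HTML4 variant yields three distinct ids for foo, _foo_ and ''foo'' and turns foo [[bar]] into foo_.5B.5Bbar.5D.5D, while the HTML5 variant maps the first three names to the same id and leaves the square brackets in place.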
Comment 13 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-24 18:12:08 UTC
(In reply to comment #11)
> You explained exactly the same error with scraping in 2009:
> [[Wikipedia:Village pump (technical)/Archive 67#Twinkle stalling]]
> 
> Also bug 27672 was filed yesterday.

This suggests maybe some named entities have crept through, or some other type of well-formedness error.  It would be nice if people said which exact pages failed, but it would probably be possible to figure it out.  I'm guessing it's the result of messages being passed as raw HTML and sysops adding named entities to them, but it could be something else too.

The easy way out would be to restore the old hack where we serve HTML5 with an HTML 4.01 Strict doctype, which is valid HTML5 but rather confusing.  This is how 1.16 works by default.  That way a DTD is specified, which means that non-browser UAs will parse named entities successfully.  We can consider switching back to the HTML5 doctype later.

(In reply to comment #12)
> Several problems on enwiki were caused by the difference in Sanitize::escapeId
> between HTML4 and HTML5 modes.

Hmm.  This should be disable-able by setting $wgExperimentalHtmlIds to false, leaving $wgHtml5 true (which might leave well-formedness issues).  A proper fix will require some more thought, though.  The changes to escapeId() are really meant for headings, but we can't realistically distinguish wikilinks meant to point at headings from wikilinks meant to point at other things.

In practice, it looks like Cite is the major problem here (with the id's), and it can probably be fixed.  My first inclination is to just generate arbitrary id's for named refs instead of trying to key off the names.
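
As a rough illustration of the "arbitrary id's" idea, the sketch below hands out opaque sequential ids keyed on the exact ref name, so the result never depends on how the name would be escaped. It is written in Python for brevity; the class name, method name and prefix are hypothetical, and this is not how the Cite extension is implemented.

```python
from itertools import count


class RefIdAllocator:
    """Hand out opaque, sequential ids for named refs (hypothetical helper)."""

    def __init__(self, prefix: str = "cite_note-") -> None:
        self._prefix = prefix
        self._counter = count(1)
        self._ids = {}

    def id_for(self, name: str) -> str:
        # The exact string is the key, so distinct names never collide,
        # whatever characters they contain.
        if name not in self._ids:
            self._ids[name] = f"{self._prefix}{next(self._counter)}"
        return self._ids[name]


allocator = RefIdAllocator()
for ref_name in ["foo", "_foo_", "foo [[bar]]", "foo"]:
    print(repr(ref_name), allocator.id_for(ref_name))
```

One apparent trade-off of such a scheme is that the anchors stop being derivable from the ref name, so anything that links to a note by its name-based id would need another way to find it.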
Comment 14 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-24 18:12:50 UTC
For reference, from #wikimedia-tech (contains some additional links that should be checked when fixing):

[110224 11:45:22] <RoanKattouw> AryehGregor: https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Vpt#More_than_Twinkle_is_broken , https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Vpt#Merged_Reflinks , https://secure.wikimedia.org/wikipedia/en/wiki/Wikipedia:Vpt#Recent_Javascript_changes
[110224 11:46:53] <RoanKattouw_away> AryehGregor: Due to those issues, HTML 5 was switched back off today
Comment 15 CBM 2011-06-24 01:15:30 UTC
This is blocking some work for a GSOC project to improve the article assessment system on enwiki. If there is nothing blocking any longer, it would be nice to see wgHtml5 re-enabled. I'm just commenting here to bump the bug.
Comment 16 MZMcBride 2011-06-24 04:11:03 UTC
(In reply to comment #15)
> This is blocking some work for a GSOC project to improve the article assessment
> system on enwiki. If there is nothing blocking any longer, it would be nice to
> see wgHtml5 re-enabled. I'm just commenting here to bump the bug.

The best summary of the remaining issues (and a path for re-deployment) can be found here: <http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html>.

Really this just needs a sysadmin with the time and patience to shepherd this through. Mark (of Bugmeister fame) should probably assign this to someone.
Comment 17 Mark A. Hershberger 2011-06-24 21:34:53 UTC
(In reply to comment #16)
> Really this just needs a sysadmin with the time and patience to shepherd this
> through. Mark (of Bugmeister fame) should probably assign this to someone.

Message heard! Action being taken!
Comment 18 Roan Kattouw 2011-06-29 14:38:15 UTC
(In reply to comment #16)
> The best summary of the remaining issues (and a path for re-deployment) can be
> found here:
> <http://lists.wikimedia.org/pipermail/wikitech-l/2011-June/053775.html>.
>
Some clarifications on the deployment plan, after talking to Aryeh on IRC:

Stage 1: Set the doctype to HTML 4.01 strict. This is done by setting $wgDocType = '-//W3C//DTD HTML 4.01//EN'; and $wgDTD = 'http://www.w3.org/TR/html4/strict.dtd'; . Per Aryeh's post this should only cause minor layout issues (category 1 in Aryeh's post).

Stage 2: Once any issues from stage 1 are fixed, set an HTML5 doctype without enabling $wgHtml5. Because the doctype tag is structured differently, you can't use $wgDocType / $wgDTD but you have to live hack it in. In Html::htmlHeader(), change the if ( $wgHtml5 ) test to something like if ( $wgHtml5 || $wgSomethingElse ) or if ( $wgHtml5 || true ) or whatever you like. This may and probably will lead to category 2 breakage.

Stage 3: Once that's working, actually set $wgHtml5 = true; . Category 3 breakage possible.

Once everything has been running smoothly for a couple of days, take out the live hack and the $wgDocType / $wgDTD settings.
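
The three stages can be modelled as follows. This is a Python sketch of the doctype decision only, under the assumption that the header code emits the short HTML5 doctype when the HTML5 test passes and otherwise builds a PUBLIC doctype from the two configured strings. The Config class and the force_html5_doctype flag are hypothetical stand-ins for the settings and the live hack; the doctype strings are the ones given in stage 1.

```python
from dataclasses import dataclass


@dataclass
class Config:
    html5: bool = False                # models $wgHtml5
    force_html5_doctype: bool = False  # models the stage 2 live hack (hypothetical)
    doc_type: str = "-//W3C//DTD HTML 4.01//EN"        # models $wgDocType
    dtd: str = "http://www.w3.org/TR/html4/strict.dtd"  # models $wgDTD


def doctype(cfg: Config) -> str:
    """Return the doctype line the page header would start with."""
    if cfg.html5 or cfg.force_html5_doctype:
        return "<!DOCTYPE html>"
    return f'<!DOCTYPE html PUBLIC "{cfg.doc_type}" "{cfg.dtd}">'


stages = {
    "stage 1": Config(),
    "stage 2": Config(force_html5_doctype=True),
    "stage 3": Config(html5=True, force_html5_doctype=True),
}
for name, cfg in stages.items():
    print(name, doctype(cfg))
```

Stages 2 and 3 print the same doctype. The difference between them lies in the rest of the markup that $wgHtml5 changes, which this sketch does not model, and that is why the plan treats them as separate steps with separate categories of breakage.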
Comment 19 Arthur Richards 2011-08-17 23:53:04 UTC
Is there an ETA for this?
Comment 20 Sam Reed (reedy) 2011-08-17 23:55:01 UTC
(In reply to comment #19)
> Is there an ETA for this?

Ideally we want to get this somewhat done before Aryeh is on long term leave (from the internet).

I keep finding enough other stuff to do, so haven't got round to it. Hopefully when I get the Metrics stuff out of the way.

Is it blocking you for something?
Comment 21 Arthur Richards 2011-08-18 00:07:22 UTC
(In reply to comment #20) 
> Is it blocking you for something?

It's blocking my GSoC mentee (Yuvi - porting WP1.0 bot to MediaWiki extension).  We're a few weeks away from being ready for deployment on the cluster, so I'm hoping to get a sense of when this'll be resolved for planning/etc.  And to make sure this is still on the radar :p

Thanks Reedy!
Comment 22 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-08-18 16:42:27 UTC
It turns out I'll likely be available to give advice for longer than I thought, at least until mid-October and possibly for months beyond that.
Comment 23 Sam Reed (reedy) 2011-08-18 16:58:32 UTC
(In reply to comment #22)
> It turns out I'll likely be available to give advice for longer than I thought,
> at least until mid-October and possibly for months beyond that.

That's useful to know, but we shouldn't leave it to the end either.

I'll see about getting it bumped up the priority list in the near future.
Comment 24 p858snake 2011-08-20 02:32:48 UTC
could we perhaps enable this on mw wiki to start testing on a more "content"ish wiki compared to just test and test2?
Comment 25 Sam Reed (reedy) 2011-08-22 16:07:27 UTC
(In reply to comment #24)
> could we perhaps enable this on mw wiki to start testing on a more "content"ish
> wiki compared to just test and test2?

We could... I'm not sure if we need to go to the effort of 101 mailing list posts, or it's just a JFDI and deal with the issues as they come up
Comment 26 Sam Reed (reedy) 2011-08-22 17:18:54 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > could we perhaps enable this on mw wiki to start testing on a more "content"ish
> > wiki compared to just test and test2?
> 
> We could... I'm not sure if we need to go to the effort of 101 mailing list posts,
> or it's just a JFDI and deal with the issues as they come up

Done.

Step 1 complete
Comment 27 Ryan Kaldari 2011-11-16 08:10:19 UTC
BTW, I did an informal inventory of display issues caused by switching MediaWiki to HTML5 (due to the rendering mode change from semi-quirks to strict in some browsers). The only issue I saw that was noticeable was the placement of the magnifying glass button in the search field (it displays slightly lower on HTML5 wikis). I imagine other display issues will be present, but that's the only one I could find (testing on 1.17 and 1.18).
Comment 28 Phillip Patriakeas 2011-11-16 09:19:57 UTC
(In reply to comment #27)
> BTW, I did an informal inventory of display issues caused by switching
> MediaWiki to HTML5 (due to the rendering mode change from semi-quirks to strict
> in some browsers). The only issue I saw that was noticeable was the placement
> of the magnifying glass button in the search field (It displays slightly lower
> on HTML5 wikis). I imagine other display issues will be present, but that's the
> only one I could find (testing on 1.17 and 1.18).

That sounds like bug 32025 (took me forever to find that >_> )...
Comment 29 Ryan Kaldari 2011-11-16 18:27:28 UTC
Actually, it's bug 30525. I've added it as a dependency.
Comment 30 Derk-Jan Hartman 2012-01-09 22:06:34 UTC
Can we test this on labs, perhaps straight after the 1.19 deploy?
Comment 31 Sam Reed (reedy) 2012-01-09 22:35:12 UTC
It's enabled on mediawiki.org now
Comment 32 Yuvi Panda 2012-05-11 07:09:46 UTC
Why is this scheduled for the mysterious future?
Comment 33 Krinkle 2012-05-11 08:36:31 UTC
Well, basically when a bug is accepted and is agreed to be executed it gets a milestone. This one was scheduled for 1.18wmf1 deployment but that didn't happen.

Since then a blocking bug was added, which hasn't been fixed yet. So whatever milestone is set will be useless, since the blocking bug needs to be fixed first.

"Mysterious future" basically means "1.(next) release". Which is postponed until it can be done. But it has been assigned to be in a release. Just not sure yet which one, depending on the dependencies.
Comment 34 p858snake 2012-05-11 08:43:07 UTC
(In reply to comment #33)
> Since then a blocking bug was added, which hasn't been fixed yet. So whatever
> milestone is set will be useless, since the blocking bug needs to be fixed first.

Read the bug; it doesn't really block this getting done. In fact this will kinda fix it (if we are both talking about 34475).

We should just get around to setting a date, give warning, and then flip the switch. Tool authors have been warned several times already that this will be happening and if they still haven't updated, well…
Comment 35 Rob Lanphier 2012-06-28 21:18:50 UTC
Discussion started on wikitech-l:
See http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/62146

If there are no serious objections, we'll set a date for sometime in July.  Exact date will be set here: http://wikitech.wikimedia.org/view/Software_deployments , and should get reflected back to this bug.
Comment 36 MZMcBride 2012-06-30 01:11:13 UTC
(In reply to comment #12)
> Several problems on enwiki were caused by the difference in Sanitize::escapeId
> between HTML4 and HTML5 modes.
> 
> <ref name="foo"> tries to generate a link in the page something like
> [[#cite_note-foo|[1]]]. In HTML4 mode <ref name="foo [[bar]]"> generates
> [[#cite_note-foo_.5B.5Bbar.5D.5D-0|[1]]] which functions correctly, but in
> HTML5 mode it generates [[#cite_note-foo_[[bar]]-0|[1]]] which of course breaks
> horribly.

I believe this is bug 27694.

> Also, in HTML4 mode <ref name="foo">, <ref name="_foo_">, and <ref
> name="''foo''"> are all distinct. In HTML5 mode these are all considered
> equivalent.

I'm not sure this is still an issue. Can you take a look at <https://test.wikipedia.org/wiki/Cite_anchor_equivalency> and confirm?

Are there any other known Cite-related issues?
Comment 38 Brad Jorsch 2012-07-01 04:53:41 UTC
(In reply to comment #36)
> (In reply to comment #12)
> 
> I believe this is bug 27694.

It appears someone copied it from an enwiki post based on comment #12 to a new bug.

> > Also, in HTML4 mode <ref name="foo">, <ref name="_foo_">, and <ref
> > name="''foo''"> are all distinct. In HTML5 mode these are all considered
> > equivalent.
> 
> I'm not sure this is still an issue. Can you take a look at
> <https://test.wikipedia.org/wiki/Cite_anchor_equivalency> and confirm?

As pointed out in comment #13, it only occurs if $wgExperimentalHtmlIds is true. Does test.wikipedia.org have this true or false?
Comment 39 Alex Monk 2012-08-01 23:25:36 UTC
(In reply to comment #38)
> As pointed out in comment #13, it only occurs if $wgExperimentalHtmlIds is
> true. Does test.wikipedia.org have this true or false?

It's not mentioned in the config. The default is false.

It's August now, and there does not appear to be any serious objections. Is this going to be deployed soon?
Comment 40 Robin Pepermans (SPQRobin) 2012-09-03 19:24:25 UTC
(In reply to comment #39)
> It's August now, and there does not appear to be any serious objections. Is
> this going to be deployed soon?

*bump* And it's September now :) If there are still concerns or possible problems, maybe first enable it on smaller wikis?
Comment 41 Sam Reed (reedy) 2012-09-03 19:28:26 UTC
(In reply to comment #40)
> (In reply to comment #39)
> > It's August now, and there does not appear to be any serious objections. Is
> > this going to be deployed soon?
> 
> *bump* And it's September now :) If there are still concerns or possible
> problems, maybe first enable it on smaller wikis?

I don't think there's anything stopping this... If you want it enabling on some of the smaller wikis that you can use (and as such possibly help them if they have problems), I don't mind enabling it more widely too




reedy@fenari:/home/wikipedia/common$ mwscript eval.php testwiki
> var_dump( $wgExperimentalHtmlIds );
bool(false)


^ I suppose I should set that to true on testwiki, test2wiki and mediawikiwiki for starters


Though:

/**
 * Should we allow a broader set of characters in id attributes, per HTML5?  If
 * not, use only HTML 4-compatible IDs.  This option is for testing -- when the
 * functionality is ready, it will be on by default with no option.
 *
 * Currently this appears to work fine in all browsers, but it's disabled by
 * default because it normalizes id's a bit too aggressively, breaking preexisting
 * content (particularly Cite).  See bug 27733, bug 27694, bug 27474.
 */
$wgExperimentalHtmlIds = false;
Comment 42 Derk-Jan Hartman 2012-09-03 19:34:48 UTC
Should remain false per https://bugzilla.wikimedia.org/show_bug.cgi?id=27694#c4
Comment 43 MZMcBride 2012-09-03 19:41:28 UTC
(In reply to comment #42)
> Should remain false per https://bugzilla.wikimedia.org/show_bug.cgi?id=27694#c4

To be clear, you mean that $wgExperimentalHtmlIds should remain false, not $wgHtml5. :-)

There's a page here for software deployments: <http://wikitech.wikimedia.org/view/Software_deployments>. I suppose this qualifies. How does this variable ($wgHtml5) get scheduled for a roll-out deployment?
Comment 44 Richard Guk 2012-09-05 22:43:16 UTC
Will [[mw:Manual:$wgWellFormedXml]] remain set to true?
Comment 45 Derk-Jan Hartman 2012-09-05 23:04:04 UTC
"we're scheduling a deployment of HTML5 across the Wikimedia cluster [1]. This is set for Monday 17th September at 18:00-20:00 UTC [2]."

http://lists.wikimedia.org/pipermail/wikitech-l/2012-September/063112.html

@Richard, as far as I know there has been no talk of changing the $wgWellFormedXml configuration. Again however, everything relying on that, probably shouldn't :D
Comment 46 Derk-Jan Hartman 2012-09-17 19:05:16 UTC
This has now been done. http://lists.wikimedia.org/pipermail/wikitech-l/2012-September/063249.html
