Last modified: 2005-12-20 11:31:53 UTC

Bug 4211 - HTML is a mess
Product: MediaWiki
Classification: Unclassified
Component: Parser
Hardware/OS: All / All
Importance: Normal, minor
Assigned To: Nobody - You can work on this!
Reported: 2005-12-08 01:40 UTC by Brian Jason Drake
Modified: 2005-12-20 11:31 UTC (History)

Test script (211 bytes, text/plain)
2005-12-09 08:23 UTC, Ævar Arnfjörð Bjarmason

Description Brian Jason Drake 2005-12-08 01:40:42 UTC
I don't think this requires any explanation. There should be two ways of viewing HTML:

* Normal: remove all unnecessary characters to minimize download time
* Special: format the HTML neatly (e.g. "</head>" is on its own line and directly 
below "<head>") for those who want to examine it

Alternatively we could just use the first option and those who want to examine the 
HTML can use another program to clean it up, or do so manually.
Comment 1 Brion Vibber 2005-12-08 01:44:03 UTC
Please make sure bug reports relate to MediaWiki in some way, rather than being
general comments about a standard markup language.
Comment 2 Brian Jason Drake 2005-12-08 02:42:26 UTC
My comments are relevant generally, but I am thinking here about the specific HTML 
output by MediaWiki, and this is the only place to fix it. Therefore, my report 
does relate to MediaWiki.
Comment 3 Brion Vibber 2005-12-08 02:51:03 UTC
Well, if you want to pretty up the HTML you can copy-and-paste it to any text editor with XML/HTML 
prettification functions. So... seems done?
Comment 4 Brian Jason Drake 2005-12-08 02:58:56 UTC
I did acknowledge this in my original comment. You did not address the other 
statement I made: we should compact the HTML to make it smaller and load faster.
Comment 5 Rob Church 2005-12-08 07:50:38 UTC
The XHTML standard doesn't require that markup is formatted to be human-readable.
Comment 6 Ævar Arnfjörð Bjarmason 2005-12-08 10:47:01 UTC
$ perl -MLWP::Simple -le '
    $c = get "http://localhost/mw/HEAD/wiki/Albert_Einstein";
    (@c) = $c =~ /\n/g;
    print for length $c, scalar @c, ((scalar @c)/(length $c))*100 . "%"'

Stripping newlines would save approximately 1.5% of the bandwidth used by the
XHTML of each page, according to my tests.
Comment 7 Brion Vibber 2005-12-08 17:25:08 UTC
Did you test with or without gzip encoding?
Comment 8 Ævar Arnfjörð Bjarmason 2005-12-08 17:28:11 UTC
(In reply to comment #7)
> Did you test with or without gzip encoding?

Assuming that it's ungzipped...
Comment 9 Brian Jason Drake 2005-12-09 01:14:38 UTC
What's gzip encoding got to do with this? What do we think about the 1.5% figure?
Comment 10 Rob Church 2005-12-09 07:52:24 UTC
(In reply to comment #9)
> What's gzip encoding got to do with this? What do we think about the 1.5% figure?

It has a lot to do with determining whether or not the performance gain is worth
it, considering our various caching and compression systems. Obviously, we like
the figure if it's worth it...
Comment 11 Ævar Arnfjörð Bjarmason 2005-12-09 08:23:16 UTC
Created attachment 1160
Test script

Seems to be around a 0.5% space saving once gzip is accounted for.
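
The attached test script is not reproduced in this thread. As a rough illustration of the kind of comparison being discussed, here is a minimal Python sketch (not the attached script) that measures how much stripping newlines saves both on the raw bytes and after gzip compression. The function name `newline_savings` and the sample markup are hypothetical, and removing every newline is only an estimate, since whitespace can be significant inside elements such as `<pre>`.

```python
import gzip


def newline_savings(html: bytes) -> dict:
    """Return the percentage of bytes saved by removing newlines,
    measured on the raw data and on the gzip-compressed data."""
    if not html:
        raise ValueError("input must not be empty")

    stripped = html.replace(b"\n", b"")

    # Saving on the uncompressed response body.
    raw_percent = 100 * (len(html) - len(stripped)) / len(html)

    # Saving once both variants are gzip-compressed, which is what a
    # client would actually download if gzip encoding is in use.
    gz_before = len(gzip.compress(html))
    gz_after = len(gzip.compress(stripped))
    gzip_percent = 100 * (gz_before - gz_after) / gz_before

    return {"raw_percent": raw_percent, "gzip_percent": gzip_percent}


if __name__ == "__main__":
    # Hypothetical sample page; real measurements should use actual
    # MediaWiki output.
    sample = (
        b"<html>\n<head>\n<title>Example</title>\n</head>\n<body>\n"
        + b"<p>Some paragraph text.</p>\n" * 200
        + b"</body>\n</html>\n"
    )
    result = newline_savings(sample)
    print("raw saving:  %.2f%%" % result["raw_percent"])
    print("gzip saving: %.2f%%" % result["gzip_percent"])
```

The figures depend heavily on the page being measured, so the sample above only shows the method and says nothing about the 1.5% and 0.5% numbers reported here.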
Comment 12 Brian Jason Drake 2005-12-20 05:29:57 UTC
As I understand it, gzip is not used over the Internet in this case but other 
compression may be used. Is this the case? Do all the Wikimedia systems use gzip 
Comment 13 Brion Vibber 2005-12-20 11:31:53 UTC
You understand incorrectly.
