Last modified: 2005-12-20 11:31:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T6211, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 4211 - HTML is a mess
Status: RESOLVED INVALID
Product: MediaWiki
Classification: Unclassified
Component: Parser
Version: unspecified
Hardware: All All
Importance: Normal minor
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks:
Reported: 2005-12-08 01:40 UTC by Brian Jason Drake
Modified: 2005-12-20 11:31 UTC (History)
0 users



Attachments
Test script (211 bytes, text/plain)
2005-12-09 08:23 UTC, Ævar Arnfjörð Bjarmason

Description Brian Jason Drake 2005-12-08 01:40:42 UTC
I don't think this requires any explanation. There should be two ways of viewing HTML:

* Normal: remove all unnecessary characters to minimize download time
* Special: format the HTML neatly (e.g. "</head>" is on its own line and directly below "<head>") for those who want to examine it

Alternatively we could just use the first option and those who want to examine the 
HTML can use another program to clean it up, or do so manually.
Comment 1 Brion Vibber 2005-12-08 01:44:03 UTC
Try to have bug reports relate to MediaWiki in some way, rather than general comments
about a standard markup language.
Comment 2 Brian Jason Drake 2005-12-08 02:42:26 UTC
My comments are relevant generally, but I am thinking here about the specific HTML 
output by MediaWiki, and this is the only place to fix it. Therefore, my report 
does relate to MediaWiki.
Comment 3 Brion Vibber 2005-12-08 02:51:03 UTC
Well, if you want to pretty up the HTML you can copy-and-paste it to any text editor with XML/HTML 
prettification functions. So... seems done?
Comment 4 Brian Jason Drake 2005-12-08 02:58:56 UTC
I did acknowledge this in my original comment. You did not address the other 
statement I made: we should compact the HTML to make it smaller and load faster.
Comment 5 Rob Church 2005-12-08 07:50:38 UTC
The XHTML standard doesn't require that markup is formatted to be human-readable.
Comment 6 Ævar Arnfjörð Bjarmason 2005-12-08 10:47:01 UTC
$ perl -MLWP::Simple -le '$c = get "http://localhost/mw/HEAD/wiki/Albert_Einstein";
    (@c) = $c =~ /\n/g;
    print for length $c, scalar @c, ((scalar @c)/(length $c))*100 . "%"'
124961
1717
1.37402869695345%

Stripping newlines would save approximately 1.5% of bandwidth in the
XHTML for each page, according to my tests.
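
For readers who want to repeat the measurement without Perl, here is a minimal Python sketch of the same arithmetic: total size, newline count, and newlines as a percentage of the page. The function name `newline_share` is hypothetical, and the page is passed in as bytes rather than fetched, so no particular URL or server is assumed.

```python
def newline_share(page: bytes) -> tuple[int, int, float]:
    """Return (total bytes, newline count, newlines as a percentage of the total)."""
    total = len(page)
    newlines = page.count(b"\n")
    # Guard against an empty page to avoid dividing by zero.
    percent = 100.0 * newlines / total if total else 0.0
    return total, newlines, percent


if __name__ == "__main__":
    # Tiny stand-in document; substitute the bytes of a real page to compare.
    sample = b"<html>\n<head>\n</head>\n<body>\n</body>\n</html>\n"
    total, newlines, percent = newline_share(sample)
    print(total, newlines, f"{percent:.2f}%")

    # The figures reported above: 1717 newlines in 124961 bytes.
    print(f"{100.0 * 1717 / 124961:.2f}%")
```

The second print recomputes the ratio from the two counts reported in this comment, which comes out just under 1.4%.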
Comment 7 Brion Vibber 2005-12-08 17:25:08 UTC
Did you test with or without gzip encoding?
Comment 8 Ævar Arnfjörð Bjarmason 2005-12-08 17:28:11 UTC
(In reply to comment #7)
> Did you test with or without gzip encoding?

Assuming that it's ungzipped...
Comment 9 Brian Jason Drake 2005-12-09 01:14:38 UTC
What's gzip encoding got to do with this? What do we think about the 1.5% figure?
Comment 10 Rob Church 2005-12-09 07:52:24 UTC
(In reply to comment #9)
> What's gzip encoding got to do with this? What do we think about the 1.5% figure?

It has a lot to do with determining whether or not the performance gain is worth
it, considering our various caching and compression systems. Obviously, we like
the figure if it's worth it...
Comment 11 Ævar Arnfjörð Bjarmason 2005-12-09 08:23:16 UTC
Created attachment 1160 [details]
Test script

The space saving seems to be around 0.5% once gzip is accounted for.
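
The attached script is not reproduced here. As an illustration only, the comparison can be sketched in Python as below: compress the page with and without its newlines, then compare the savings before and after gzip. The names `compare_sizes` and `saving_percent` are hypothetical, and stripping every newline is a simplification that could alter rendering inside elements such as `<pre>`.

```python
import gzip


def saving_percent(before: int, after: int) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before if before else 0.0


def compare_sizes(page: bytes) -> dict[str, int]:
    """Sizes of the page with and without newlines, both raw and gzip-compressed."""
    stripped = page.replace(b"\n", b"")
    return {
        "raw_before": len(page),
        "raw_after": len(stripped),
        # mtime=0 keeps the gzip header constant so repeated runs give identical sizes.
        "gz_before": len(gzip.compress(page, mtime=0)),
        "gz_after": len(gzip.compress(stripped, mtime=0)),
    }


if __name__ == "__main__":
    # Synthetic, highly repetitive stand-in; real pages will compress differently.
    sample = b"<li>item</li>\n" * 1000
    sizes = compare_sizes(sample)
    print("raw saving:  %.2f%%" % saving_percent(sizes["raw_before"], sizes["raw_after"]))
    print("gzip saving: %.2f%%" % saving_percent(sizes["gz_before"], sizes["gz_after"]))
```

The figure that matters for bandwidth is the gzip saving, since that is what is sent to clients that accept compressed responses.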
Comment 12 Brian Jason Drake 2005-12-20 05:29:57 UTC
As I understand it, gzip is not used over the Internet in this case but other 
compression may be used. Is this the case? Do all the Wikimedia systems use gzip 
only?
Comment 13 Brion Vibber 2005-12-20 11:31:53 UTC
You understand incorrectly.
