Last modified: 2014-09-23 23:35:45 UTC
Is it possible / worthwhile? With some very large SVGs being uploaded and the possibility of client-side SVG rendering, it might be a good idea. Due to XML's repetitive nature, the compression ratio is usually fairly high.
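To put a rough number on the compression claim, here is a small Python sketch. The sample SVG is made up for illustration (500 near-identical rectangles), so the ratio it prints says nothing about real uploads; it only shows why repetitive XML tends to gzip well.

```python
import gzip


def gzip_ratio(svg_text: str) -> float:
    """Return compressed size divided by original size for UTF-8 encoded text."""
    raw = svg_text.encode("utf-8")
    return len(gzip.compress(raw)) / len(raw)


# Hypothetical sample: a highly repetitive SVG built from 500 similar <rect> elements.
rects = "".join(
    f'<rect x="{i}" y="{i}" width="10" height="10" fill="#336699"/>'
    for i in range(500)
)
sample_svg = f'<svg xmlns="http://www.w3.org/2000/svg">{rects}</svg>'

print(f"compressed/original = {gzip_ratio(sample_svg):.2f}")
```

Hand-drawn artwork with long, irregular path data will usually compress less dramatically than this synthetic file.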
I second this idea. It would make more efficient use of space on Wikimedia's servers.
I created a patch, attached. Please post feedback here or at en:User_talk:Brownsteve.

INSTRUCTIONS TO USE THE SVGZ PATCH

1. Install/upgrade MediaWiki - latest version is best.
2. Install ImageMagick 6.2 or later. This is required for SVGZ support. You may also need to install librsvg / rsvg.
3. Apply this patch (if the patch has been accepted, SKIP THIS STEP):
     cd /path/to/mediawiki
     patch -p0 < svgz.patch
4. In LocalSettings.php, add these lines:
     $wgStrictFileExtensions = false;
     $wgEnableUploads = true;
5. You MUST enable your web server to properly serve SVGZ files with Content-Encoding: gzip and MIME type image/svg+xml. This is NOT enabled by default. Edit httpd.conf or .htaccess (search Google if stuck).
   EXAMPLE: In Apache's httpd.conf, add these lines:
     AddType image/svg+xml .svg .svgz
     AddEncoding gzip .svgz
     <Files *.svgz.*>
       RemoveEncoding .svgz
     </Files>
6. Restart your web server (on Linux, sudo /etc/init.d/httpd restart).
7. That's it! Have fun. Report bugs/problems/etc. to en:User_talk:Brownsteve.

BUGS IN THIS PATCH!!!!

- SVG is not a registered MIME type with IANA. Apache only includes IANA-approved MIME types by default. You must configure your httpd.conf by hand until this is fixed.
- We assume all browsers that can render SVG graphics can also Accept-Encoding: gzip.
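Step 5 is the easiest one to get wrong, so a quick check of the response headers can help. The following is a minimal Python sketch, not part of the patch: the function name `svgz_headers_ok` is invented here, and it only inspects a dictionary of headers you have already fetched by whatever means.

```python
def svgz_headers_ok(headers: dict) -> bool:
    """True if a response for an .svgz file carries the headers step 5 asks for."""
    normalized = {key.lower(): value.strip().lower() for key, value in headers.items()}
    # Ignore parameters such as "; charset=utf-8" after the media type.
    content_type = normalized.get("content-type", "").split(";")[0].strip()
    encodings = [token.strip() for token in normalized.get("content-encoding", "").split(",")]
    return content_type == "image/svg+xml" and "gzip" in encodings


# Headers a stock server might send versus what the Apache lines above should produce.
print(svgz_headers_ok({"Content-Type": "image/svg+xml"}))
print(svgz_headers_ok({"Content-Type": "image/svg+xml", "Content-Encoding": "gzip"}))
```

A response that fails this check will typically reach the browser as raw gzip bytes labeled as XML.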
Created attachment 2817 [details] Patch to enable SVGZ support
Created attachment 2818 [details] Instructions to apply the patch
Comment on attachment 2817 [details]
Patch to enable SVGZ support

Index: maintenance/FiveUpgrade.inc
===================================================================
--- maintenance/FiveUpgrade.inc (revision 18367)
+++ maintenance/FiveUpgrade.inc (working copy)
@@ -714,7 +714,7 @@
     # Height and width
     $gis = false;
-    if( $mime == 'image/svg' ) {
+    if( $mime == 'image/svg' || $mime == 'image/svg+xml' ) {
         $gis = wfGetSVGsize( $filename );
     } elseif( $magic->isPHPImageType( $mime ) ) {
         $gis = getimagesize( $filename );
Index: includes/MimeMagic.php
===================================================================
--- includes/MimeMagic.php (revision 18367)
+++ includes/MimeMagic.php (working copy)
@@ -23,7 +23,7 @@
 image/gif gif
 image/jpeg jpeg jpg jpe
 image/png png
-image/svg+xml svg
+image/svg+xml svg svgz
 image/tiff tiff tif
 image/vnd.djvu djvu
 text/plain txt
@@ -51,7 +51,8 @@
 image/gif [BITMAP]
 image/jpeg [BITMAP]
 image/png [BITMAP]
-image/svg image/svg+xml [DRAWING]
+image/svg+xml [DRAWING]
+image/svg [DRAWING]
 image/tiff [BITMAP]
 image/vnd.djvu [BITMAP]
 text/plain [TEXT]
@@ -368,6 +369,26 @@
         $mime = "application/x-msmetafile";
     }
+    if (substr($head,0,2) == "\x1f\x8b" && preg_match('/\.svgz$/si', $file) && $f = gzopen($file, "rt") ) {
+        //GZip Magic Signature; probably a GZipped SVG file (.svgz)
+
+        //The host web server software is responsible to output
+        //the proper "Content-Encoding: gzip" HTTP header for .svgz
+        // True SVGZ images without the ".svgz" extension WILL fail because
+        // only this extension triggers the server's correct encoding header.
+        // Example: In Apache httpd.conf add:
+        // AddType image/svg+xml .svg .svgz
+        // AddEncoding gzip .svgz
+        // <Files *.svgz.*>
+        // RemoveEncoding .svgz
+        // </Files>
+        $chunk = gzread( $f, 4096 );
+        gzclose( $f );
+
+        //look for svg tag
+        if( preg_match( '/<svg\s*([^>]*)\s*>/s', $chunk ) ) $mime = "image/svg+xml";
+    }
+
     if (strpos($mime,"text/")===0 || $mime==="application/xml") {
         $xml_type= NULL;
@@ -399,8 +420,8 @@
         #print "<br>ANALYSING $file ($mime): doctype= $doctype; tag= $tag<br>";
-        if (strpos($doctype,"-//W3C//DTD SVG")===0) $mime= "image/svg";
-        elseif ($tag==="svg") $mime= "image/svg";
+        if (strpos($doctype,"-//W3C//DTD SVG")===0) $mime= "image/svg+xml";
+        elseif ($tag==="svg") $mime= "image/svg+xml";
         elseif (strpos($doctype,"-//W3C//DTD XHTML")===0) $mime= "text/html";
         elseif ($tag==="html") $mime= "text/html";
     }
Index: includes/Image.php
===================================================================
--- includes/Image.php (revision 18367)
+++ includes/Image.php (working copy)
@@ -272,7 +272,7 @@
     # Height and width
     wfSuppressWarnings();
-    if( $this->mime == 'image/svg' ) {
+    if( $this->mime == 'image/svg' || $this->mime == 'image/svg+xml' ) {
         $gis = wfGetSVGsize( $this->imagePath );
     } elseif( $this->mime == 'image/vnd.djvu' ) {
         $deja = new DjVuImage( $this->imagePath );
@@ -619,7 +619,7 @@
     if (!$mime || $mime==='unknown' || $mime==='unknown/unknown') return false;
     #if it's SVG, check if there's a converter enabled
-    if ($mime === 'image/svg') {
+    if ($mime === 'image/svg' || $mime === 'image/svg+xml') {
         global $wgSVGConverters, $wgSVGConverter;
         if ($wgSVGConverter && isset( $wgSVGConverters[$wgSVGConverter])) {
@@ -1150,8 +1150,7 @@
     $err = false;
     $cmd = "";
     $retval = 0;
-
-    if( $this->mime === "image/svg" ) {
+    if( $this->mime === "image/svg" || $this->mime === "image/svg+xml" ) {
         #Right now we have only SVG
         global $wgSVGConverters, $wgSVGConverter;
Index: includes/ImageFunctions.php
===================================================================
--- includes/ImageFunctions.php (revision 18367)
+++ includes/ImageFunctions.php (working copy)
@@ -139,7 +139,6 @@
 /**
  * Compatible with PHP getimagesize()
- * @todo support gzipped SVGZ
  * @todo check XML more carefully
  * @todo sensible defaults
  *
@@ -156,6 +155,15 @@
     $chunk = fread( $f, 4096 );
     fclose( $f );
+    if (substr($chunk,0,2) == "\x1f\x8b" && preg_match('/\.svgz$/si', $filename)) {
+        //it's compressed; decompress it
+        $f = gzopen( $filename, "rt" );
+        if ( !$f ) return false;
+
+        $chunk = gzread( $f, 4096 );
+        gzclose( $f );
+    }
+
     // Uber-crappy hack! Run through a real XML parser.
     $matches = array();
     if( !preg_match( '/<svg\s*([^>]*)\s*>/s', $chunk, $matches ) ) {
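For readers who want to experiment with the detection logic outside MediaWiki, the MimeMagic.php hunk can be approximated in Python. This is a sketch under assumptions, not a port: `sniff_svg_mime` is a name invented here, it takes the file contents as bytes rather than a path, and for non-gzipped input it simply runs the same `<svg` regex instead of the XML doctype checks the real code falls through to.

```python
import gzip
import re
import zlib

GZIP_MAGIC = b"\x1f\x8b"
SVG_TAG = re.compile(rb"<svg\s*([^>]*)\s*>", re.DOTALL)


def sniff_svg_mime(filename: str, data: bytes, limit: int = 4096):
    """Return "image/svg+xml" if data looks like (possibly gzipped) SVG, else None."""
    chunk = data[:limit]
    if data[:2] == GZIP_MAGIC and filename.lower().endswith(".svgz"):
        try:
            # 16 + MAX_WBITS makes zlib expect a gzip header; the second
            # argument caps the output at `limit` bytes, like gzread( $f, 4096 ).
            chunk = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(data, limit)
        except zlib.error:
            return None
    return "image/svg+xml" if SVG_TAG.search(chunk) else None


svg = b'<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"></svg>'
print(sniff_svg_mime("France.svgz", gzip.compress(svg)))
print(sniff_svg_mime("notes.svgz", gzip.compress(b"plain text")))
```

As in the patch, a gzipped file is only sniffed when its name ends in .svgz, so a gzipped SVG uploaded under another extension is not recognized by this path.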
Created attachment 2881 [details] A better patch using zlib
Created attachment 2882 [details] Instructions to apply the patch (newer)
(In reply to comment #6)
> Created an attachment (id=2881) [edit]
> A better patch using zlib

* Things like changing image/svg to image/svg+xml have nothing to do with SVGZ and should go into their own bug.
* preg_match('/\.svgz$/si', $filename): using preg_match here is a bit unnecessary.
The mimetype conversion is required so as not to confuse Apache. It just works around another bug 7554, which has been sitting untouched for a while.
Created attachment 5578 [details]
Patch to enable SVGZ support - against r44559

Updated patch against r44559 (2008-12-13). I believe this should satisfy Carl's concerns, now that bug 7554 is resolved. Please note Apache2 will still need configuration with these lines (see Comment #3 above):

  AddType image/svg+xml .svg .svgz
  AddEncoding gzip .svgz
  <Files *.svgz.*>
    RemoveEncoding .svgz
  </Files>
Created attachment 5579 [details] Check if PHP was compiled with ZLib support Also check function_exists( 'gzopen' ) per ^demon on #mediawiki.
Just tried it out. I see two issues so far. The bigger one is that thumbnailing doesn't seem to work. The other problem is file extensions: it seems I can upload gzipped SVG as either .svg or .svgz, but not as .svg.gz (which is what I originally tried). I'm not sure what the most reasonable thing to do here would be: I'd think either gzipped SVG should only be allowed as .svgz (or .svg.gz), or we should just treat normal and gzipped SVG as identical, and probably automatically rename all three suffixes to just .svg (and maybe even go ahead and automatically gzip any SVG files not already uploaded that way). I'm mostly inclined towards the latter option, if only because it seems silly to hardcode such a trivial difference into the file page title.
(In reply to comment #12)
> Just tried it out. I see two issues so far. The bigger one is that
> thumbnailing doesn't seem to work. The other problem is file extensions: it
> seems I can upload gzipped SVG as either .svg or .svgz, but not as .svg.gz
> (which is what I originally tried). I'm not sure what the most reasonable
> thing to do here would be

What we do for .jpeg, .jpg, .JPEG, .JPG, etc. is just store the extensions differently despite there being no difference in the file type. :) Which is the bug for "don't make file extension part of file name"?

I don't see why users should decide whether to gzip SVG at upload time, though. Surely it should just be transparently compressed as it's served to the user, like with styles/scripts? .svgz and .svg.gz could then be accepted as aliases for .svg on upload, and the files could be decompressed for storage. Or compressed, or whatever, but consistently.
(In reply to Comment #12)
Thumbnailing "works for me." We must track this down, but I don't know what could be going on. It looks like you need ImageMagick >= 5.5.7 to thumbnail SVGZ images. Are you using a different $wgSVGConverter? When uploading images to my testbed, my $wgDebugLogFile shows this:

> SvgHandler::rasterize: convert -background white -geometry 180 '/var/lib/mediawiki/images/c/c1/France.svgz' PNG:'/var/lib/mediawiki/images/thumb/c/c1/France.svgz/180px-France.svgz.png' 2>&1
> wfShellExec: convert -background white -geometry 180 '/var/lib/mediawiki/images/c/c1/France.svgz' PNG:'/var/lib/mediawiki/images/thumb/c/c1/France.svgz/180px-France.svgz.png' 2>&1

I don't think we should support the .svg.gz extension. Apache treats it as an archive MIME type, serves it with the wrong headers, and Firefox gets confused instead of decompressing the file in the browser. It might also open the door to uploading any arbitrary archive file.

(In reply to Comment #13)
If SVGZ support is added, I imagine a "WikiProject Convert All Images to .svgz" might spring up, or someone might write a bot. We should just be consistent here and now, and avoid any wasted labor later on.

So should the internal storage be SVG or SVGZ? I guess this is mostly a matter of opinion, so here are my thoughts:

- HTML, CSS, etc. are documents, rapidly changing and volatile (this is a wiki, after all).
- It makes good sense to compress HTML at serve-time through Content-Encoding: gzip.
- Images, including vector graphics, are much more static.
- For efficiency's sake, we should compress images only once if possible. It doesn't make sense to recompress an SVG 6000 times on every serve (or even if it's cached). We should only recompress when the file changes.
- User data, e.g. Wikipedia, has wider scope than live web sites: dbdumps, cdroms, etc. We need to consider these use-cases as well. Permanent compression could be a major advantage here.
- PNG, GIF, et al. all have compression features; this maximizes their usefulness and spread.
- Wikipedia is a major driving force behind SVG; it would help further popularize the format if we support SVGZ.
SVGZ is pretty ugly to handle because it doesn't have its own Content-Type... the server is supposed to serve them out with Content-Type: image/svg+xml *and* Content-Encoding: gzip... which has the added confusion that the user-agent would transparently decompress it... and if you save it to disk you'll get the decompressed version. So, even if everything's working right on the server end, when you download you may not get back the same file you uploaded. Potentially now you've got an ".svgz" file on disk which is actually not compressed... Eww! As a trivial test, I gzipped an SVG file and uploaded it to my web server, running Apache 2 on Ubuntu 8.10: http://leuksman.com/misc/test2.svgz With a stock Apache configuration, it's served out as image/svg+xml *without* the encoding setting. Firefox 3.0.3 interprets it as raw SVG, which of course is invalid XML (being a big binary blob) and doesn't render it. After adding the Apache config bits above to add the Content-Encoding header, I find that Firefox renders it now. (Yay!) But, if I save the file to disk, it saves the *compressed* version with a ".svgz.svg" extension, which now fails to load since it's marked as uncompressed but is in fact compressed. Safari 3.2.1 and Opera 9.5 render the image fine inline, but when saving to disk give me an *uncompressed* version with ".svgz" extension. So... I think things are not quite mature enough here. :( IMHO the cleanest way to go would be to transparently decompress .svgz files on upload, normalize everything to .svg, and have the web server transparently gzip .svg files when serving out, if we like, to save bandwidth. (Most of the time we don't even serve the .svg out -- we serve a .png rasterization -- so this wouldn't be a heavy burden.)
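The "decompress on upload, normalize to .svg" idea can be sketched in a few lines of Python. This is an illustration only: `normalize_svg_upload` is a hypothetical helper, not MediaWiki code, and it deliberately handles just the two extensions discussed in this comment.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def normalize_svg_upload(filename: str, data: bytes):
    """Return (stored_name, stored_bytes): gzip removed, extension forced to .svg."""
    lower = filename.lower()
    for suffix in (".svgz", ".svg"):
        if lower.endswith(suffix):
            stem = filename[: -len(suffix)]
            break
    else:
        raise ValueError(f"not an SVG upload: {filename!r}")
    # Decide by content, not by name, so a mislabeled file is still stored uncompressed.
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return stem + ".svg", data


svg = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
print(normalize_svg_upload("France.svgz", gzip.compress(svg)))
```

A real implementation would also want to cap the decompressed size, since `gzip.decompress` will happily inflate a very large payload in memory.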
(In reply to comment #14)
> -For efficiency's sake, we should compress images only once if possible. It
> doesn't make sense to recompress a SVG 6000 times on every serve (or even if
> it's cached.) We should only recompress when the file changes.

This will be the case in practice either way, pretty much. The HTTP response (with the compressed version of the file) should be cached indefinitely by Squid.

Anyway, as Brion points out, we don't serve actual SVGs on page views and aren't going to start anytime soon:

* Limited benefit without IE support, which last I checked looks to arrive in approximately 2027 if Microsoft doesn't invent a proprietary alternative in the intervening time.
* Browser support needs to be as fast as rendering a bitmap, so there's no performance regression (this is far from the case right now with arbitrary SVGs).
* It would be a pain to do this until we can assume all clients support SVG, because we would have to serve bitmaps to some users and SVG to others. Then we'd have problems with cache fragmentation and inconsistent appearance (depending on the features supported by various browsers vs. our SVG renderer).
* Security! We would need an SVG sanitizer that we know is reliable, to avoid script injection and fun stuff like that.

So the number of times SVG will actually be served to users is likely to be very low.

> -User data, e.g. Wikipedia, has wider scope than live web sites: dbdumps,
> cdroms, etc. We need to consider these use-cases as well. Permanent
> compression could be a major advantage here.

No image dumps exist now at all, do they? If they did, SVGs could be gzipped in the dump (or heavier compression could be used if convenient).
(In reply to comment #15)
> SVGZ is pretty ugly to handle because it doesn't have its own Content-Type...
> the server is supposed to serve them out with Content-Type: image/svg+xml *and*
> Content-Encoding: gzip... which has the added confusion that the user-agent
> would transparently decompress it... and if you save it to disk you'll get the
> decompressed version.

Transparently decompressing documents with Content-Encoding is a bug. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.11 :

| The content-coding is a characteristic of the entity identified by
| the Request-URI. Typically, the entity-body is stored with this
| encoding and is only decoded before rendering or analogous usage.

However, support for HTTP/1.1-compliant transparent encoding via TE and Transfer-Encoding headers is still minimal, so Content-Encoding continues to be abused for that with no consistent user-agent behavior (depending on media type). Mozilla behavior for SVGZ depends on user choice between "Web Page, complete" (saves a mangled version of decompressed SVG) and "Web Page, SVG only" (saves original SVGZ, but suggests a filename with .svg appended).

> IMHO the cleanest way to go would be to transparently decompress .svgz files on
> upload, normalize everything to .svg, and have the web server transparently
> gzip .svg files when serving out, if we like, to save bandwidth.

That is the same as sending SVGZ with correct headers from the user-agent's point of view (until TE/Transfer-Encoding are supported), just more expensive on the server side.
(In reply to comment #17)
> > IMHO the cleanest way to go would be to transparently decompress .svgz files on
> > upload, normalize everything to .svg, and have the web server transparently
> > gzip .svg files when serving out, if we like, to save bandwidth.
>
> That is the same as sending SVGZ with correct headers from the user-agent's
> point of view (until TE/Transfer-Encoding are supported), just more expensive
> on the server side.

We wouldn't bother with the compression until/unless browser behavior is consistent, which it isn't right now. Keeping everything uncompressed server-side makes the potential transition much simpler if that day ever comes.
*Bulk BZ Change: +Patch to open bugs with patches attached that are missing the keyword*
Created attachment 8827 [details] VitaliyFilippov's patch, based on XMLTypeCheck Hi all! I've made my own patch for this. It's simpler and based on XMLTypeCheck, it fully supports SVGZ, and also correctly detects gzipped Dia diagrams. Review it please.
Adding myself to CC.
Hmm, it kinda looks like that won't be able to distinguish between a gzipped file with .svg extension (wrong) and a gzipped file with .svgz extension (right), or an uncompressed file with .svgz extension (wrong) and an uncompressed file with .svg extension (right). It also doesn't look like the SvgMetadataExtractor class will automatically pick up compression -- it uses XMLReader directly -- so we won't be able to extract width/height and any generic metadata that may be in the file. If we don't have a size, we can't render it.
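The four cases listed here (two right, two wrong) boil down to comparing the extension with the gzip magic bytes. A minimal Python sketch of that check follows; `extension_matches_content` is an invented name, and the check looks only at the first two bytes, so it says nothing about whether the payload is valid SVG.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"


def extension_matches_content(filename: str, data: bytes) -> bool:
    """True only for gzipped .svgz files and uncompressed .svg files."""
    is_gzipped = data[:2] == GZIP_MAGIC
    lower = filename.lower()
    if lower.endswith(".svgz"):
        return is_gzipped
    if lower.endswith(".svg"):
        return not is_gzipped
    return False


plain = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
packed = gzip.compress(plain)
for name, payload in [("a.svgz", packed), ("a.svg", plain), ("a.svg", packed), ("a.svgz", plain)]:
    print(name, "gzipped" if payload is packed else "plain", extension_matches_content(name, payload))
```

An upload path could reject the two mismatched combinations outright, or repair them by renaming or decompressing.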
Yeah, but there's also no separate mime type for SVGZ... Width/height extraction works at least in 1.16... It looks like it also uses XmlTypeCheck... Is it changed in trunk?
Yep, that changed back in 1.17. Always make and test patches against trunk to make sure you're working with current code.
Vitaliy, thanks for your patch. Do you have time to revise it so it works against trunk?
Created attachment 9470 [details] Updated patch for r103314 (VitaliyFilippov) Updated the patch. Is it OK to use stream wrappers for SVGMetadataExtractor? (I mean compress.zlib://)
Hmm, looks like it ought to work (haven't tested just yet). Looks like SvgMetadataExtractor ought to work, though I'm uncertain about that file subset cutoff thing.

From my previous comments in comment 22 it looks like the extension issue still stands: there doesn't appear to be logic to ensure that '.svg' files are uncompressed and '.svgz' files are compressed.

Additionally it may be more likely for .svgz files to be misconfigured on the server, or possibly served out incorrectly via streaming (eg when using img_auth.php on private sites, or fetching images from the image stash API or Special:Undelete) -- IIRC the correct way to serve an .svgz is as:

  Content-Type: image/svg+xml
  Content-Encoding: gzip

If we only record that the file is of type image/svg+xml but don't track that gzipiness, we'll be serving the gzip data and clients won't handle it right.

I think my preferred handling for .svgz would be to transparently decompress them on upload and rename them into .svg files... :P

(.dia doesn't have this problem as there's not a separate extension or special HTTP header configuration for compressed files!)
On the streams -- main thing to check is what behavior you get if zlib support is not enabled in PHP. If it still works with uncompressed files, then fine -- if not then it should only kick in the compress.zlib: or gzopen if it knows they will work.
Testing the patch on current trunk... does not work for me. My LocalSettings.php contains:

  $wgFileExtensions[] = 'svg';
  $wgFileExtensions[] = 'svgz';

Selecting a gzipped SVG image I've saved as .svgz lets me go through initial upload, but kicks me back with this error:

  "File extension ".svgz" does not match the detected MIME type of the file (image/svg+xml)."

Renaming the same gzipped file to ".svg" extension (incorrect) allows me to upload it. It rasterizes via ImageMagick, but if I load the image directly into Firefox I get an XML error because the file is actually binary gzip data:

  XML Parsing Error: not well-formed
  Location: http://stormcloud.local/trunk/images/e/ef/Wrong.svg
  Line Number 1, Column 1:

Renaming the uncompressed original to ".svgz" extension (also incorrect) fails like the real .svgz file did with:

  File extension ".svgz" does not match the detected MIME type of the file (image/svg+xml).
(In reply to comment #26)
> Created attachment 9470 [details]
> Updated patch for r103314 (VitaliyFilippov)
>
> Updated the patch.
> Is it OK to use stream wrappers for SVGMetadataExtractor?
> (I mean compress.zlib://)

I had written an answer, which somehow isn't here :S Trying to summarise:

* Nothing against stream wrapper usage.
* We should be able to work with normal SVG even without the zlib extension.
* You are using fread() on a gzopen() handle, which works but is an undocumented feature.
* Another option would be to use gzopen everywhere, and not check gzippiness.
* The ( $size > $wgSVGMetadataCutoff ) check can be fooled by the compression. The real size could be extracted from the header in some cases, but the comment about a fake File instance being passed doesn't give me confidence.
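One way to keep a size cutoff meaningful for compressed input is to inflate at most cutoff + 1 bytes and see whether that limit is reached. The Python sketch below shows the idea; `exceeds_cutoff` and the cutoff value are hypothetical stand-ins, not the actual $wgSVGMetadataCutoff logic, and this is a different technique from reading a size field out of the gzip stream.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def exceeds_cutoff(data: bytes, cutoff: int) -> bool:
    """True if data, once gunzipped if needed, is larger than cutoff bytes."""
    if data[:2] != GZIP_MAGIC:
        return len(data) > cutoff
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip header
    # Ask for at most cutoff + 1 bytes: enough to tell "too big" without
    # inflating the whole payload. Corrupt input raises zlib.error.
    produced = decompressor.decompress(data, cutoff + 1)
    return len(produced) > cutoff


big = gzip.compress(b"a" * 100_000)
print(len(big), exceeds_cutoff(big, 1000))
```

Here the compressed file is far below the cutoff on disk, yet the check still reports it as too large once inflated.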
(In reply to comment #27) > (.dia doesn't have this problem as there's not a separate extension or special > HTTP header configuration for compressed files!) Inkscape also doesn't make any difference between compressed and uncompressed SVG images. It opens uncompressed *.svgz and compressed *.svg without any problem :)
That is a quirk of Inkscape that you cannot rely on.
Can we at least enable gzip Transfer-Encoding for plain SVG files?
(In reply to comment #33) > Can we at least enable gzip Transfer-Encoding for plain SVG files? That would be very logical yes! I've filed bug 54291 in the Wikimedia servers component for configuring such a thing... in theory if configured on the server it'll be more transparent to users than explicit .svgz saving.
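As a rough picture of what server-side compression of plain SVG looks like, here is a Python sketch of the negotiation. It is an assumption-laden illustration: `serve_svg` is an invented function, it uses the Content-Encoding header (the one the Apache configuration earlier in this thread sets) rather than Transfer-Encoding, and it ignores q-values in Accept-Encoding.

```python
import gzip


def serve_svg(svg_bytes: bytes, accept_encoding: str):
    """Return (headers, body) for a stored, uncompressed SVG."""
    headers = {"Content-Type": "image/svg+xml", "Vary": "Accept-Encoding"}
    # Strip any ";q=..." parameters and compare coding names case-insensitively.
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "gzip" in offered:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(svg_bytes)
    return headers, svg_bytes


headers, body = serve_svg(b"<svg xmlns='http://www.w3.org/2000/svg'/>", "gzip, deflate")
print(headers)
```

Because the stored file stays uncompressed, clients that do not ask for gzip simply receive the original bytes.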