Last modified: 2008-12-09 00:27:06 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T16268, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 14268 - SVG handler uses an "uber-crappy hack" instead of a real XML parser
SVG handler uses an "uber-crappy hack" instead of a real XML parser
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
File management (Other open bugs)
unspecified
All All
: Normal minor with 1 vote (vote)
: ---
Assigned To: Brion Vibber
: patch, patch-need-review
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-05-26 09:41 UTC by Mormegil
Modified: 2008-12-09 00:27 UTC (History)
4 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Proposed patch (2.71 KB, patch)
2008-05-26 11:29 UTC, Max Semenik
Details
New patch (2.74 KB, patch)
2008-05-29 15:26 UTC, Max Semenik
Details
Updated patch (3.39 KB, patch)
2008-06-16 14:53 UTC, Chad H.
Details
Sample UTF-16 SVG (882.71 KB, image/svg+xml)
2008-12-09 00:23 UTC, Brion Vibber
Details

Description Mormegil 2008-05-26 09:41:58 UTC
wfGetSVGsize (in ImageFunctions.php) tries to detect the dimensions of a SVG file by looking for /<svg\s*([^>]*)\s*>/ in the first 4096 bytes of the file, acknowledging “Uber-crappy hack! Run through a real XML parser.” If it does not find the header by this method, it assumes the file is not a valid SVG file, and therefore, refuses to render its thumb on the image description page, replacing it with an icon.

This can break in many ways, especially with regards to XML comments: if the file starts with a big comment, the <svg> header might not fit into the first 4096 bytes of the file, or, if the comment contains <svg> it is mistaken for the real header.

Specifically, this happenned on http://commons.wikimedia.org/wiki/Image:Stub-icon_linguistics.svg (see version 16:57, 23. 5. 2008, marked 0×0) – the file begins with the full text of the GNU Free Documentation License, moving the SVG header outside the 4096 B limit.
Comment 1 Max Semenik 2008-05-26 11:29:28 UTC
Created attachment 4925 [details]
Proposed patch
Comment 2 Brion Vibber 2008-05-28 00:31:42 UTC
Looks good!

A couple quick tweaks -- the current patch version will give false positives to invalid files using upper- or mixed-case elements and attribute names. Use xml_parser_set_option() to disable case-folding; you can then check against the correct, lowercase element & attribute names.

It may also get confused by embedded <svg> elements (I'm not 100% sure that's legal or not but have the vague impression it may be); to be safe, disable the element handler once it's run for the first element.

See XmlTypeCheck.php for a somewhat similar bit of code... It may actually make some sense to merge them in some tricky way, say by having the type check object retain the attributes of the root element.
Comment 3 Max Semenik 2008-05-29 15:26:07 UTC
Created attachment 4935 [details]
New patch

(In reply to comment #2)
> Looks good!
> 
> A couple quick tweaks -- the current patch version will give false positives to
> invalid files using upper- or mixed-case elements and attribute names. Use
> xml_parser_set_option() to disable case-folding; you can then check against the
> correct, lowercase element & attribute names.

Yup, Opera, FF and Inkscape don't accept such deviations, my bad.

> It may also get confused by embedded <svg> elements (I'm not 100% sure that's
> legal or not but have the vague impression it may be); to be safe, disable the
> element handler once it's run for the first element.

Fixed in new patch.

> See XmlTypeCheck.php for a somewhat similar bit of code... It may actually make
> some sense to merge them in some tricky way, say by having the type check
> object retain the attributes of the root element.

Later, we may need to add more options to SvgInfo, so it shouldn't be forced by design to parse only root element.
Comment 4 Brion Vibber 2008-05-29 18:38:02 UTC
Couple remaining problems...

First, PHP spews notices due to use of undefined variables in wfGetSVGsize():

<b>Notice</b>:  Undefined variable: width in <b>/Library/WebServer/Documents/trunk/includes/ImageFunctions.php</b> on line <b>93</b><br />

<b>Notice</b>:  Undefined variable: height in <b>/Library/WebServer/Documents/trunk/includes/ImageFunctions.php</b> on line <b>93</b><br />

Be sure to test your code with error_reporting pegged up to at least E_ALL!


Second, it looks like the current code will incorrectly pass through a document whose root element isn't a <svg> at all, continuing on until it finds one later in the document. In most cases such a document won't reach this point (it should have gotten rejected on upload), but it would be better to work correctly.


Third, it doesn't appear to be doing any namespace checking, so will fail entirely on SVG files which use an XML namespace prefix on the root element.


Since XmlTypeCheck already handles those two cases, and there currently exists no other requirement which the SvgInfo class actually handles, my recommendation still stands to add a root attribute accessor to XmlTypeCheck and use that in place of the SvgInfo class. This would accomplish all required goals with less code and less code duplication, increasing reliability and maintainability of the code.
Comment 5 Brion Vibber 2008-05-29 18:44:20 UTC
And one last issue; it's not actually required for an SVG to specify width and height attributes, in which case "100%" is assumed for each when not provided.

The original code would default to a 256x256 pixel square; the patch returns 0x0, which won't provide sensible rendering.
Comment 6 Chad H. 2008-06-16 14:53:31 UTC
Created attachment 4990 [details]
Updated patch

Updated the old patch against the current trunk. Also added accessor methods getHeight() and getWidth() that return 100% if no $svg->height or $svg->width exist.
Comment 7 Siebrand Mazeland 2008-08-17 12:57:18 UTC
Assigned: to Brion for review.
Comment 8 Chad H. 2008-09-19 13:40:03 UTC
(In reply to comment #4)
> Since XmlTypeCheck already handles those two cases, and there currently exists
> no other requirement which the SvgInfo class actually handles, my
> recommendation still stands to add a root attribute accessor to XmlTypeCheck
> and use that in place of the SvgInfo class. This would accomplish all required
> goals with less code and less code duplication, increasing reliability and
> maintainability of the code.
> 

An accessor was added in r41024, so we're no longer accessing $rootElement directly (needed updates to MimeMagic too). One more step in the right direction.
Comment 9 Hk kng 2008-10-27 19:02:01 UTC
I'll add another case for you to consider. I can only follow part of your discussion, so maybe it's simply that I don't understand which code version you are running. Is it still the 'crappy hack'?
http://commons.wikimedia.org/w/index.php?title=Image:Arbeitslosenentwicklung_1933.svg
is encoded as UTF-16, and this currently leads to no PNGs being rendered.
Comment 10 Brion Vibber 2008-11-21 18:09:49 UTC
(In reply to comment #9)
> I'll add another case for you to consider. I can only follow part of your
> discussion, so maybe it's simply that I don't understand which code version you
> are running. Is it still the 'crappy hack'?
> http://commons.wikimedia.org/w/index.php?title=Image:Arbeitslosenentwicklung_1933.svg
> is encoded as UTF-16, and this currently leads to no PNGs being rendered.

Yeah, I think so -- the original checks would get confused by UTF-16 since it's not ASCII-compatible. A proper XML parse should work fine as long as the file's legit. (Unfortunately the referenced file has been deleted so I can't test on it.)

(In reply to comment #8)
> (In reply to comment #4)
> > Since XmlTypeCheck already handles those two cases, and there currently exists
> > no other requirement which the SvgInfo class actually handles, my
> > recommendation still stands to add a root attribute accessor to XmlTypeCheck
> > and use that in place of the SvgInfo class. This would accomplish all required
> > goals with less code and less code duplication, increasing reliability and
> > maintainability of the code.
> > 
> 
> An accessor was added in r41024, so we're no longer accessing $rootElement
> directly (needed updates to MimeMagic too). One more step in the right
> direction.

My recent changes to XmlTypeCheck to accept an element filter (used for more thorough scripting security checks on upload) should make it real easy to plug this into the existing class now.
Comment 11 Hk kng 2008-11-22 16:03:17 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > I'll add another case for you to consider. I can only follow part of your
> > discussion, so maybe it's simply that I don't understand which code version you
> > are running. Is it still the 'crappy hack'?
> > http://commons.wikimedia.org/w/index.php?title=Image:Arbeitslosenentwicklung_1933.svg
> > is encoded as UTF-16, and this currently leads to no PNGs being rendered.
> 
> Yeah, I think so -- the original checks would get confused by UTF-16 since it's
> not ASCII-compatible. A proper XML parse should work fine as long as the file's
> legit. (Unfortunately the referenced file has been deleted so I can't test on
> it.)

I re-uploaded the file. Hope it stays there now for some days longer...
Comment 12 Brion Vibber 2008-12-09 00:23:38 UTC
Created attachment 5565 [details]
Sample UTF-16 SVG

Attached a copy of the sample UTF-16 file noted above. (Note that this file contains only an embedded PNG image, base-64 encoded into the XML source, making it about the least efficient possible way to encode this image. :)
Comment 13 Brion Vibber 2008-12-09 00:27:06 UTC
Fixed in r44322, using a filter callback on XmlTypeCheck.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links