Last modified: 2012-04-16 09:15:39 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 27544 - SVG images that are invalid XML no longer rendered
SVG images that are invalid XML no longer rendered
Product: MediaWiki
Classification: Unclassified
File management (Other open bugs)
All All
: Lowest major (vote)
: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 26676 27339
  Show dependency treegraph
Reported: 2011-02-18 20:04 UTC by Bryan Tong Minh
Modified: 2012-04-16 09:15 UTC (History)
9 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---

example file (5.14 KB, image/svg+xml)
2011-02-19 09:20 UTC, Derk-Jan Hartman

Description Bryan Tong Minh 2011-02-18 20:04:27 UTC
As of r75968, SVG images that are invalid XML no longer render. Previously those images where allowed by MediaWiki and it appears that rsvg happily accepts them. 

I agree that we should not accept SVG images which are not valid XML, however I am not convinced that breaking images which used to render fine is a good approach. 

People in #wikimedia-commons suggest that a few hundreds of SVGs on Commons are broken. My analysis on a random set suggests that less than 0.1% of the SVG images in use on Wikimedia is affected. (Checked a random set of 1000 images, found none)

The question is whether we should fall back to the old behavior if the file does not pass the new file checks, or whether we should just refuse rendering them.
Comment 1 Mark A. Hershberger 2011-02-18 20:28:01 UTC
I would recommend falling back, but it would be good to have an example of those that are a problem before making a final decision.
Comment 3 Mark A. Hershberger 2011-02-18 20:39:42 UTC
This should be fixed.  I understand why that one doesn't display, but errors like that (undeclared, but well-known namespaces), shouldn't cause a problem with display.

Would it be possible to let users know at upload time that they have provided invalid XML and that it should be fixed?  Could we do that *and* display already-uploaded images like that one?
Comment 4 Rob Lanphier 2011-02-19 04:37:54 UTC
This one may be a little beyond our control, unless we want to maintain a custom, non-compliant XML parser.  The XML specification forbids compliant implementations from doing anything but failing spectacularly, and most decent XML parsers follow the rules.

I don't know exactly what change triggered the new behavior (did we switch implementations?), but I think we're better off encouraging a cleanup of the existing images rather than implementing a fallback strategy.
Comment 5 Derk-Jan Hartman 2011-02-19 08:41:56 UTC
we didn't use the xmlreader pull parser before. I swithed the whole thing from XmlTypeCheck to SVGReader in r75968
Comment 6 Derk-Jan Hartman 2011-02-19 09:04:17 UTC
i don't have the problem at home btw, so I guess the XMLReader version installed on wikimedia servers behaves different from the one I have installed.
Comment 7 Derk-Jan Hartman 2011-02-19 09:19:47 UTC
Seems to me that the WMF version of XMLReader is validating the file, while as far as I know, that is not the default mode of XMLReader.
Comment 8 Derk-Jan Hartman 2011-02-19 09:20:05 UTC
Created attachment 8170 [details]
example file
Comment 9 Bryan Tong Minh 2011-02-19 14:13:14 UTC
(In reply to comment #7)
> Seems to me that the WMF version of XMLReader is validating the file, while as
> far as I know, that is not the default mode of XMLReader.

I'm not too familiar with XML, but does that mean that the DTD is fetched from every time an SVG file is viewed?
Comment 10 Bryan Tong Minh 2011-02-19 14:17:37 UTC
$this->reader->setParserProperty( XMLReader::VALIDATE, false ); would presumably fix it.
Comment 11 Derk-Jan Hartman 2011-02-19 14:22:10 UTC
like i said, false should be the default value anyways. At least from what I understand. but we could run a test on wmf servers i guess to make sure.

/me has no access.
Comment 12 Derk-Jan Hartman 2011-02-19 16:22:58 UTC
summary, on wmf servers, guessMimeType returns 'application/xml' instead of 'image/svg+xml'. guess mimetype calls doGuessMimeType(), which will call XmlTypeCheck()

$xml = new XmlTypeCheck( '' );
var_dump( $xml->wellFormed );

shows the wmf servers as evaluating wellFormed to false. This is not correct btw, for most of these files, since most are only invalid but are wellformed, this is a bug in XmlTypeCheck().

Actually, on my own server it even always evaluates to false, but later parts apparently pick up the slack.

MimeMagic::doGuessMimeType: analyzing head and tail of /private/var/tmp/phpwMVYQ7 for magic numbers.
DjVuImage::getInfo: not a DjVu file
MimeMagic::guessMimeType: internal type detection failed for /private/var/tmp/phpwMVYQ7 (.)...
MimeMagic::detectMimeType: magic mime type of /private/var/tmp/phpwMVYQ7: image/svg+xml
MimeMagic::guessMimeType: guessed mime type of /private/var/tmp/phpwMVYQ7: image/svg+xml
MimeMagic::improveTypeFromExtension: improved mime type for .svg: image/svg+xml
File::getPropsFromPath: /private/var/tmp/phpwMVYQ7 loaded, 5266 bytes, image/svg+xml.

Internal type detection failed, indicates that doGuessMimeType() has returned false and detectScript() takes over and selects image/svg+xml for me, but apparently not on the WMF installation.

I guess in 1.16 something was correcting for this issue for wmf as well, but now no more...
Comment 13 Bryan Tong Minh 2011-02-19 17:11:13 UTC
detectMimeType uses one of the mime detection PHP extensions. I don't see any changes between 1.16 and 1.17 that would explain the difference.

The bug is with XmlTypeCheck though. xml_parse() returns an error for both an invalid as an unwellformed document, whereas we only want an error on an unwellformed document.
Comment 14 Derk-Jan Hartman 2011-02-19 17:37:45 UTC
two options, rewrite to XMLReader, or set XmlTypeCheck to ignore namespaces all together. Implications of the latter not fully understood yet. Best consult Brion or Tim on this, esp. since brion originally added

Comment 15 Mark A. Hershberger 2011-02-19 17:41:42 UTC
(In reply to comment #9)
> I'm not too familiar with XML, but does that mean that the DTD is fetched from
> every time an SVG file is viewed?

No.  The URIs in xmlns attributes are not meant to be resolved and fetched.  Often, they can be, since the "http" schema is used and people DTD at that address.  But it is only meant to serve as a unique identifier, not a resource to be retrieved.

In practice, though, some (poorly written) validators do attempt to fetch the resource every time.
Comment 16 Bryan Tong Minh 2011-02-19 22:28:31 UTC
To clarify, for people who have not followed the IRC discussion:
* The original assumption that the non-rendering of certain SVGs was caused by the new SvgMetadataExtrator was false.
* MimeMagic::doGuessMimeType uses XmlTypeCheck to detect the mime type; this fails because XmlTypeCheck::wellFormed is set to false when the SVG is non-validating instead of non-wellformed
* For me and Derk-Jan, the fileinfo pecl extension then detects the file as image/svg+xml anyway, but at Wikimedia it is detected as application/xml

XmlTypeCheck must be fixed to only check wellformedness.
Additionally, it would be interesting why at Wikimedia there is a difference between 1.16 and 1.17.

A suggested approach for fixing XmlTypeCheck would be to change xml_parse_create_ns() into xml_parse_create(), but somebody who understands Xml properly needs to look at that.
Comment 17 Mark A. Hershberger 2011-02-19 22:48:38 UTC
See r43627 for where Brion forces the use of xml_parser_create_ns() instead of allowing xml_parser_create(). Thinking this is too picky per the bug.
Comment 18 Brion Vibber 2011-02-21 18:18:05 UTC
I got a little lost reading the backscroll here, but let me just summarize the expected behavior:

* invalid[1] SVG file MUST be rejected on upload
* invalid[1] SVG file that's already existing SHOULD be rejected from rendering, but if it does render it's not the end of the world.

[1] by invalid I mean one or more of the following is true:
* not well-formed XML (ie, any error is thrown by XML parser)
* not well-formed namespaces (ie, an error is thrown by namespace-aware XML parser about inconsistent or undeclared namespaces)
* root element isn't <svg> with expected namespace
* anything in the file violates our generic or SVG-specific safety checks

Non-rendering of broken files is not a bug; it's expected behavior. Affected files should be fixed.

New files being uploaded that do not pass the validation checks *is* a bug, if it's happening. Failure to live through our XML/SVG-specific checks MUST NOT be overridden by some other generic check.
Comment 19 Bryan Tong Minh 2011-02-21 18:24:46 UTC
(In reply to comment #18)
> Failure to live through our XML/SVG-specific checks MUST NOT be
> overridden by some other generic check.

The best way to fix this is to split the valid and wellformedness check. Wellformed but invalid XML should be marked application/xml.

Unwellformed and invalid XML should never be marked application/xml or any XML subtype, even of if the PHP file info extension recognizes it as such.
Comment 20 Mark A. Hershberger 2011-02-21 18:32:26 UTC
Since the only identified files are missing namespace declarations, and taking what seems to be the consensus of developers here into consideration, I'm closing this as WONTFIX -- the identified files should be fixed and re-uploaded.
Comment 21 Brion Vibber 2011-02-21 20:25:05 UTC
Just a terminology clarification: what we're checking for well-formedness here including the namespaces is called "namespace-well-formed"ness, and is again not related to what XML terms "validity", which we never check for.
Comment 22 Aryeh Gregor (not reading bugmail, please e-mail directly) 2011-02-21 23:46:25 UTC
Although maybe we should define a subset of SVG using our own DTD, excluding script elements and attributes, and test for validity using that DTD.  That would be a very easy way to do whitelist-based security, which makes me feel happier than the current blacklist-based approach.  Are we really sure that all JS-activating attributes start with "on", and that no element will allow script if it's not named "script"?  (But that's a separate issue.)
Comment 23 Rainald Koch 2011-03-14 20:11:42 UTC
I had problems uploading an svg which passed the W3C validator without any comment and also the validator on the toolserver parsed it correctly and gave no hints. The error message on upload was "wrong MIME type" (in german).

It took me two evenings to learn that the problem had to do with namespaces (i had given svn but failed to add xlink).

The validator on the toolserver should be in line with the checks done on upload and the error message should be more informative.

File:FMCW Doppler radar.svg
Comment 24 Derk-Jan Hartman 2011-03-14 20:17:00 UTC
We already have bug #27537 for the issue with the error message. I don't see it happening that we are gonna support files anytime soon that are not namespace-well-formed, so reclosing.
Comment 25 Rainald Koch 2011-03-14 23:46:46 UTC
Reclosing is ok, I reopened it only to be heard and appreciating the decision of Brian to exclude files which are not namespace-well-formed.

Thank you for the hint at bug #27537.

Do you also know, whether the two bugs of the png renderer which show up with my svg are already registered? As svg, it is displayed as expected, but all the pngs of different size ...

- do not show the indices (<tspan style="baseline-shift:sub;font-size:0.85em">D</tspan>)

- ignore the positioning baseline-shift="-30%"
Comment 26 Derk-Jan Hartman 2011-03-15 07:46:42 UTC
Yes, I believe both of those are known issues, and filed somewhere under the "depends" of ticket #8901

Note You need to log in before you can comment on or make changes to this bug.