Last modified: 2014-11-06 18:53:52 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T67724, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 65724 - Chunked upload of SVGs triggers INVALIDXML exception, but file is valid
Chunked upload of SVGs triggers INVALIDXML exception, but file is valid
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Uploading (Other open bugs)
1.24rc
All All
: Normal normal with 1 vote (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks: chunked-upload 58553
  Show dependency treegraph
 
Reported: 2014-05-24 15:44 UTC by Christian
Modified: 2014-11-06 18:53 UTC (History)
8 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Christian 2014-05-24 15:44:37 UTC
Uploading a 70mb SVG file created with Inkscape fails, due to "uploadinvalidxml" exception thrown.  The SVG contains a large embedded png, so variing the chunk size between 500kb to 5mb does not matter: lots of upload chunks will simply contain a base64 encoded string only.

I suspect this to be the reason for verifyPartialFile to reject chunks as invalidxml, but ultimately am not 100% sure:  Shouldn't a test on well-formed-ness of XML be always done on the stashed (complete) file rather than on the chunks?  This test for wellformedness within the scope of verifyPartialFile has been introduced by bug 58553, maybe regression!?


The backtrace to this is about verifyChunk() -> verifyPartialFile() -> detectScriptInSVG() -> XmlTypeCheck->wellformed == false.


Relevant code entries are to be found here:

function verifyChunk():
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadFromChunks_8php_source.html#l00364
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadFromChunks_8php_source.html#l00369
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadFromChunks_8php_source.html#l00373

function verifyPartialFile():
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l00481
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l00507

function detectScriptInSVG():
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l01255
https://doc.wikimedia.org/mediawiki-core/master/php/html/UploadBase_8php_source.html#l01265


Thanks for dedicating time to fix this.
Comment 1 Christian 2014-05-24 17:51:21 UTC
Linking (instead of embeding) the image like done currently for

https://commons.wikimedia.org/wiki/File:Bahnstrecke_Zeitz-Camburg_-_Verlauf_um_1930_-_Umnutzung_als_Radweg_ab_Juli_2012_-_hinterlegt_mit_MTB_Detailkarten.svg

also fails, since librsvg does not handle HTTP(S) links within xlink:href


However opening the svg from commons servers straight into Firefox (29) works (surprisingly).  Firefox fetches HTTP(S) resources within SVGs and renders the result just fine.  This leads to the weird situation that a locally opened image appears entirely different from the png rendered preview by commons.


But this is probably within the scope of librsvg bugtracker, not mediawikis.  Just so to mention that what seemed to be a clean workaround (according to SVG spec at least) does not resolve this issue either.  As of now there is no way to bring this image to Commons, but there was: before bug 58553 got "fixed".
Comment 2 Bawolff (Brian Wolff) 2014-05-24 17:58:35 UTC
(In reply to Christian from comment #1)

> 
> However opening the svg from commons servers straight into Firefox (29)
> works (surprisingly).  Firefox fetches HTTP(S) resources within SVGs and
> renders the result just fine.  This leads to the weird situation that a
> locally opened image appears entirely different from the png rendered
> preview by commons.
> 
> 
> But this is probably within the scope of librsvg bugtracker, not mediawikis.
> Just so to mention that what seemed to be a clean workaround (according to
> SVG spec at least) does not resolve this issue either.  As of now there is
> no way to bring this image to Commons, but there was: before bug 58553 got
> "fixed".

No, we specificly disable external images. Its a modification we made to librsvg (If your browser fetchs external images, that's its bussiness, but if WMF servers are fetching arbitrary things, that opens up some issues that are slightly scary security wise. Since 99% of the time svgs don't really need that sort of feature anyways, we disabled it).

> As of now there is
> no way to bring this image to Commons, but there was: before bug 58553 got
> "fixed".

Until this issue is resolved, try uploading the file with [[commons:Special:Upload]]. The old upload form will still upload files without chunked uploading for files < 100 mb.
Comment 3 Christian 2014-05-25 10:17:22 UTC
I've tried uploading the old upload form, but get reproducable Gateway timeout (my upstream is at 1mbit).  This is why I've tried chunked upload.


As for specifically disabling external images for SVGs:  Why?  You're breaking the standard doing this.  This is a security issue of the respective JPG or PNG libraries you're talking about.  They need to be as recent and as secure as possible for this.

If they are not, then an "attacker" (i.e. commons user) could simply upload his/her malicious png/jpg using the upload form and the image would be processes by those same libraries anyway (!)


A just reason for refraining HTTP(S) references in librsvg would be the abscence of a guarantee on availability of the external resource over time.
This could be solved using two methods, the second one being the stricter one:

 1) cache external refs on thumbnail generation, check for updates on external server on thumbnail re-generation

 2) allow external refs to images residing on wikimedia servers only


The second method should be achievable even without a regexp match by simply doing a "starts with" check on the "xlink:href" value for "http://commons.wikimedia.org/" or "http://commons.wikimedia.org/", virtually this would not cost any performance.  If a regexp check is tolerable performance-wise, then support for subprojects of the wikimedia eco-system might be included as well.


_________
Ultimate security is a black box.  Wikipedia is about sharing.
Comment 4 Christian 2014-05-25 11:28:05 UTC
(In reply to Christian from comment #3)
> "http://commons.wikimedia.org/"

should read https://commons.wikimedia.org/
Comment 5 Bawolff (Brian Wolff) 2014-05-25 13:18:19 UTC
(In reply to Christian from comment #3)
> I've tried uploading the old upload form, but get reproducable Gateway
> timeout (my upstream is at 1mbit).  This is why I've tried chunked upload.
> 
> 
> As for specifically disabling external images for SVGs:  Why?  You're
> breaking the standard doing this.  This is a security issue of the
> respective JPG or PNG libraries you're talking about.  They need to be as
> recent and as secure as possible for this.
> 
> If they are not, then an "attacker" (i.e. commons user) could simply upload
> his/her malicious png/jpg using the upload form and the image would be
> processes by those same libraries anyway (!)
> 
> 
> A just reason for refraining HTTP(S) references in librsvg would be the
> abscence of a guarantee on availability of the external resource over time.
> This could be solved using two methods, the second one being the stricter
> one:
> 
>  1) cache external refs on thumbnail generation, check for updates on
> external server on thumbnail re-generation
> 
>  2) allow external refs to images residing on wikimedia servers only
> 
> 
> The second method should be achievable even without a regexp match by simply
> doing a "starts with" check on the "xlink:href" value for
> "http://commons.wikimedia.org/" or "http://commons.wikimedia.org/",
> virtually this would not cost any performance.  If a regexp check is
> tolerable performance-wise, then support for subprojects of the wikimedia
> eco-system might be included as well.
> 
> 
> _________
> Ultimate security is a black box.  Wikipedia is about sharing.

Its hardly an unreasonable burden, given that commons considers embedding raster files in svgs innappropiate in most cases that they are used.


This isnt the right place for arguing about this (since its off topic for this specific bug report). Bring it up on wikitech-l if you feel strongly about it.
Comment 7 Brad Jorsch 2014-10-03 15:44:46 UTC
It isn't anything unique to large SVGs, any SVG will trigger this problem. I've been testing locally with SVGs like the following (and very small chunks):

 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="189" height="31"><defs/><text x="9" y="21" style="font-size:15px;fill:#000000;stroke:none;font-family:DejaVu Sans Mono">2014-10-03 14:51:00</text></svg>

Fixing it from a technical perspective is easy enough (as suggested, the specific check here should be done from verifyFile() rather than verifyPartialFile()), but I'm not sure it'll pass security review. Let's try.
Comment 8 Gerrit Notification Bot 2014-10-03 15:54:01 UTC
Change 164569 had a related patch set uploaded by Anomie:
Don't try to verify XML well-formedness for partial SVG uploads

https://gerrit.wikimedia.org/r/164569
Comment 9 Gerrit Notification Bot 2014-11-06 18:50:32 UTC
Change 164569 merged by jenkins-bot:
Don't try to verify XML well-formedness for partial SVG uploads

https://gerrit.wikimedia.org/r/164569
Comment 10 Brad Jorsch 2014-11-06 18:53:52 UTC
The fix should be deployed to WMF wikis with 1.25wmf8, see https://www.mediawiki.org/wiki/MediaWiki_1.25/Roadmap for the schedule.

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links