Last modified: 2014-10-11 14:34:12 UTC

Bug 2089 - Whitelist OASIS OpenDocument file format
Status: NEW
Product: Wikimedia
Classification: Unclassified
Component: General/Unknown
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 23 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: https://commons.wikimedia.org/wiki/Co...
Duplicates: 46977
Depends on: 24230
Blocks: 17497 multimedia 43154 43153
Reported: 2005-05-06 11:00 UTC by Guttorm Flatabø
Modified: 2014-10-11 14:34 UTC
CC List: 28 users

Description Guttorm Flatabø 2005-05-06 11:00:54 UTC
Currently (as far as I'm aware) you can upload OpenOffice.org 1.x files, at
least with the extension ".sxw". OpenOffice.org 2.x uses the new OASIS file
format (see link). 

The file upload whitelist should be extended to also include at least ".odt"
(writer), and possibly also ".odp", ".odg", ".odb".

OpenOffice-documents are useful for providing presentations and promotions.
Comment 1 JeLuF 2005-05-07 13:49:09 UTC
The trouble with file formats is MSIE. It tries to autodetect the file format of
a file that it downloads. If it *looks* like HTML, MSIE will display it -
executing the JavaScript that is in the file.

To add a file format, there must be a way to check that the file really is using
that format.
Comment 2 Iztok Jeras 2005-09-03 15:38:46 UTC
I am using OpenOffice.org to create figures on Wikibooks and there are many
figures for a single book. I would like to upload .odg source files for those
images, so that contributors could modify them.

About the MSIE problem: OpenDocument files are compressed (they do not look like
HTML), so just unzip them, and if there are no errors the file should be OK. If you
are still concerned about the archive content, it can be scanned for viruses...
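
Illustrative sketch (not from the original comment): the "unzip it and see whether there are errors" idea could look roughly like this in Python, using only the standard `zipfile` module. The function name `looks_like_intact_zip` is made up for this example, and nothing here is MediaWiki code.

```python
import io
import zipfile


def looks_like_intact_zip(data: bytes) -> bool:
    """Return True if data parses as a ZIP archive and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() reads every member and returns the name of the
            # first corrupt one, or None when all checksums match.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        # Not a ZIP archive at all (for example, plain HTML), or truncated.
        return False
```

Passing this check only shows that the bytes form a readable ZIP archive. It says nothing about what the archive contains, and it does not by itself settle whether a browser might sniff the raw bytes as HTML.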
Comment 3 Raimond Spekking 2006-06-30 20:05:58 UTC
It would be great to enable OASIS file formats for Commons at least. Storing all
kinds of documents that can be updated later would be a great benefit.
Comment 4 Daniel Kinzler 2007-02-09 13:54:03 UTC
I would suggest *not* allowing *any* styled text or presentation files on
Commons. Text on WMF projects should generally be wikitext. That being said,
having presentations and promotional material in those formats may make sense
for Meta, wikimediafoundation.org, Wikimania sites, etc.

Just my 2p
Comment 5 Raimond Spekking 2007-02-28 08:17:16 UTC
*** Bug 9127 has been marked as a duplicate of this bug. ***
Comment 6 Christof Hahn 2007-02-28 08:23:09 UTC
I'm an author from German Wikibooks. In the last two months I started a project
for schoolbooks. The aim of this project is to develop materials for
teachers and pupils. What we need is support for the ODF format and a
place where teachers can upload raw material in any format to donate
their existing teaching materials. We also need to be able to upload
7-Zip archives to bundle learning material.
Comment 7 jeroenvrp 2007-08-21 22:47:19 UTC
I think it's a good idea to disallow it on Commons, but to enable those file formats by default and let, e.g., the Wikipedia projects decide for themselves whether they want to allow those formats.

Also don't forget the .ods-files (spreadsheets). 

Jeroenvrp
Comment 8 Patrick 2007-09-03 01:52:52 UTC
In includes/mime.types the line

application/zip zip jar xpi  sxc stc  sxd std   sxi sti   sxm stm   sxw stw

must be changed to

application/zip zip jar xpi  sxc stc  sxd std   sxi sti   sxm stm   sxw stw  odt ods odp odg odf


This really should be the default MediaWiki configuration. Not being able to upload the only standardised Office file format to the most common Wiki software is kind of strange...
Comment 9 Brion Vibber 2008-03-28 00:14:22 UTC
Just a note -- the old StarOffice formats were disabled some time ago. OpenOffice (ODF) formats are enabled on our private/internal wikis, but not on the general wikis or in the MediaWiki default configuration.

An additional note is we have no current way to validate uploaded files as being ODF.
Comment 10 Robert Millan 2008-03-28 12:28:34 UTC
(In reply to comment #9)
> Just a note -- the old StarOffice formats were disabled some time ago.
> OpenOffice (ODF) formats are [...]

Note that the division is not really StarOffice/OpenOffice.  Both were using the old formats before, and use OpenDocument now (along with a lot of other apps, since that was the point of standardising it).

> An additional note is we have no current way to validate uploaded files as
> being ODF.

As long as the check is filename-based, this problem isn't introduced by adding "odt ods odp odg odf" to the list, since you can pass any kind of ZIP file as *.zip already.
Comment 11 Brion Vibber 2008-03-28 19:24:18 UTC
We don't allow .zip files. :)
Comment 12 Platonides 2008-03-28 22:38:13 UTC
(In reply to comment #9)
> An additional note is we have no current way to validate uploaded files as
> being ODF.


That's not a reason not to enable ODF; we don't really validate many types (bug 10823).
<spam>I am independently validating Commons uploads at #commons-image-uploads2 (it isn't
so hard)</spam> and if ODF were allowed (at big projects), it wouldn't be that hard either,
just like PDFs: you need to manually review all of them and delete almost every one.
Comment 13 Brion Vibber 2008-03-28 23:02:22 UTC
Currently allowed formats on Commons are:

png, gif, jpg, jpeg, xcf, pdf, mid, ogg, svg, djvu

I'm fairly certain we do at least magic-number signature validation on all of those now. PNG, GIF, and JPEG are run through a simple header sanity check. SVG is checked for XML well-formedness. DJVU is I believe checked for metadata validity, though I don't recall the details.
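
Illustrative sketch (not from the original comment): a magic-number signature check of the kind mentioned above compares the first bytes of an upload with the signature expected for its extension. The table and the function name `signature_matches` are assumptions made for this example; they are not MediaWiki's actual implementation and cover only a few of the listed formats.

```python
# Well-known leading byte signatures for a few formats (illustrative subset).
MAGIC_NUMBERS = {
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "jpg": (b"\xff\xd8\xff",),
    "jpeg": (b"\xff\xd8\xff",),
    "pdf": (b"%PDF-",),
}


def signature_matches(extension: str, data: bytes) -> bool:
    """Return True if data starts with a signature registered for the extension."""
    signatures = MAGIC_NUMBERS.get(extension.lower())
    if signatures is None:
        # Unknown extension: nothing to compare against, so reject.
        return False
    # bytes.startswith() accepts a tuple of candidate prefixes.
    return data.startswith(signatures)
```

A check like this only catches accidental mismatches; a deliberately crafted file can carry a valid signature and still contain other content.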
Comment 14 Michael Reschke 2008-03-28 23:52:25 UTC
Well, at the German Wikiversity we would need OASIS-files to upload editable documents and presentations. OASIS-Files at Commons would make our work much easier.
Comment 15 Robert Millan 2008-03-29 10:33:11 UTC
You can use odt2txt (http://stosberg.net/odt2txt/) for validation:

$ odt2txt hello.odt

  Hello

$ echo $?
0
$ zip test.zip hello.odt
updating: hello.odt (deflated 19%)
$ odt2txt test.zip
Can't read from test.zip: Is it an OpenDocument Text?
$ echo $?
1


It appears to work with the other types as well:

$ odt2txt hello.odp

  Hello

  This is a text

HTH
Comment 16 Michael Frey 2008-03-29 10:54:24 UTC
(In reply to comment #15)
> You can use odt2txt (http://stosberg.net/odt2txt/) for validation:

That only verifies that there is text; it doesn't warn about macros or about other files that are included in the ODT file but unrelated to the document.

(Otherwise someone could have the bright idea of uploading an ODT file that contains a macro virus or pictures with forbidden content, and of using the WMF servers to share them. Users who know about the hidden content can simply rename and extract the file to get at it; other users won't see it and will think it's a normal text, but will still receive the pictures with forbidden content.)
Comment 17 Robert Millan 2008-03-29 11:30:26 UTC
(In reply to comment #16)

Or someone could use a program featuring steganography techniques (http://en.wikipedia.org/wiki/Steganography#Implementations) to embed forbidden content in a PNG.

As for the macro virus, proper sandboxing is expected to be present.  If it isn't, that's an implementation bug.
Comment 18 Ingo Thies 2008-05-02 13:37:55 UTC
(In reply to comment #4)
> I would suggest to *not* allow *any* styled text or presentation files on
> commons. text on wmf projects should generally be wikitext. That being said,
> having presentations and promotional material in those formats may make sense
> for meta, wikimediafoundation.org, wikimania sites, etc.

Please keep in mind that OpenOffice.org file types also include spreadsheets (*.ods) that can be used not only for presentation but also as an interactive calculation tool. The author defines a "user area" within a sheet where the user can enter parameters, based on which calculations on a scientific topic are done. For example, you can write a sheet that calculates, tabulates and/or plots the pressure, temperature and density of the atmosphere at a user-defined altitude for the standard atmosphere, or the orbital parameters of satellites, or any other kind of scientific or technical material. In contrast to most other file types (and, as far as I know, all file types currently allowed on Commons), spreadsheets can be used *interactively*, which can be a great improvement for many science-related Wikipedia articles. Furthermore, I do not really see a reason for *not* allowing any styled content. The existence of wikitext IMHO does not strictly imply that all other text formats are invalid. Please also remember that ODF is now an ISO standard.

Therefore I would strongly suggest allowing Open Document Format in general, or at least Open Document Spreadsheets (*.ods).
Comment 19 Robert Millan 2008-05-02 18:38:35 UTC
(In reply to comment #18)
> 
> Please keep in mind that OpenOffice.org file types [...]

Please, try to avoid confusing ODF with OpenOffice.org.  There are many applications supporting ODF independently, and OpenOffice.org is just one of them (see http://boycottnovell.com/2008/01/20/odf-is-not-openoffice-org/).

> [...]. Please also
> remember that ODF is now an ISO standard.

which unfortunately doesn't mean much anymore.  Even OOXML which not even Microsoft themselves (http://www.fanaticattack.com/2008/ooxml-questions-microsoft-cannot-answer-in-geneva.html#comment-220) have implemented can get its own ISO stamp.

IMHO, what's important is that any vendor can implement ODF, and the wide availability of ODF support in applications:

http://en.wikipedia.org/wiki/OpenDocument_software#Current_support
Comment 20 Ingo Thies 2008-05-29 13:20:27 UTC
(In reply to comment #19)

> Please, try to avoid confusing ODF with OpenOffice.org.  There are many
> applications supporting ODF independently, and OpenOffice.org is just one of
> them (see http://boycottnovell.com/2008/01/20/odf-is-not-openoffice-org/).

You are right, I sometimes mix them up, because I am using ODF mainly via OpenOffice.org.

> IMHO, what's important is that any vendor can implement ODF, and the wide
> availability of ODF support in applications:
> 
> http://en.wikipedia.org/wiki/OpenDocument_software#Current_support

That's fully true. But as mentioned above, the major benefit for Wikipedia (where formatted content seems to be frowned upon unless the format is Wikitext and Wikitable etc.) would be the ability of interactive use, at least for Open Document Spreadsheets (ODS). Allowing the upload of self-written source code in common programming languages would also allow interactivity, but ODS allows interactivity in a very transparent and easy-to-use way. The following example (a zipped Excel spreadsheet, however) might explain what I mean:

http://nuclearweaponarchive.org/Library/Nukexls.zip

Such sheets, also including graphs, could be used for an interactive illustration of many topics (not only physical and technical ones) without forcing users to type in the formulas themselves.
Comment 21 Cormac Lawler 2008-05-29 21:48:45 UTC
As mentioned above, having greater possibility for interactivity in files would greatly benefit Wikiversity, particularly for presentations, but also for image files, spreadsheets (data), and others. As for opposition to this proposal, are there fears around certain formats on certain sites? If so, perhaps projects could draw up a list of filetypes which would be useful, and provide a rationale for them to be (selectively) whitelisted.
Comment 22 Robert Leverington 2008-06-01 15:33:18 UTC
(In reply to comment #21)
> As mentioned above, having greater possibility for interactivity in files would
> greatly benefit Wikiversity. Particularly for presentations, but also for image
> files, spreadsheets (data), and others. On opposition to this proposal, are
> there fears around certain formats on certain sites? If so, perhaps projects
> could draw up a list of filetypes which would be useful, and provide a
> rationale for them to be (selectively) whitelisted.
> 

The main issue is that OASIS files can contain malicious content. Letting these be uploaded without validation would be undesirable, and as of yet (AFAIK) there is no OASIS validation interface for MediaWiki.
Comment 23 Platonides 2008-06-01 21:45:56 UTC
Can you elaborate on what malicious content you are referring to? Zip files being uploaded as ODF? Documents with embedded macros?
Comment 24 Robert Leverington 2008-06-01 21:47:52 UTC
(In reply to comment #23)
> Can you elaborate what malicious content do you refer? Zip files being uploaded
> as odf? Documents with embedded macros?
> 

Macros are the main issue, they are XML so running it through a basic XML parser would eliminate any Zip issue.
Comment 25 Robert Leverington 2008-06-01 21:49:01 UTC
(In reply to comment #24)
> Macros are the main issue, they are XML so running it through a basic XML
> parser would eliminate any Zip issue.
> 

Ignore the zip bit, they can be compressed -- as I have just found out.
Comment 26 Platonides 2008-06-01 22:13:37 UTC
It seems macros live in <script> elements (<text:script>, <office:script>...), so detecting them doesn't look too hard.
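
Illustrative sketch (not from the original comment): looking for <script> elements in an ODF XML stream could be done by parsing the XML and collecting every element whose local name is "script", whatever its namespace prefix. The function name `find_script_elements` is made up for this example, and the assumption that all macros show up this way comes from the comment above and is not verified here.

```python
import xml.etree.ElementTree as ET
from typing import List


def find_script_elements(xml_text: str) -> List[str]:
    """Return the qualified tags of all elements whose local name is 'script'."""
    found = []
    for element in ET.fromstring(xml_text).iter():
        # ElementTree spells namespaced tags as '{namespace-uri}localname'.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "script":
            found.append(element.tag)
    return found
```

An empty result means no such element was seen in that one XML stream. A real check would have to look at every XML member of the package and use a parser hardened against hostile input.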
Comment 27 Mandavi 2008-11-06 17:16:16 UTC
Sun published the ODF Validator. It "is a tool that validates OpenDocument files and checks them for certain conformance criteria." That sounds like the tool we need.
Comment 28 Robert Millan 2008-11-06 17:34:15 UTC
Unfortunately, with ISO's downfall in the IT sector, being an ISO standard is becoming less and less meaningful.  I'm removing the "ISO" bits from the bug title (which IIRC I added myself a while ago).
Comment 29 Lars Aronsson 2008-11-19 13:12:36 UTC
In a posting to wikitech-l, Brion Vibber elaborated on what's needed in an ODF validator,
http://lists.wikimedia.org/pipermail/wikitech-l/2008-November/040246.html

Brion said:

we have a basic file type check to confirm
that the file really thinks it's an ODF of the appropriate extension, but
not yet checks to confirm there's not evil Java classes also sitting in
the ZIP etc.) [...]

There's an optional zip extension for PHP which should include support
for listing out the ZIP file directory; however since this isn't
included in PHP by default it might be nice to be able to read the
directory independently without the extension for general MediaWiki
installs. (It shouldn't be necessary to actually decompress anything for
our purposes here -- we're mainly looking for subfiles not expected in
an ODF, particularly Java classes that could be used for a session attack.)
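
Illustrative sketch (not part of the quoted posting): listing a ZIP directory without decompressing anything, and flagging member names that would not be expected in an ODF, might look like this in Python. The suffix list and the function name `suspicious_members` are assumptions chosen for the example; the posting itself only names Java classes as the main concern.

```python
import io
import zipfile

# Assumption: an illustrative, deliberately short list of unwanted suffixes.
SUSPICIOUS_SUFFIXES = (".class", ".jar")


def suspicious_members(data: bytes) -> list:
    """Return member names in the ZIP directory that end in a suspicious suffix."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # namelist() is built from the central directory that ZipFile reads
        # on opening; no member is decompressed here.
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith(SUSPICIOUS_SUFFIXES)
        ]
```

This inspects names only, so it is a first filter rather than proof that the archive is harmless.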

Comment 31 SJ 2011-08-09 18:03:05 UTC
Yes, please.  

Asking people to use a secondary file-hosting system for materials they are uploading for use with Wikiversity or Wikibooks projects is embarrassing, and gets more so every year.

People who are trying to use Commons (for classes or other collaborative-knowledge projects) commonly work with these standard formats; asking them to convert things to/from PDF is quite difficult considering the scarcity of freely-licensed PDF-editing tools.
Comment 32 Bawolff (Brian Wolff) 2012-11-19 21:27:23 UTC
This bug may have been fixed in the meantime (in particular, I had the impression that Tim did work on allowing zip-based formats to be uploaded safely). Tagging testme. [Note on comment 31: fixing this bug and enabling it on Wikimedia are two different things.]

Then again bug 35607 appears to suggest our support for open doc is broken (?).
Comment 33 Alex Monk 2013-04-07 00:23:54 UTC
*** Bug 46977 has been marked as a duplicate of this bug. ***
Comment 34 Juergen Fenn 2013-04-07 00:43:59 UTC
Thanks for including my request for enabling ODF upload for German Wikiversity. 

Could you please indicate whether there is any chance of having ODF upload implemented in the near future?

I would like to pass the message on to the German Wikiversity community ASAP.

Thx!
Comment 35 Nemo 2013-04-07 01:07:26 UTC
(In reply to comment #34)
> Could you please indicate whether we are running any chance to have ODF
> upload
> implemented in the near future? 

No. (Although I'd like to say the contrary.)
Comment 36 Jean-Fred 2013-08-28 09:55:38 UTC
See the links from <http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059837.html>

And the discussion at <https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2012/03#Enabling_upload_of_ZIP_types.2C_such_as_MS_Office_or_OpenOffice>

It was stated that, with the resolution of bug 24230, « Uploads of ZIP types, such as MS Office or OpenOffice can now be safely enabled. A ZIP file reader was added which can scan a ZIP file for potentially dangerous Java applets. This allows applets to be blocked specifically, rather than all ZIP files being blocked. »

I have asked the question in several places and answers are both unclear and sometimes contradictory. Some have pointed out that concerns lie still with:
* Potential embedded macros
* Validation that it is actually ODF

Are these concerns valid? If not, what is missing to allow ODF upload on projects?
Comment 37 Nemo 2013-08-28 10:46:16 UTC
If nobody comes up with concrete concerns, would it be a valid proposal to just try and see how it goes, fixing problems as they come up?

PDF files are not exempt from problems either, we often have some with viruses; but most of them are deleted quickly and the others we found thanks to your.org running antivirus software on their copy.
Comment 38 Bawolff (Brian Wolff) 2013-08-28 16:19:59 UTC
(In reply to comment #36)
> See the links from
> <http://lists.wikimedia.org/pipermail/wikitech-l/2012-April/059837.html>
> 
> And the discussion at
> <https://commons.wikimedia.org/wiki/Commons:Village_pump/Archive/2012/
> 03#Enabling_upload_of_ZIP_types.2C_such_as_MS_Office_or_OpenOffice>
> 
> It was stated that, with the resolution of bug 24230, « Uploads of ZIP types,
> such as MS Office or OpenOffice can now be safely enabled. A ZIP file reader
> was added which can scan a ZIP file for potentially dangerous Java applets.
> This allows applets to be blocked specifically, rather than all ZIP files
> being
> blocked. »
> 
> I have asked the question in several places and answers are both unclear and
> sometimes contradictory. Some have pointed out that concerns lie still with:
> * Potential embedded macros
> * Validation that it is actually ODF
> 
> Are these concerns valid? If not, what is missing to allow ODF upload on
> projects?

The zip reader prevents someone from uploading an ODF file that's really a Java archive, which was a pretty big security vulnerability. (It would also prevent those hacks where people make combined ODF/PDF files.)

It does not prevent embedded macros, nor does it validate that the file is an ODF file beyond some very superficial checks. It would prevent someone from accidentally uploading another format, but it would not prevent someone from intentionally uploading a non-ODF file that they've tweaked to look slightly like an ODF file.

Whether or not this is an acceptable situation is probably a matter that's up for debate. (I consider the macro virus possibility a little scary; Platonides' suggestion in comment 26 may be something we should look into.) I've cc'd Chris Steipp, as he probably has some thoughts on this, and would probably have the final word on whether ODF upload is acceptable.
Comment 39 Chris Steipp 2013-08-29 00:11:23 UTC
The major threats I'm most concerned with are these attachments opening up an XSS by causing the browser to think it's HTML, a Java applet, SWF, etc.

So if it correctly unzips to something that validates as an ODF, and the binary is checked to make sure sniffing won't think it's HTML or another MIME type, then we can probably enable this. Bawolff, could you confirm that's what it does?

The macro / embedded virus threat is definitely a danger to our users, but we currently do not scan incoming binaries (as Nemo pointed out, we have plenty of pdfs with hostile code already).
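
Illustrative sketch (not from the original comment): a check that the start of the binary does not look like HTML to a content-sniffing browser could search the first bytes for a few telltale tags. The pattern list, the 1024-byte window and the function name `may_be_sniffed_as_html` are all assumptions for this example; real browsers sniff for more than this, and this is not the check MediaWiki performs.

```python
import re

# Assumption: a short, incomplete list of tags a sniffing browser might react to.
HTML_MARKERS = re.compile(
    rb"<\s*(?:!doctype|html|head|body|script|iframe|title)\b",
    re.IGNORECASE,
)


def may_be_sniffed_as_html(data: bytes, window: int = 1024) -> bool:
    """Return True if an HTML-looking tag appears in the first `window` bytes."""
    return HTML_MARKERS.search(data[:window]) is not None
```

A file for which this returns True would be rejected whatever its extension says; a False result only means that none of the listed markers appeared in the inspected window.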
Comment 40 Bawolff (Brian Wolff) 2013-08-29 03:46:51 UTC
(In reply to comment #39)
> The major threats I'm most concerned with are these attachments opening up
> and
> xss by causing the browser to think it's html, java applet, swf, etc.
> 
> So if it correctly unzips to something that validates as an odf, and the
> binary
> is checked to make sure sniffing wont think it's html, or another mime type,
> then we can probably enable this. Bawolff, could you confirm that's what it
> does?

I believe that is correct.
Comment 41 Nemo 2013-09-10 07:32:27 UTC
OSM added odp a minute ago: https://trac.openstreetmap.org/ticket/3323#comment:2
So, there are no blockers left here AFAICS, but we also have a guinea pig.
Comment 42 Jean-Fred 2013-10-21 17:30:00 UTC
So, per Chris's and Brian's comments, there are no security concerns blocking this?

If so, the "editorial" question is left on whether we want this on Wikimedia Commons or on Meta. I am more than willing to open a discussion over there, if that’s the last thing missing.
Comment 43 Steinsplitter 2013-12-05 15:30:03 UTC
Community consensus?
Comment 44 Quim Gil 2013-12-05 17:01:29 UTC
At a MediaWiki community level this looks like a consensus, yes. This report is at a point where no technical/legal obstacles are left.

Now, about deployment in Wikimedia... This is a Commons discussion. That is the right place to decide whether OpenDocument files belong to their domain (just like PDFs) or not. If they agree, then the related file extension can be enabled there. If they disagree... we can meet here again to discuss the next step.

Does this make sense? If so, can someone familiar with the Commons project share the news there, please?
Comment 45 Nemo 2013-12-05 17:19:34 UTC
(In reply to comment #42)
> I am more than willing to open a discussion over there,
> if
> that’s the last thing missing.

Jean-Fred, per Quim it is, so please go ahead, yes.
Comment 46 Jean-Fred 2013-12-05 17:23:40 UTC
(In reply to comment #45)
> (In reply to comment #42)
> > I am more than willing to open a discussion over there,
> > if
> > that’s the last thing missing.
> 
> Jean-Fred, per Quim it is, so please go ahead, yes.

Good. I’ll open a community consultation soonish.
Comment 47 dacuetu 2014-05-19 07:46:43 UTC
What was the result of this? It might affect this rfc
https://meta.wikimedia.org/wiki/Requests_for_comment/How_to_deal_with_open_datasets
Comment 48 Nemo 2014-05-26 07:43:14 UTC
What's concretely the configuration setting needed here?

(In reply to Jean-Fred from comment #46)
> Good. I’ll open a community consultation soonish.

Ping.

(In reply to dacuetu from comment #47)
> What was the result of this? 

Consensus for OpenDocument has always been taken for granted: in the innumerable discussions about it I don't recall ever finding an opposer. There are, for instance, a dozen supporters just in https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1 (several sections; nobody bothered to +1 what was obvious).
Comment 49 Nemo 2014-05-26 07:45:19 UTC
Also, https://commons.wikimedia.org/wiki/Commons:Project_scope/Allowable_file_types is a policy and states that the formats are allowed by policy, just blocked for technical reasons: «SXW, SWC, SXD, and SXI (OpenOffice.org 1.x), as well as ODT, ODS, ODG, and ODP (OpenDocument) are theoretically permissible.» Marking this shell; more discussion is always possible but not necessary.
Comment 50 Jean-Fred 2014-07-06 14:09:15 UTC
(In reply to Nemo from comment #48)
> What's concretely the configuration setting needed here?
> 
> (In reply to Jean-Fred from comment #46)
> > Good. I’ll open a community consultation soonish.
> 
> Ping.

Thanks for the ping; I completely forgot about this. The discussion is now open at https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals#Support_for_OpenDocument_file_format_upload

> 
> (In reply to dacuetu from comment #47)
> > What was the result of this? 
> 
> Consensus for OpenDocument has always been given for granted: in the
> innumerable discussions about it I don't recall ever finding an opposer.
> There are for instance a dozen supporters just in
> https://commons.wikimedia.org/wiki/Commons_talk:File_types/Archive_1
> (several sections; nobody bothered +1 what was obvious).

Hmmmm, indeed. Well, let's make it super clear :)
Comment 51 John Mark Vandenberg 2014-10-11 14:34:12 UTC
So the result was 'interested, but no consensus' due to the need to have media preview for this format and concerns that these uploaded documents may contain
* macros/scripts, which may be malicious
* embedded typefaces, which may be non-free

https://commons.wikimedia.org/wiki/Commons:Village_pump/Proposals/Archive/2014/07#Support_for_OpenDocument_file_format_upload

In that discussion, PDF with OpenDocument embedded was raised as a bug and possible way forward, as we already have PDF preview support, so I have created bug 71954 for that.

We will also need bugs for detecting macros/scripts and embedded typefaces, and bug 17497 probably needs to be solved first.
