Last modified: 2014-07-22 00:53:44 UTC

Bug 40479 - File extensions should be automatically decided by MIME type at upload
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: Uploading
Version: unspecified
Hardware: All All
Importance: Low enhancement with 1 vote
Assigned To: Nobody - You can work on this!
Reported: 2012-09-24 16:37 UTC by johnnymrninja
Modified: 2014-07-22 00:53 UTC (History)

Description johnnymrninja 2012-09-24 16:37:27 UTC
Breaking this off from related bug 32660, which was itself broken off from bug 4421. This would also solve bug 29284.

As MW detects the MIME type of the file as it is being uploaded, it should not rely on the uploader to provide a file extension. Rather, the extension should be set automatically by the software. Any extension detected in the name should be automatically removed.

For example, if Cheese.JPEG is uploaded but the detected MIME type is PNG, the file should be named Cheese.png, not Cheese.JPEG.png. If the detected MIME type really is JPEG, it should simply be named Cheese.jpg. Any rename should also produce a notice for the uploader, so they don't lose track of their uploaded file.

Obviously this will not fix existing issues mentioned in the first two bugs, but it will prevent future issues.
Comment 1 johnnymrninja 2012-09-24 16:58:59 UTC
Hopefully this should also prevent files with unknown or unsupported MIME types from being uploaded with a supported extension. So Trojan.XXX shouldn't be uploaded as Trojan.gif. This would require a list of extensions against which uploads of unknown MIME type are checked.
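
A minimal sketch of the check described in comment 1, written in Python purely for illustration (MediaWiki itself is not written in Python). The names `KNOWN_MIME_TYPES`, `RESERVED_EXTENSIONS`, `trailing_extension` and `may_upload` are hypothetical, and the table contents are assumptions rather than any wiki's real configuration.

```python
from typing import Optional

# Hypothetical tables; a real wiki would take these from its configuration.
KNOWN_MIME_TYPES = {"image/jpeg", "image/png", "image/gif"}
RESERVED_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}


def trailing_extension(filename: str) -> str:
    """Return the lowercased text after the last period, or "" if none."""
    _, dot, ext = filename.rpartition(".")
    return ext.lower() if dot else ""


def may_upload(filename: str, detected_mime: Optional[str]) -> bool:
    """Refuse files of unknown type that claim a reserved extension."""
    if detected_mime in KNOWN_MIME_TYPES:
        return True
    # Unknown content: only allowed if the name does not pose as a known type.
    return trailing_extension(filename) not in RESERVED_EXTENSIONS
```

Under these assumptions, a file of undetected type named Trojan.gif is refused, while the same file named Trojan.xxx is not affected by this particular check.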
Comment 2 Sven Manguard 2012-09-24 17:11:20 UTC
This is fantastic. I recommend that you use the shortest form in all lowercase as the chosen extension (e.g. ".jpg" instead of ".jpeg" or ".JPG"). This is because .jpg is by far the most common variant for JPEGs, and .tif is the most common variant for TIFFs by a fairly large margin.

My one concern is the handling of .ogg and .ogv. These two can /occasionally/ but not always be used interchangeably, or at the very least, have been. We can't eliminate either, but we might (I lack the technical knowledge to tell for certain) run into problems with this.

Thanks for doing this,
Sven
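
Sven's "shortest form in all lowercase" rule could be sketched as follows, in Python for illustration only. The `EXTENSION_ALIASES` table and the `preferred_extension` function are hypothetical, and the alphabetical tie-break is an assumption that nobody in this thread proposed.

```python
# Hypothetical alias table, not MediaWiki's actual configuration.
EXTENSION_ALIASES = {
    "image/jpeg": [".jpeg", ".JPG", ".jpg"],
    "image/tiff": [".tiff", ".tif", ".TIF"],
}


def preferred_extension(mime_type: str) -> str:
    """Pick the shortest alias, lowercased; ties break alphabetically."""
    candidates = {alias.lower() for alias in EXTENSION_ALIASES[mime_type]}
    return min(candidates, key=lambda ext: (len(ext), ext))
```

A fixed, hand-maintained mapping from MIME type to extension would avoid the tie-break question entirely, which may be the simpler design.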
Comment 3 Bawolff (Brian Wolff) 2012-09-24 17:26:58 UTC
Say someone uploads a file named: "esp. cute dogs.jpg"

Ignoring the fact that Commons probably doesn't need yet another pic of someone's puppies, the period denotes that "esp" is an abbreviation for "especially". Under this proposal, would you like us to:
A) prevent the file being uploaded
B) Auto rename it to esp.jpg
C) Magically recognize the ". cute dogs" is not an extension, and let it through.
Comment 4 johnnymrninja 2012-09-24 17:32:25 UTC
(In reply to comment #2)
> This is fantastic. I recommend that you use the shortest form in all lowercase
> as the chosen extension (i.e. ".jpg" instead of ".jpeg" or ".JPG". This is
> because .jpg is the most common variant for jpegs by a great deal, and .tif is
> the most common varient for tiffs by something of a large-ish margin.
> 
> My one concern is the handling of .ogg and .ogv. These two can /occasionally/
> but not always be used interchangeably, or at the very least, have been. We
> can't eliminate either, but we might (I lack the technical knowledge to tell
> for certain) run into problems with this.
> 
> Thanks for doing this,
> Sven

.ogg is used generically for the container format, but .ogv is designed solely
for Ogg video, and .oga is solely for Ogg audio. As they have separate MIME
types, there shouldn't be an issue.

The main source of conflation is that the common Ogg audio format is called
"Ogg Vorbis", so some people assume that the extension .ogv is for that (I know
I did).

Worst case, if there is some issue with Ogg, or people are super-attached to
the generic extension, files of that MIME type can be left alone for now.

The vast majority of uploads are pictures, and I'd rather see only those issues
resolved than none at all.
Comment 5 johnnymrninja 2012-09-24 17:36:04 UTC
(In reply to comment #3)
> Say someone uploads a file named: "esp. cute dogs.jpg"
> 
> Ignoring the fact commons probably doesn't need yet another pic of someone's
> puppies, the period denotes the esp is an abbreviation for especially. Under
> this proposal would you like us to
> A) prevent the file being uploaded
> B) Auto rename it to esp.jpg
> C) Magically recognize the ". cute dogs" is not an extension, and let it
> through.

The software already knows which extensions belong to which MIME types; it's not magic. As ". cute dogs" is not an extension, there would be no issue. There is no reason to attack every period, only known extensions.

Even unknown extensions should be safe, as long as their MIME type is equally unknown. If the MIME type is known, its extension is appended. So if a JPEG is uploaded as "esp. cute dogs.dog", it would become "esp. cute dogs.dog.jpg", and the uploader is asked if they wish to continue.
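
To illustrate why no "magic" is needed, here is a small Python sketch (illustrative only) that treats the text after the last period as an extension only if it appears in a list of known extensions. `KNOWN_EXTENSIONS` and `split_known_extension` are hypothetical names, and the list contents are assumed.

```python
from typing import Optional, Tuple

# Hypothetical set of extensions the wiki recognises.
KNOWN_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}


def split_known_extension(filename: str) -> Tuple[str, Optional[str]]:
    """Split off the trailing extension only if it is a known one."""
    stem, dot, ext = filename.rpartition(".")
    if dot and ext.lower() in KNOWN_EXTENSIONS:
        return stem, ext.lower()
    # Anything else after a period is just part of the name.
    return filename, None
```

With this approach, "esp. cute dogs" and "esp. cute dogs.dog" are left whole, while "esp. cute dogs.jpg" splits into the name and "jpg".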
Comment 6 johnnymrninja 2012-09-24 17:42:20 UTC
To be absolutely clear, this should only relate to extensions at the end of the
file name. So "exe.gif.png.jpg" would be a fine name for a JPEG, if bizarre.
Comment 7 Jarek Tuszynski 2012-09-26 13:48:44 UTC
Two Comments:
1) One source of extension/MIME type mismatches on Commons is the reupload feature. For example, http://commons.wikimedia.org/wiki/File:Grb-Pozarevac.jpg was uploaded as a JPG and then someone reuploaded a GIF over it. I guess reupload should not allow a different MIME type and should instead offer to upload the file under a new name.
2) See http://commons.wikimedia.org/wiki/User:Dispenser/sandbox for examples of 1,625 other files with extension mismatch found on Commons.
Comment 8 Platonides 2012-09-26 13:54:13 UTC
Jarek, Commons currently blocks you from uploading most files with a wrong MIME type.
Comment 9 Jarek Tuszynski 2012-09-26 14:02:00 UTC
But it does not block me from uploading (or reuploading) a file whose MIME type is JPEG with a .PNG extension, like http://commons.wikimedia.org/wiki/File:TPR2011.png, uploaded this March.
Comment 10 Marco 2012-09-26 14:38:27 UTC
(In reply to comment #9)
I can't reproduce this behavior.
Comment 11 Jarek Tuszynski 2012-09-26 14:55:54 UTC
I just tried and I cannot reproduce it either. I tried both a new upload with an extension mismatch and a reupload. I guess someone fixed it since March, when http://commons.wikimedia.org/wiki/File:TPR2011.png was uploaded. Status: Fixed?
Comment 12 Waldir 2012-09-26 15:01:35 UTC
(In reply to comment #11)
> Status: Fixed?

I think what was fixed is the reupload conflict, not this bug, which deals with first-time uploads.
Comment 13 johnnymrninja 2013-04-01 08:19:43 UTC
Just to summarize (got a bit off-track up there):

1. We would maintain a list of accepted MIME types and their preferred file extensions.
2. Files would automatically receive an extension based on their MIME type.
3. Files that are uploaded with known extensions that do not match would be renamed after a prompt ("Renaming to 'Dog.gif'. Do you wish to continue?")
4. File names would not be otherwise modified. If a file is named "dog.gif.png" and it is a JPEG, it would be renamed "dog.gif.jpg". If it was named "dog.gif.cat", it would be uploaded as "dog.gif.cat.jpg".

For the purposes of this bug, the only things that would have to be modified are the file uploader, and file renaming/moving. This would not change how files are displayed or used, or even the nature of the filename. File redirects could still be manually created at these other extensions. It would just reduce the options at the time of upload, and potentially make other bugs easier to fix in the future.
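
The four points above can be sketched roughly as follows. This is Python for illustration only, not MediaWiki code: `PREFERRED_EXTENSION`, `EXTENSION_TO_MIME`, `UploadPlan` and `plan_upload` are all hypothetical names, and the table contents are assumptions. The sketch prompts only in the point 3 case; whether other renames should also produce a notice is left open, and MIME types missing from the table are not handled.

```python
from dataclasses import dataclass
from typing import Optional

# Point 1: hypothetical list of accepted MIME types and preferred extensions.
PREFERRED_EXTENSION = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
}
# Known extension spellings and the MIME type each one claims (assumed values).
EXTENSION_TO_MIME = {
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "png": "image/png",
    "gif": "image/gif",
}


@dataclass
class UploadPlan:
    final_name: str
    prompt: Optional[str]  # text to confirm with the uploader, if any


def plan_upload(filename: str, detected_mime: str) -> UploadPlan:
    """Decide the stored name for an upload whose MIME type was detected."""
    wanted = PREFERRED_EXTENSION[detected_mime]
    stem, dot, ext = filename.rpartition(".")
    claimed_mime = EXTENSION_TO_MIME.get(ext.lower()) if dot else None

    if claimed_mime is None:
        # Point 4: unknown or missing extension, so append rather than replace.
        return UploadPlan(f"{filename}.{wanted}", None)

    final_name = f"{stem}.{wanted}"
    if claimed_mime != detected_mime:
        # Point 3: known extension that does not match the content.
        message = f"Renaming to '{final_name}'. Do you wish to continue?"
        return UploadPlan(final_name, message)

    # Point 2: matching extension, normalised to the preferred spelling.
    return UploadPlan(final_name, None)
```

For instance, under these assumptions `plan_upload("dog.gif.png", "image/jpeg")` yields the name "dog.gif.jpg" together with a confirmation prompt, while `plan_upload("dog.gif.cat", "image/jpeg")` yields "dog.gif.cat.jpg" with no prompt.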

Is there anyone willing to theorize on how doable this is as a bug?
Comment 14 Marco 2013-08-13 18:09:46 UTC
I think this can be closed as RESOLVED-DUPLICATE of bug 40326
Comment 15 Platonides 2014-05-11 18:33:46 UTC
Bug 40326 seems to be a different bug. The summary in comment 13 seems correct, but I would only change the uploader. Renaming is a more manual process, and I am sure there will be cases where there's a desire to override that detection.
Comment 16 Adam Cuerden 2014-07-22 00:53:44 UTC
My only suggestion here is that, if a file's MIME type differs from the one its extension suggests, surely that should be enough for an "are you sure?" prompt first, as it might well be that the uploader is uploading the wrong file.
