Last modified: 2014-11-17 10:35:34 UTC
Currently all images include an extension that specifies the format of the image (such as .jpg, .png, .gif, .svg, etc.). Ideally the image name should not include this information, since it doesn't matter to those who *use* the image whether it's a JPEG or a PNG. For example, it would be much better to be able to say [[Image:Map of Europe]] than to have to say [[Image:Map of Europe.png]]. The author of the article shouldn't have to know (or care) what format the image is in. Additionally, since images currently can't be moved or renamed (bug 709), if a new version of the image is uploaded in a different format, it must be uploaded under a different image name. Then all the pages that use the image have to be changed, and the history of the old image is lost. This is a lot of unnecessary hassle. Fixing this could also standardize image extensions on lower case, so that we don't have images named Bleck.PNG or Mleko.JPG. The only major problem I see with fixing this is in the conversion process: something has to be done about images whose names are the same except for the extension.
This is probably a really stupid idea, but... what's wrong with just making the extension meaningless? That way a PNG could be uploaded over a GIF. The format would be different, and the actual file would be foo.gif.png (for old browsers that pay more attention to extensions than to MIME types), but the link would still be [[image:foo.gif]]. You could also upload something without an extension, like [[image:bob]], and the file on the server would still be upload.wikimedia.org/.../bob.png. On the image description page you could tell what the image really is from the "(4KB, MIME type: image/png)" line. Just a stupid, flawed idea I thought I'd throw out.
That's more or less what I was thinking, except that if the extension is meaningless there's no reason to have it at all (and in fact there would be a pretty good reason not to have it). But the implementation would be like what you described. Suppose we upload an image with the name "Snark". Then we can refer to it in an article as [[Image:Snark]]. If it's a PNG, the actual URL will perhaps be http://upload.wikimedia.org/.../Snark.png, or if it's a JPEG it might be http://upload.wikimedia.org/.../Snark.jpg.
If you really want to be idealistic, read http://www.w3.org/Provider/Style/URI ("Cool URIs don't change"), where Tim Berners-Lee makes some pretty good points against including the file type in the URL at all (so no .png extensions). But that would probably be a lot of work for very little benefit, and it might even confuse some browsers to have a URL of a PNG image that didn't end in .png.
I furiously disagree with this line of reasoning. The encoding (not format) of an image is in fact a critical part of the information an end-user needs to deal with the image; the WMF exploit currently in the wild is an excellent example of why. Furthermore, it's common practice to keep files whose names differ only in their extensions, and the current method has been entrenched for several years; this would require a massive amount of upkeep work for essentially no significant gain. My personal wiki holds more than 800 images; the result for Wikipedia would be nothing short of a disaster. Old engineers know one thing above all else: don't change anything unless there's a good reason. Why do you want to strip the extension? What good does it do you, given the several kinds of spoofing attacks it may end up enabling? Since when do wiki readers get bothered if the name of a file which they never see has the same kind of extension that essentially every other modern platform in existence has? If this were being proposed at the beginning of a project, I'd pan it as silly. Since this is being done several years into content provision, I call it outright counterproductive, and arguably dangerous. Red flag. Do *not* do this until there's a damned good argument made.
Okay, well, the main reason I proposed it is that it basically prevents a replacement image in a different format from being uploaded. If there's a JPEG image, and I want to replace it with a PNG, I have to upload it as a new image, replace all uses of the old JPEG with the new PNG, delete the old JPEG, and I lose all the history of the old image. Maybe if bug 709 is fixed correctly, this problem will go away.
This could also maybe be addressed by simply not *requiring* extensions. Any wikis that are currently set up could then choose to continue to use extensions if they wish, or slowly migrate their images to non-extensioned names.
Also, I don't understand what kind of spoofing attacks it would enable. The actual URL of the image could still end with the correct extension; the only thing that would be changed is the image name that's displayed at the top of the image description page, and the name used to include the image in an article.
I fervently support this proposal. I have a list on Commons of thousands of PNGs that need to be converted to SVGs. The work it will take to update all the references is monumental. Effective image renaming would also solve the problem, but the W3C is right that image URLs that don't change are really a better idea: putting the format in the name just exposes implementation details unnecessarily, even if it is unconventional to do otherwise.
I support this proposal.
Supported
*** Bug 4878 has been marked as a duplicate of this bug. ***
If anything, the extensions should be case insensitive. Trying [[Image:Foo.png]] when the actual file name is [[Image:Foo.PNG]] is frustrating, and there's no real loss from that conversion, IMO.
*** Bug 6337 has been marked as a duplicate of this bug. ***
*** Bug 6451 has been marked as a duplicate of this bug. ***
Copied from bug #6451 ---------------------------------------------------------------------- I feel it would be logical and prudent for any person uploading a file to give it a unique description. Multiple files whose names and extensions are identical except for case are problematic (example: Example.png and Example.pNg). MediaWiki currently allows the upload of files whose filenames and extensions differ only in case. I see no use in having Example.png and its different variants (case differences) aside from inflicting confusion. I was also informed that there had been issues with Wikia logos having the wrong case. There is the matter of existing images requiring a rename... but that can be dealt with later. First we should prevent any new images with weird cases.
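A minimal sketch, in Python, of the upload check bug 6451 asks for: refuse a new name that differs from an existing file only in the case of its extension. It folds only the extension's case, which is the situation described above; whether base names should also be compared case-insensitively is left open. `normalize_name` and `find_case_conflict` are hypothetical helpers, not MediaWiki functions.

```python
import os
from typing import List, Optional


def normalize_name(name: str) -> str:
    """Lower-case the extension only, leaving the base name untouched."""
    base, ext = os.path.splitext(name)
    return base + ext.lower()


def find_case_conflict(new_name: str, existing: List[str]) -> Optional[str]:
    """Return an existing name that differs from new_name only in extension case.

    An exact match is not a conflict: that is a re-upload of the same file.
    """
    key = normalize_name(new_name)
    for name in existing:
        if name != new_name and normalize_name(name) == key:
            return name
    return None
```

For example, `find_case_conflict("Example.pNg", ["Example.png"])` returns `"Example.png"`, so an upload form could reject the name and ask for a better one. Note that Paris.jpg and Paris.jpeg are not caught, because they differ in spelling, not just in case.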
I support this - at least the case-insensitiveness to prevent name conflicts/confusion when inserting wikilinks would be good.
I support this; metadata shouldn't be part of the filename. This would also make it easier to replace images with a different format without having to change all links to them. (Currently users uploading images in inappropriate formats is a big problem on Wikipedia and Commons, as evidenced by the number of {{badJPEG}} and {{SVG}}-tagged images.)
This is silly. [[image:foo.jpg]] and [[image:foo.jpeg]] are different. Implementing this would create a new problem whenever multiple files share the same name apart from the extension. It's not backwards compatible and would cause more trouble than it's worth.
I don't think anyone is suggesting we strip the extensions off existing filenames. That may happen, but it would have to be a manual process. And there is a problem: when a PNG on Commons is used on over 100 pages across 12 wikis and is replaced by a better SVG, how do you update it? The Toolserver is down, remember. So currently you edit the links for maybe an hour, miss the ones on English, and (hopefully) someone notices and fixes them.
This bug only suggests removing the *requirement* to add the image extension as part of the name, and maybe removing the extension from new files by default, not renaming the current files. If bug 709 is fixed, we will be able to rename images manually, but images will not be renamed automatically, so there will be no new problem. Reopening.
Response to "Duncan Harris": Tell me ONE good reason why we want confusing filenames/extensions such as foo.jpg and foo.jpeg. We do not want files with names like Paris.jpg, Paris.jPg and Paris.jpeg. The intention is to make it so that it is not backwards compatible, since the point is resolving the problem, not forking it. As you point out, the two files in question are different, and both of them should have been given a more descriptive filename to avoid a conflict in the first place. The upload should not have been allowed; it should have produced an error asking for a better name. Some time ago a check was run to see how many images were conflicting, and the number was over a thousand. All of those 1000+ images need to be renamed to something that is actually descriptive. The point is to separate the image description page from the actual image, allowing moves and other manipulations. It is a complete waste of system resources and people's time to move images the way we are doing now.
I strongly support this request. The "image description pages" have fundamentally wrong URLs. For example, I use a little add-on in Firefox that pops up as soon as I try to navigate to a PDF file, asking me what to do with that file (save, view). It works just fine everywhere but on Commons, where a non-PDF file has pdf as its extension: the description page. If you link to Commons from outside Wikimedia (say, from a blog), people see a URL ending in JPG and might try a direct download (right-click and "Save as"). Thus, they will download a description page. This whole behaviour of Commons is different from what users would rightly expect from the URL.
I furiously agree with this proposal. The encoding of an image is completely irrelevant to the way it is used in an article and the way the image's description page is accessed. The only place it is important is when fed to an end-user, where the extension would, of course, still be used. For the title of the image description page and inclusion in articles, the extension should not be present at all. This is a perfect example of the problems it would help solve: http://commons.wikimedia.org/wiki/Commons_talk:Deletion_requests/Superseded If a PNG image is updated by uploading another PNG image with the same name, the older version is kept in the "image history" on the image description page for licensing and practical reasons. (Sometimes the replacement is inferior and is reverted back to the old version, for instance.) If an SVG image is uploaded to replace the PNG, however, it must exist at a new name, breaking the link to the old, and inspiring people to delete the old PNG so that it will not continue to be used. Images in the "image history" cannot be used in articles, but images that exist at other names still can be. "I support this; metadata shouldn't be part of the filename." It shouldn't even *be* a "filename". It should be an "image title". "This is silly. [[image:foo.jpg]], [[image:foo.jpeg]] are different." That's why we need Bug 709 (Cannot rename/move images and other media files.) In the meantime, the .jpeg will continue to be part of the image's title to keep them distinct. But in the future, it should be possible to rename, redirect, and merge images just like articles. Then bots can strip the extensions from image titles that don't conflict, and flag the rest for human disambiguation. "I strongly support this request. The "image description pages" have fundamentally wrong URLs." Another good reason. The image description page has a URL that looks like an image, but is actually a web page.
We should store the MIME type as we do now, discard the image extension on upload (or at least remove it by default), and generate an extension automatically for actual media links and thumbnails based on the stored MIME type. I can't imagine any code depending on images ending with a suffix, given that they're namespaced and all, so I don't think this would be excessively difficult to fix, either. Existing images should stay where they are until we have image redirects, at which point they can be mass-moved to extensionless forms (with conflicts being ignored and left for manual cleanup). In the meantime, people who are deleting images merely for being obsolete under the assumption that they aren't being used on non-Wikimedia parts of the Internet should be smacked.
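The proposal above (strip the extension on upload, then generate one for media links from the stored MIME type) can be sketched in a few lines of Python. The extension tables and both function names are illustrative assumptions; MediaWiki's real MIME handling lives elsewhere and is not reproduced here.

```python
# Hypothetical set of extensions recognised (and stripped) on upload.
KNOWN_EXTS = {"jpg", "jpeg", "png", "gif", "svg", "pdf"}

# Hypothetical preferred extension for each stored MIME type.
PREFERRED_EXT = {
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/gif": "gif",
    "image/svg+xml": "svg",
    "application/pdf": "pdf",
}


def strip_extension(title: str) -> str:
    """Drop a trailing recognised extension from an uploaded title, if any."""
    base, dot, ext = title.rpartition(".")
    if dot and ext.lower() in KNOWN_EXTS:
        return base
    return title


def media_filename(title: str, mime: str) -> str:
    """Generate the filename used in media links from the stored MIME type."""
    return f"{title}.{PREFERRED_EXT[mime]}"
```

So an upload named "Map of Europe.png" would get the title "Map of Europe", and its media link would end in Map of Europe.png again; replacing it with an SVG changes only the generated link, not the title that articles refer to.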
*** Bug 19874 has been marked as a duplicate of this bug. ***
Clarified summary, added dep for bug 20971.
I started looking at this problem to see if I could work up the first baby-step patch toward allowing arbitrary file names, which is to allow moving an image to an arbitrary location. I first just tried removing checkExtensionCompatibility from Title::isValidMoveOperation(). That mostly seems to work, but the hitch comes when you try to reference the bare image URL, because that's served directly from the webserver, which typically relies on the file extension to get the media type. So, regardless of how MediaWiki refers to the file, it still needs to be stuffed onto disk with a valid extension intact, unless you're hosting on some ninja psychic webserver that just knows what the media type of the file is (or perhaps one that's cracking open files to see what's in them). The strategy I cooked up next was to modify FileRepo::getNameFromTitle(). Here was the plan:
* Check whether the media type from the database matches the media type derived from the file name in the title
* If they match, do nothing different
* If they don't match, tack an extension based on the media type onto the file name
I'm not even remotely sure if getNameFromTitle is the right place to insert this sort of thing (I suspect it isn't, actually). I was looking for a place where I could muck with the file name without also mucking with other uses of the name. It seems safer to go elsewhere, like File::getRel and File::getUrlRel. I may play around more with this once I get unstuck. The wall I hit was figuring out the right way to pull the MIME type out of the database, because I think I accidentally engaged in a mutually recursive death spiral calling LocalFile::GetMimeType() from there. If anyone has any tips on the right way to pull the MIME type for a given Title out, that'd be most helpful. I may putter around with this a bit more this weekend.
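The three-step plan above, expressed as a standalone Python function. This is only a sketch of the logic, not of FileRepo::getNameFromTitle itself; the two lookup tables are hypothetical stand-ins for whatever MIME mapping the wiki actually uses, and the MIME type is passed in rather than read from the database.

```python
import os

# Hypothetical extension -> MIME table used to interpret the title.
EXT_TO_MIME = {
    "jpg": "image/jpeg", "jpeg": "image/jpeg",
    "png": "image/png", "gif": "image/gif", "svg": "image/svg+xml",
}
# Hypothetical preferred extension for each MIME type.
MIME_TO_EXT = {
    "image/jpeg": "jpg", "image/png": "png",
    "image/gif": "gif", "image/svg+xml": "svg",
}


def name_on_disk(title: str, db_mime: str) -> str:
    """Keep the title if its extension already matches the stored MIME type;
    otherwise append an extension derived from that type."""
    ext = os.path.splitext(title)[1].lstrip(".").lower()
    if EXT_TO_MIME.get(ext) == db_mime:
        return title  # types match: do nothing different
    return f"{title}.{MIME_TO_EXT[db_mime]}"  # mismatch: tack on an extension
```

A title with no extension ("Foo") and one with the wrong extension ("Foo.gif" holding a PNG) both end up with a servable name, while every existing, correctly named file maps to itself.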
This isn't a problem I'm planning to bulldog until it's fixed, but it's something I'll post a patch to if I manage to muddle my way into something that seems to work. One hacky workaround that could work ok is to allow arbitrary file names in the upload screen, but instead of rejecting the upload if the name doesn't match a valid MIME type, simply tack the extension onto the end, then put a redirect from the given name to the extended name. That's probably a little too hacky to have the fully desired effect, but it does at least make it a little easier to refer to the file in an extension-agnostic manner.
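The hacky upload-time workaround just described might look roughly like this in Python. The `Wiki` class, its two dictionaries and the accepted-extension table are all hypothetical, chosen only to show the flow: accept any name, store the file under a name carrying a valid extension, and redirect the requested name to it.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Hypothetical acceptable extensions per MIME type; the first is preferred.
ACCEPTED_EXTS: Dict[str, Tuple[str, ...]] = {
    "image/jpeg": ("jpg", "jpeg"),
    "image/png": ("png",),
    "image/gif": ("gif",),
}


@dataclass
class Wiki:
    files: Dict[str, str] = field(default_factory=dict)      # stored title -> MIME type
    redirects: Dict[str, str] = field(default_factory=dict)  # requested title -> stored title

    def upload(self, name: str, mime: str) -> str:
        """Store the file; tack on an extension and redirect if the name lacks one."""
        exts = ACCEPTED_EXTS[mime]
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if ext in exts:
            stored = name  # name already matches its type: store as-is
        else:
            stored = f"{name}.{exts[0]}"   # instead of rejecting, add the extension
            self.redirects[name] = stored  # and leave a redirect from the given name
        self.files[stored] = mime
        return stored

    def resolve(self, title: str) -> str:
        """Follow a redirect, if there is one."""
        return self.redirects.get(title, title)
```

As the comment says, this doesn't fully decouple title and format: replacing "Snark" with a GIF later would still need the redirect to be retargeted.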
(In reply to comment #27)
<snip>
> I'm not even remotely sure if getNameFromTitle is the right place to insert this sort of thing (I suspect it isn't, actually). I was looking for a place where I could muck with the file name without also mucking with other uses of the name.
> It seems safer to go elsewhere, like File::getRel and File::getUrlRel. I may play around more with this once I get unstuck.
I agree getNameFromTitle is bad. I'd suggest it would be better to add a new accessor File::getNameOnDisk as an alternative to File::getName, and then change the URL constructors to use the former rather than the latter.
> The wall I hit was figuring out the right way to pull the MIME type out of the database, because I think I accidently engaged in a mutually recursive death spiral calling LocalFile::GetMimeType() from there. If anyone has any tips on the right way to pull the mime type for a given Title out, that'd be most helpful.
<snip>
Assuming your goal was to use MIME type to determine the appropriate extension, then that won't work. Because we have allowed capitalization variations, e.g. Foo.JPG != Foo.jpg != Foo.jpeg, there is no way to uniquely determine what the extension should be from the type. Almost certainly we will need to add an extension field to the Image and Oldimage tables and simply look up the extension. An advantage of this is that one could set all existing files to have a null extension, meaning that nothing needs to be added to the file name as already exists.
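A minimal Python sketch of the extension-field idea, including the null-means-unchanged rule. `ImageRow` is a hypothetical stand-in for a table row, and the column name `img_file_ext` is borrowed from the v2 patch description later in this thread; neither is a real MediaWiki class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRow:
    """Hypothetical image-table row with the proposed extension column."""
    img_name: str                # the page title
    img_file_ext: Optional[str]  # None (NULL) for every pre-existing file

    def name_on_disk(self) -> str:
        # NULL extension: the on-disk name is the title itself, so
        # existing files need no migration at all.
        if self.img_file_ext is None:
            return self.img_name
        # Otherwise the stored extension is appended verbatim, which keeps
        # Foo.JPG, Foo.jpg and Foo.jpeg distinct.
        return f"{self.img_name}.{self.img_file_ext}"
```

Because the extension is looked up rather than derived from the MIME type, two files with the same type but differently spelled extensions never collide.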
(In reply to comment #27)
> I first just tried removing checkExtensionCompatibility from Title::isValidMoveOperation(). That mostly seems to work, but the hitch comes when you try to reference the bare image URL, because that's served directly from the webserver, which typically relies on the file extension to get the media type. So, regardless of how MediaWiki refers to the file, it still needs to be stuffed onto disk with a valid extension intact unless you're hosting on some ninja psychic webserver that just knows what the media type of the file is (or perhaps one that's cracking open files to see what's in them).
The simplest way to handle this from our perspective is to just give all the on-disk files a name ending in, say, .png. This will typically cause an incorrect Content-Type to be served (except for PNG files, of course), but browsers will display the pictures fine anyway, as long as it's served as some recognized image type. See <http://tools.ietf.org/html/draft-abarth-mime-sniff-03>. In fact, it should work fine in many cases even if a non-image MIME type is served. Arguably, relying on this MIME type sniffing is incorrect and confusing. But it's a possibility, for simplicity's sake. It's certainly reliable.
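To illustrate what "sniffing" means here: a browser can ignore the declared type and look at a file's leading bytes. The Python sketch below checks a few well-known signatures (PNG, GIF, JPEG, PDF). It is a deliberately simplified illustration, not the algorithm in the cited draft, which has additional rules about when sniffing is allowed and which types it may switch between.

```python
from typing import Optional

# Leading "magic" bytes for a few common formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff(data: bytes) -> Optional[str]:
    """Guess a MIME type from content alone, ignoring the file name."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None  # unknown: a browser would fall back to the served type
```

A JPEG stored as Foo.png would still sniff as image/jpeg, which is why displaying it inline works; whether the same holds for PDFs and office documents, or for files saved to disk, is the open question raised in the reply.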
(In reply to comment #29)
> The simplest way to handle this from our perspective is to just give all the on-disk files a name ending in, say, .png. This will typically cause an incorrect Content-Type to be served -- except for PNG files, of course -- but browsers will display the pictures fine anyway, as long as it's served as some recognized image type. See <http://tools.ietf.org/html/draft-abarth-mime-sniff-03>. In fact, it should work fine in many cases even if a non-image MIME type is served.
>
> Arguably, relying on this MIME type sniffing is incorrect and confusing. But it's a possibility, for simplicity's sake. It's certainly reliable.
Though I haven't tested it systematically, I'll assume that some fraction of browsers will happily process some fraction of file types without extensions. However, this feels like a terrible hack, and I worry that not enough browsers would process enough file types.
In particular, if we want our approach to work for MediaWiki installs in general, then we can't simply assume that we are only talking about image files. Does it work for PDFs, for Word documents, for spreadsheets, etc.? Also, if the file has a .png extension and a person saves it to their local hard drive, then I strongly suspect that Windows users will have a hard time reading the file in most apps without manually changing the extension (not sure about Mac / Linux).
I think it makes much more sense to provide the user with an appropriate extension, even if that information is unnecessary in some cases.
(In reply to comment #30)
> Though I haven't tested it systematically, I'll assume that some fraction of browsers will happily process some fraction of file types without extensions. However this feels like a terrible hack, and I worry that not enough browsers would process enough file types.
>
> In particular, if we want our approach to work for Mediawiki installs in general, then we can't simply assume that we are only talking about image files. Does it work for PDFs, for Word Documents, for spreadsheets, etc.? Also if the file has a .png extension and a person saves it to their local hard drive, then I strongly suspect that Windows users will have a hard time reading the file in most apps without manually changing the extension (not sure about Mac / Linux).
>
> I think it makes much more sense to provide the user with an appropriate extension, even if that information is unnecessary in some cases.
Reasonable points, especially about things like PDFs and saving files locally. I retract the suggestion.
Created attachment 6680: Incomplete attempt to allow image moves to arbitrary names
Patch attached for an initial incomplete implementation. It seems to work with my very limited testing. Note: I added a new config setting ($wgCheckFileExtensions), which needs to be set to "false" in order to use this (default is "true").
In reply to comment #28 (great feedback, btw):
> I agree getNameFromTitle is bad. I'd suggest it would be better to add a new accessor File::getNameOnDisk as an alternative to File::getName, and then change the URL constructors to use the former rather than the latter.
Okee doke. I added File::getFilename and changed a few calls to point to that. I had to add a corresponding FileRepo::getFilenameFromTitle.
> Assuming your goal was to use MIME type to determine the appropriate extension, then that won't work. Because we have allowed capitalization variations, e.g. Foo.JPG != Foo.jpg != Foo.jpeg, there is no way to uniquely determine what the extension should be from the type.
It's possible, and the attached patch does it in a pretty reasonable way (adding a new "getPreferredExtensionForType" that leverages some existing normalization code). However, I concur that this isn't the best solution. The problem with it is that an innocent reconfiguration could render the files inaccessible.
> Almost certainly we will need to add an extension field to the Image and Oldimage tables and simply look up the extension. An advantage of this is that one could set all existing files to have a null extension, meaning that nothing needs to be added to the file name as already exists.
I ran out of time before I could implement this, but that would seem to be the next logical step. I still think we'll need the logic for generating an extension from a MIME type: if the initially uploaded file name doesn't match the MIME type, we'll need to get an extension from somewhere.
Created attachment 6859: v2 patch - handles upload, still buggy
New version of a patch. Still some testing, known bugs and cleanup to do, but looking for last-minute feedback before I finish this off. The big change is that there's a new field in the image table (img_file_ext), along with corresponding changes in oldimage and filearchive. It appears as though the check on upload was already mostly coded, and there was even a $wgCheckFileExtensions variable that I didn't notice in my first version (looks like it's an antique, too).
Known bug: uploading an image without an extension will leave the DB in an incorrect state.
Here's my test plan:
* Image renaming:
** Upload Foo.jpg
** Rename Foo.jpg to Foo
** Rename Foo to Foo.jpeg
** Rename Foo.jpeg to Foo.gif
** Upload Bar (GIF file)
** Rename Bar to Bar.gif
* Set $wgSaveDeletedFiles=true
* Set $wgFileStore['deleted']['directory'] to a valid directory
* Delete, then undelete an image
* Upload a new version of an image
** With no extension
** With proper extension
* Change the configured default extension from "jpg" to "jpeg"; deal with images from before the transition
* Install MW 1.15, set $wgCheckFileExtensions=false, upload images (with/without matching extensions), then upgrade to the new version and check the images
* Fresh install of MediaWiki, uploading images both with and without a matching extension in the title
Created attachment 6885: bug4421-robla-v3-svn59811.patch
It's mostly working, though working through all of the edge cases is a bit of a game of whack-a-mole. Most of the complexity comes from needing to store the files on the filesystem with appropriate file extensions, since Apache serves them directly from the filesystem. Thus, there's a lot of convoluted logic for tacking on the file extension in the appropriate spots.
An example of something that isn't working that I need advice on: with my modified version, it's possible to upload a JPEG to a location without an extension, then upload a PNG to that same location. The problem comes in LocalFile::publish(). Here's the call it makes from that function:
    $status = $this->repo->publish( $srcPath, $dstRel, $archiveRel, $flags );
This causes two things to happen:
1. copy $dstRel to $archiveRel
2. copy $srcPath to $dstRel
The problem is with uploading a PNG over the top of a JPEG. For example, if the title is "File:Foo", then the filename for the first version of the file will be "Foo.jpg", and the replacement will be "Foo.png". So, if we pass "Foo.jpg" as $dstRel, then step 1 works, but step 2 fails. If we pass "Foo.png", then the opposite problem occurs. Thoughts on dealing with this problem? Modifying FileRepo::publish() (or adding a new method with more parameters) seems like the only solution here.
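One way to read "adding a new method with more parameters" is to pass the current version's on-disk name separately from the new version's. The Python sketch below models the file store as a dict and is purely hypothetical: the function, its parameters and the archive path are made up for illustration and are not FileRepo's actual interface.

```python
from typing import Dict, Optional


def publish(store: Dict[str, bytes], new_bytes: bytes,
            old_rel: Optional[str], new_rel: str, archive_rel: str) -> None:
    """Hypothetical publish step taking the old and new on-disk names separately."""
    # Step 1: archive the current version under the name it actually has.
    if old_rel is not None and old_rel in store:
        store[archive_rel] = store[old_rel]
    # Step 2: write the new version under the name matching its own type.
    store[new_rel] = new_bytes
    # If the type (and so the extension) changed, the old name is stale.
    if old_rel is not None and old_rel != new_rel:
        store.pop(old_rel, None)
```

With old_rel="Foo.jpg" and new_rel="Foo.png", both steps succeed: the JPEG lands in the archive and the PNG becomes current, so the JPEG-then-PNG case above no longer forces a choice between the two names.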
Created attachment 6926: bug4421-robla-v4all-svn60601.patch Yet another version of the patch. Still needs testing, but otherwise I think this one is ready for primetime.
Created attachment 6927: bug4421-robla-v4staged-svn60601.tar.gz bug4421-robla-v4staged-svn60601.tar.gz is a tarball containing the same patch as bug4421-robla-v4all-svn60601.patch, only broken up into several stages worth of patches. I broke it up both in hopes that it might be easier to digest in smaller parts, and as a way to review my own code.
I've now tested this as much as I'm going to now. Anyone care to try this out?
Such a major change to the file repo code needs a review by Tim for security, scalability, etc.
I'm not really interested in making major changes to the trunk at the moment, due to the need to stabilise for a 1.16 release branch. But feel free to commit it to a development branch.
(In reply to comment #39)
> I'm not really interested in making major changes to the trunk at the moment, due to the need to stabilise for a 1.16 release branch. But feel free to commit it to a development branch.
Granted, and I certainly don't suggest trying to get this in before 1.16 branches and releases; just a general note that I'd like a thorough review before this does (eventually) go into trunk :)
The individual patches are checked into a branch now: http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/path&path=/branches/extensionless-files/ I checked in the important patches (stage 1 through stage 3) first, then the optional ones after that (and then the one file I forgot to add... oops). svn revs 60770-60773 and 60779 are the important ones; svn revs 60774-60778 are minutiae that can be evaluated independently.
I am the owner of bug 20971, which is closely linked to this. Could someone kindly update me on the progress here? Thank you.
Hi Mattia, the code is still sitting in the extensionless-files branch. I can conceivably take a crack at bringing it up to date with the trunk and merging it in. However, it won't make it into 1.16, and it requires a database upgrade, so it probably needs more review than it has gotten so far. If you're eager to accelerate progress on this, my recommendation would be to raise this on the mediawiki-l mailing list, making a case for why this is needed sooner rather than later. In the meantime, I'll work on getting the easier portions of this patch incorporated into trunk, so as to hopefully make it easier to incorporate the rest when the time comes.
(In reply to comment #43)
> If you're eager to accelerate progress on this, my recommendation would be to raise this on the mediawiki-l mailing list, making a case for why this is needed sooner rather than later.
wikitech-l would probably be more appropriate.
I'm not so sure about this. Personally, my scripts often rely on the assumption that image.img_name is the same as page.page_title when page_namespace = 6. I use this assumption to generate reports of files without file description pages, file description pages without files, and comparing enwiki_p.page.page_title to commonswiki_p.image.img_name. I imagine there are other scripts on the Toolserver and elsewhere that rely on a similar assumption. This change would likely break these scripts. I'm also concerned about naming conflicts. In bug 20971#c4, Brion suggests that only non-conflicted image names would be stripped. Deliberate inconsistency here doesn't seem like an ideal situation for editors or anyone else. Though I suppose page text will be inconsistent for the rest of time if this change is implemented anyway. On an emotional level, stripping the file extensions feels wrong. A JPG simply isn't the same as a GIF or a PNG or an SVG. Even users who are only adding the file inclusion code to pages need to understand and appreciate that.
Your concerns are, IMO, trivial compared to the troubles we have from including extensions thus far. You're right that JPEG, GIF, PNG, & SVG simply are not the same, which is just another reason why they don't need filename extensions anywhere, and certainly not at their File:Name locales.
(In reply to comment #45)
> stripping the file extensions feels wrong. A JPG simply isn't the same as a GIF or a PNG or an SVG. Even users who are only adding the file inclusion code to pages need to understand and appreciate that.
I agree completely here. We often upload different formats of the same image for differing purposes, and change only the file extension. The reason for that is it *does* matter which one you use!
(In reply to comment #5)
> Okay, well, the main reason I proposed it is that it basically prevents a replacement image in a different format from being uploaded.
That's a good thing. The replacement image with a different format is a different image. Any good Commoner knows that an image of a given format can never be superseded by an image in another format on that basis alone. This is a feature, not a bug.
I don't think any of the objections from comment 45 or comment 47 are very compelling. But I'm wondering what happens with non-image files. Should it be impossible to tell videos from images from PDFs based on the names? You would have no idea from looking at the wikitext source whether [[File:Foo]] is including image or video or audio or maybe something else entirely.
(In reply to comment #46)
> the troubles we have from including extensions thus far.
Remind me what those troubles are? I cannot think of a single one, while I can think of contraindications.
Design doc posted here: http://www.mediawiki.org/wiki/Requests_for_comment/Extensionless_files
"The file extension is stored in a new 'img_file_ext' field in the 'image' table (and similar fields to oldimage and filearchive). This field defaults to null. When it is set to null, the file name and the page title are the same." Could we instead just key off the MIME type here? That seems simpler and less redundant. What would img_file_ext='gif' but img_minor_mime='jpeg' mean? That's denormalized.
Hi Aryeh: the reason I chose to store the file extension in addition to the MIME type is that img_file_ext='jpg' and img_file_ext='jpeg' are both valid values when img_minor_mime is 'jpeg'. While one might be able to infer the extension from the preferred extension for the MIME type, that's potentially a booby trap for devs and sysadmins down the road, who might unintentionally corrupt a wiki by changing the preferred file extension from one to the other. What may seem like a harmless switch from "jpg" to "jpeg" as the preferred extension would suddenly cause a lot of existing images, archived images, and thumbnails to break. By storing the extension in the DB, changing the preferred extension in the configuration/code is safe, with only future updates taking on the new preferred extension.
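The booby trap described above can be shown with a small Python sketch comparing the two approaches. `Row`, `derived_name` and `stored_name` are hypothetical names; the "config" is just a dict mapping a minor MIME type to a preferred extension.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Row:
    title: str
    minor_mime: str          # e.g. "jpeg"
    file_ext: Optional[str]  # extension recorded when the file was uploaded


def derived_name(row: Row, preferred: Dict[str, str]) -> str:
    """On-disk name inferred from the *current* preferred-extension config."""
    return f"{row.title}.{preferred[row.minor_mime]}"


def stored_name(row: Row) -> str:
    """On-disk name built from the extension stored in the row."""
    return row.title if row.file_ext is None else f"{row.title}.{row.file_ext}"


old_cfg = {"jpeg": "jpg"}
new_cfg = {"jpeg": "jpeg"}  # the "harmless" switch from jpg to jpeg

row = Row("Foo", "jpeg", "jpg")          # uploaded while old_cfg was active
on_disk = {derived_name(row, old_cfg)}   # the file was written as Foo.jpg
```

After the switch, `derived_name(row, new_cfg)` points at Foo.jpeg, which doesn't exist, while `stored_name(row)` still finds Foo.jpg. Only files uploaded after the change would use the new extension.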
(In reply to comment #48)
> I don't think any of the objections from comment 45 or comment 47 are very compelling.
I disagree; I think they are rather convincing, and I pretty much agree with everything Mike.lifeguard and MZMcBride have said.
> But I'm wondering what happens with non-image files. Should it be impossible to tell videos from images from PDFs based on the names? You would have no idea from looking at the wikitext source whether [[File:Foo]] is including image or video or audio or maybe something else entirely.
I think that's a rather compelling argument against it right there.
(In reply to comment #46)
> You're right that JPEG, GIF, PNG, & SVG simply are not the same, which is just another reason why they don't need filename extensions anywhere, and certainly not at their File:Name locales.
That doesn't make any sense. I want to know what type of media I'm using, and the extension helps convey that. I do see the annoyances of JPG vs jpg, but that could very well be fixed by normalizing extensions to lowercase on upload, regardless of what happens here. I'm still not convinced of the overall usefulness of this, though.
> But I'm wondering what happens with non-image files. Should it be impossible to tell videos from images from PDFs based on the names? You would have no idea from looking at the wikitext source whether [[File:Foo]] is including image or video or audio or maybe something else entirely.
How does one know the difference between a GIF and an animated GIF based on file extension? How does one know the difference between a Flash file (.swf) that just has static vector art versus video? How does one know the difference between a static .svg and one that includes a <video> element? The arguments in the "URI Opacity" section of the W3C's Architecture group apply to this conversation too: http://www.w3.org/TR/webarch/#uri-opacity
(In reply to comment #54)
> How does one know the difference between a GIF and an animated GIF based on file extension? How does one know the difference between a Flash file (.swf) that just has static vector art versus video? How does one know the difference between a static .svg and one that includes a <video> element?
Sure, a file's extension is not a magic bullet that unambiguously tells you everything you want to know about a file. But it *helps* a great deal.
Helps with what? Realizing you have to upload a replacement image to another name because this bug isn't closed? For what other reason would it matter what format the file is? (even though you'd be able to tell regardless)
So what would this do for cases like http://commons.wikimedia.org/wiki/File:Banana.JPG and http://commons.wikimedia.org/wiki/File:Banana.png? The two are of completely different images.
> So what would this do for cases like http://commons.wikimedia.org/wiki/File:Banana.JPG and http://commons.wikimedia.org/wiki/File:Banana.png? The two are of completely different images.
Since they have two different page titles, they'd be treated as two different images. For that matter, http://commons.wikimedia.org/wiki/File:Banana.jpeg and http://commons.wikimedia.org/wiki/File:Banana.jpg would still be treated as two different images. The only thing this feature does (if enabled) is *allow* for the creation of "http://commons.wikimedia.org/wiki/File:Banana", and decouple the MIME type from the page title extension. It does not automatically strip off the extension from existing page titles or create automatic redirects of any sort.
The parenthetical "(if enabled)" bit is important here, too. There's nothing forcing anyone (including Wikimedia Foundation) to actually use this feature just by virtue of MediaWiki supporting the functionality.
(In reply to comment #56)
> Helps with what? Realizing you have to upload a replacement image to another name because this bug isn't closed? For what other reason would it matter what format the file is? (even though you'd be able to tell regardless)
You still haven't explained how we're supposed to know what [[File:Name]] is when looking at the syntax.
(In reply to comment #58)
> The only thing this feature does (if enabled) is *allow* for the creation of "http://commons.wikimedia.org/wiki/File:Banana", and decouple the MIME type from the page title extension. It does not automatically strip off the extension from existing page titles or create automatic redirects of any sort.
The question is not "will it break existing images"; it's that when you strip the extension, the name becomes meaningless. If I'm trying to include a picture of a banana, I want a JPG or PNG, not an MPG. What type of media am I using here: [[File:Banana]]? By keeping the extension, it's not (as) ambiguous. Like Roan said above, it's not a magic bullet, but it certainly helps.
> The parenthetical "(if enabled)" bit is important here, too. There's nothing forcing anyone (including Wikimedia Foundation) to actually use this feature just by virtue of MediaWiki supporting the functionality.
Yes, but if it's not a good feature (which we seem to disagree on), we shouldn't support it at all. If we implemented every idea someone had, we'd have far fewer WONTFIXes. I think the outstanding questions need answering before this moves forward any further.
> You still haven't explained how we're supposed to know what [[File:Name]] is when looking at the syntax.
You're not, as explained here: http://www.w3.org/TR/webarch/#uri-opacity
(In reply to comment #60)
> > You still haven't explained how we're supposed to know what [[File:Name]] is when looking at the syntax.
> You're not, as explained here: http://www.w3.org/TR/webarch/#uri-opacity
Agents aren't supposed to infer anything. From the spec:
> The example URI used in the travel scenario ("http://weather.example.com/oaxaca") suggests to a human reader that the identified resource has something to do with the weather in Oaxaca.
Of course it's not guaranteed to be correct (as the spec goes on to say), but it certainly does help. This is about human readability, not whether the file extension really matters.
> Of course it's not guaranteed to be correct (as the spec goes on to say), but it certainly does help. This is about human readability, not whether the file extension really matters.
If that's the primary concern, then the right thing to do is to set up "Image:", "Audio:" and "Video:" namespaces to distinguish between different file types, rather than lumping them all into "File:". Expecting non-technical users to understand that ".svg" usually means a vector diagram hardly serves the goal of readability.
(In reply to comment #62)
> > Of course it's not guaranteed to be correct (as the spec goes on to say), but it certainly does help. This is about human readability, not whether the file extension really matters.
> If that's the primary concern, then the right thing to do is to set up "Image:", "Audio:" and "Video:" namespaces to distinguish between different file types, rather than lumping them all into "File:". Expecting non-technical users to understand that ".svg" usually means a vector diagram hardly serves the goal of readability.
There may be a case for adding "Audio" and "Video" prefixes as aliases for "File", though it would probably cause conflicts with a fair number of installations that have already created separate namespaces with these prefixes. Implementing this feature (extensionless files) as a configurable option, off by default, might be an option, though the required schema changes make it unlikely that many people would use it, I think. In general, it seems like removing the extensions causes far more problems than it solves.
Okay, so:
1) Problem with the current system: Cannot upload a new version of a file in a different format while preserving history.
2) Problem with the current system (not mentioned for a while): Google apparently doesn't index image pages properly on non-Wikimedia MW installs, because it assumes anything ending in .png/.jpeg/etc. is an image page, not an HTML page.
3) Problem with the proposed system: Files are possible that have no extension, or a completely misleading extension, so it's not clear what general type of file they are (although sometimes this is unclear anyway).
There are several possible solutions I can think of. The status quo solves (3) but not (1) or (2). The proposal solves (1) and (2) but not (3). I don't see any reason why we wouldn't want to allow the proposed changes as an option; some wiki admins will surely prefer the option, although others may not. It could be disabled by default.
Another possibility is to require extensions as now, but allow uploading a new file of a different type to an existing filename. This would automatically rename the file to the new appropriate extension, and would only work if that's possible. Reverting to an earlier file of a different type would also change the name. This solves (1) and (3) but not (2). It would be a bit messy, but I think strictly better than the status quo.
(In reply to comment #52)
> Hi Aryeh: the reason I chose to store the file extension in addition to MIME is that img_file_ext='jpg' and 'jpeg' are both valid values when img_minor_mime is 'jpeg'. While one might be able to infer what the extension would be based on the preferred extension given the MIME type, it's potentially a booby trap for devs and sysadmins down the road, who might unintentionally corrupt a wiki by changing the preferred file extension from one to the other. What may seem like a harmless switch from "jpg" to "jpeg" as the preferred extension would suddenly cause a lot of existing images, archive images, and thumbnails to break. By storing this in the DB, changing the preferred extension in the configuration/code is safe, with only future updates taking on the new preferred extension.
Why not just hardcode "jpg" as the preferred version, and never change it? That seems a lot simpler and less error-prone than keeping track of it.
> Why not just hardcode "jpg" as the preferred version, and never change it? That seems a lot simpler and less error-prone than keeping track of it.
There would need to be all sorts of red flags and warnings around the part of the configuration/code that specifies that mapping, and if there's ever a legitimate need to remap any extension, fixing it becomes pretty fragile.
The current mapping of image/jpeg->".jpeg" as the preferred extension is in the mime.types file, which looks roughly compatible with the Apache mime.types file. Someone may naively copy an Apache file over and screw up their wiki if the ordering isn't the same.
Mind you, it's not just JPEG that has multiple choices for filename, it's most media types. Changing it on an existing wiki seems like it'd really screw things up, and it's pretty easy to imagine someone trying it.
That said, I'm not dug in on this approach. I can definitely see the benefit of not touching the database; in fact, it was my original strategy. Part of the reason why I went with the database approach was the recommendation in comment #28, the wisdom of which was borne out after I spent a fair amount of time trying to make the no-database-changes approach work. I understand the code better now, so I'd probably be more successful if I tried again, though I'm a little nervous I might just rediscover another reason why the database change was needed. As I recall, what tipped me over was taking a good look at how mime.types are configured.
Regardless, the job of trying a different approach would be made easier by getting some variant of r60772 checked in (as well as some of the other fixes and tweaks on that branch), since I'm a little worried that there's more code that's being checked in that glibly assumes article title==filename. The sooner those bits are checked in, the easier it would be to maintain a branch that implements the actual feature.
(In reply to comment #65)
> There would need to be all sorts of red flags and warnings around the part of the configuration/code that specifies that mapping, and if there's ever a legitimate need to remap any extension, fixing it becomes pretty fragile.
Hardcode it; don't make it configurable. Add a comment if you're worried, saying "Do not change this or else existing files will become inaccessible". Even if you think developers will ignore the comment *and* no one else will notice in code review or testing, which seems excessively pessimistic, it will still be noticed immediately upon deployment and fixed with minimal damage. If an end user modifies the source code without knowing what they're doing, on the other hand, they deserve whatever happens to them. There are much more destructive things they can do to their wiki.
> Mind you, it's not just JPEG that has multiple choices for filename, it's most media types. Changing it on an existing wiki seems like it'd really screw things up, and it's pretty easy to imagine someone trying it.
It's extremely hard for me to see why anyone would decide they prefer .jpeg to .jpg (or vice versa) so much that they'd look through the source code, find the code that has the mapping, *and* ignore the comment warning them not to change it. Even if they do something so pathologically stupid, it will be caught quickly and isn't that hard to fix manually.
> Regardless, the job of trying a different approach would be made easier by getting some variant of r60772 checked in (as well as some of the other fixes and tweaks on that branch), since I'm a little worried that there's more code that's being checked in that glibly assumes article title==filename. The sooner those bits are checked in, the easier it would be to maintain a branch that implements the actual feature.
No objection to checking in a preliminary version, but it wouldn't make any sense to do a schema change only to decide we actually don't need it.
(In reply to comment #64)
> 2) Problem with the current system (not mentioned for a while): Google apparently doesn't index image pages properly on non-Wikimedia MW installs, because it assumes anything ending in .png/.jpeg/etc. is an image page, not an HTML page.
I wrote [[mw:Extension:FilePageMasking]], which transparently rewrites ".xxx" to "_xxx" for image description pages. This solves the Google problem by masking out the extension.
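Purely as an illustration of the masking idea (the extension's actual code is not part of this thread), a round-trip sketch in Python might look like this. The extension list and the function names mask/unmask are assumptions made for the example.

```python
import re

# Hypothetical list of masked extensions; the real extension's list may differ.
MASKED_EXTS = ("png", "gif", "jpg", "jpeg", "svg")
_ALT = "|".join(MASKED_EXTS)
_DOT_EXT = re.compile(rf"\.({_ALT})$", re.IGNORECASE)         # trailing ".png"
_UNDERSCORE_EXT = re.compile(rf"_({_ALT})$", re.IGNORECASE)   # trailing "_png"


def mask(title: str) -> str:
    """Rewrite a trailing '.ext' to '_ext' for the description-page URL."""
    return _DOT_EXT.sub(r"_\1", title)


def unmask(url_title: str) -> str:
    """Inverse of mask(): turn a trailing '_ext' back into '.ext'."""
    return _UNDERSCORE_EXT.sub(r".\1", url_title)
```

The round trip stays unambiguous only because every file title currently carries an extension. If extensionless titles (the subject of this bug) were allowed, a title such as File:Logo_png would collide with the masked form of File:Logo.png.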
> It's extremely hard for me to see why anyone would decide they prefer .jpeg to .jpg (or vice versa) so much that they'd look through the source code, find the code that has the mapping, *and* ignore the comment warning them not to change it. Even if they do something so pathologically stupid, it will be caught quickly and isn't that hard to fix manually.
I think you may be missing my point, and I also think you need to take a closer look at how things are currently done.
Look here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/MimeMagic.php?view=markup (38 MIME types, 9 with multiple file extensions)
...and here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/mime.types?view=markup (137 MIME types, 36 with multiple file extensions)
...and here: https://svn.apache.org/repos/asf/httpd/httpd/branches/2.2.x/docs/conf/mime.types (629 MIME types, 86 with multiple file extensions)
All current and future media types with multiple choices for file extension would need to be hardcoded to specify the immutable preferred version. Granted, not all or even most of these really matter, but even accounting for that, it still leaves a lot of management headache ensuring things stay "right".
> No objection to checking in a preliminary version, but it wouldn't make any sense to do a schema change only to decide we actually don't need it.
r60772 isn't a preliminary version. It's a necessary portion of a complete final version that would be needed regardless of whether storing extensions in the database or using hardcoded extensions is the choice (or any other scheme, for that matter). There are no database changes in r60772.
(In reply to comment #68)
> I think you may be missing my point, and I also think you need to take a closer look at how things are currently done.
> Look here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/MimeMagic.php?view=markup (38 MIME types, 9 with multiple file extensions)
> ...and here: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/mime.types?view=markup (137 MIME types, 36 with multiple file extensions)
> ...and here: https://svn.apache.org/repos/asf/httpd/httpd/branches/2.2.x/docs/conf/mime.types (629 MIME types, 86 with multiple file extensions)
> All current and future media types with multiple choices for file extension would need to be hardcoded to specify the immutable preferred version. Granted, not all or even most of these really matter, but even accounting for that, it still leaves a lot of management headache ensuring things stay "right".
Hmm. You might be right, but denormalizing to this extent still doesn't seem like the best solution to me. If anything had to be in the database, we should be able to have a single 1:1 table mapping (img_major_mime, img_minor_mime) -> extension, not the same extension duplicated in millions of image rows.
> r60772 isn't a preliminary version. It's a necessary portion of a complete final version that would be needed regardless of whether storing extensions in the database or using hardcoded extensions is the choice (or any other scheme, for that matter). There are no database changes in r60772.
No objection from me, then. It's true that I haven't looked closely at this; I just don't have the time right now, so I only read the RFC.
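Aryeh's alternative, a single 1:1 mapping from (img_major_mime, img_minor_mime) to one extension instead of an extension repeated in every image row, could look roughly like the following Python sketch. It uses a read-only in-code table rather than a database table, and the entries are illustrative assumptions; MediaWiki's real MimeMagic/mime.types data is much larger.

```python
from types import MappingProxyType

# Hypothetical canonical table keyed on (img_major_mime, img_minor_mime).
# Read-only on purpose: once files are stored under these extensions,
# changing a value would orphan them.
CANONICAL_EXT = MappingProxyType({
    ("image", "jpeg"): "jpg",
    ("image", "png"): "png",
    ("image", "gif"): "gif",
    ("image", "svg+xml"): "svg",
    ("application", "pdf"): "pdf",
})


def canonical_ext(major: str, minor: str) -> str:
    """Return the single canonical extension for a MIME type, or raise."""
    try:
        return CANONICAL_EXT[(major, minor)]
    except KeyError:
        raise ValueError(f"no canonical extension for {major}/{minor}") from None
```

Making the table immutable at runtime does not settle robla's concern about someone later editing the source, but it keeps one row per MIME type instead of one per image.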
As of r81601, thumbnailing of files without an extension should work. Of course, you can't upload files without an extension, so this is not useful currently, but it is a step in the proper direction.
Sorry to bring up ancient history, but I was told this is the bug to do it at. Please see http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28proposals%29#Several_changes_to_file_naming, a proposal I put forth (not knowing about this) to fix certain consistent issues with file naming. Most relevant here are points #2 and #3, as I have been told that the first one is impossible. As it stands, the three points are:
1. Case sensitivity in image names: As it stands, three separate users could upload three separate images of three separate subjects called File:TestImage.jpg, File:TeStImAgE.jpg, and File:Testimage.jpg. There is no reason why file names should be case sensitive.
2. Multiple filetype extensions for the same filetype: As it stands, two separate users could upload two separate images of two separate subjects as File:TestImage.jpg and File:TestImage.jpeg. There is no reason for this.
3. Case sensitivity in filetype extensions: As it stands, and as I have seen at least twice recently, two separate images can be uploaded as File:TestImage.jpg and File:TestImage.JPG. This has the potential to cause even more problems than the above situations. There is no reason why filetype extensions should be case sensitive.
If we can handle #2 and #3, that would be wonderful.
(In reply to comment #71)
> Sorry to bring up ancient history…
Closing this bug would effectively nullify 2 & 3. 1 isn't relevant to this bug. (It could be done, but shouldn't be, IMO. Like it or not, we have given upper and lower case letters distinction from one another in this world. We need not limit our files thus.)
Alright. If we can, at the very least, knock off 2 and 3, that'd be an improvement. Any word from any devs? Can this be put into motion? There's a ton of support at the thread linked in comment 71.
We know there's tons of support for this, and we all want to see it happen too. It hasn't happened yet because it will take a bunch of work that has yet to be done.
I have a proposal here for eliminating the file ending and also ending most of the other restrictions on filenames for Commons and uploads in MediaWiki generally. However, I am now spending all my time on other things for the foreseeable future. Maybe someone else will find those ideas useful. http://www.mediawiki.org/wiki/User:NeilK/Multimedia2011/Titles
*** Bug 20971 has been marked as a duplicate of this bug. ***
Okay, you know what, I've had enough of this nonsense. Will someone with more knowledge of Bugzilla split my proposal (comment 71) off from this? There are two proposals on this page. One is to remove filetype extensions entirely, and has gotten a whole lot of shrieks of horror over the past five years. The other is my proposal, which really shouldn't have been placed here. I did what I was told to, but my proposal is entirely different from the one made in 2006.
(In reply to comment #77)
> Okay, you know what, I've had enough of this nonsense. Will someone with more knowledge of Bugzilla split my proposal (comment 71) off from this?
> There are two proposals on this page. One is to remove filetype extensions entirely, and has gotten a whole lot of shrieks of horror over the past five years. The other is my proposal, which really shouldn't have been placed here. I did what I was told to, but my proposal is entirely different from the one made in 2006.
Why don't you do it yourself? (In fact, there's probably already a bug submitted for that. Search for it.)
(In reply to comment #77)
> Okay, you know what, I've had enough of this nonsense. Will someone with more knowledge of Bugzilla split my proposal (comment 71) off from this?
> There are two proposals on this page. One is to remove filetype extensions entirely, and has gotten a whole lot of shrieks of horror over the past five years. The other is my proposal, which really shouldn't have been placed here. I did what I was told to, but my proposal is entirely different from the one made in 2006.
Split done for your points 2 and 3. Point 1 is probably more controversial and, if you want to pursue it, should be filed separately.
Bug 32660 - File extensions for the same file type should not allow variations of a file name (File:X.jpg, File:X.jpeg, File:X.JPG should all refer to the same file)
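Bug 32660's requirement, that File:X.jpg, File:X.jpeg, and File:X.JPG all resolve to the same file, amounts to canonicalizing the extension before the title is looked up. A minimal Python sketch, assuming a hypothetical alias table (EXT_ALIASES and normalize_file_title are illustrative names, not MediaWiki functions). It deliberately leaves the case of the base name alone, since point 1 was not split off.

```python
# Hypothetical alias table: every accepted spelling of an extension maps to
# one canonical, lowercase spelling.
EXT_ALIASES = {
    "jpg": "jpg", "jpeg": "jpg", "jpe": "jpg",
    "png": "png", "gif": "gif", "svg": "svg",
}


def normalize_file_title(title: str) -> str:
    """Canonicalize only the extension; the rest of the title is untouched."""
    stem, dot, ext = title.rpartition(".")
    if not dot:
        return title  # no extension at all
    canon = EXT_ALIASES.get(ext.lower())
    if canon is None:
        return title  # unknown extension: leave as-is
    return f"{stem}.{canon}"
```

Applied at upload and at lookup time, this would make File:TestImage.jpg, File:TestImage.jpeg, and File:TestImage.JPG collide, while File:TestImage.jpg and File:Testimage.jpg stay distinct.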
Comment on attachment 6926: bug4421-robla-v4all-svn60601.patch. Patch no longer applies cleanly to trunk per Rusty Burchfield's automated testing: https://docs.google.com/spreadsheet/ccc?key=0Ah_71HHl7qa7dGtvSms3TGpHQU9NU2Y1VmNzUEUteWc
Bug 32660 was broken off of here, and I'm breaking another bug off of that, Bug 40479 "File extensions should be automatically decided by MIME type at upload". It won't fix this bug, but it would be a step in the right direction.
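As a rough illustration of what bug 40479 asks for, here is a Python sketch in which the stored extension comes from the detected content type rather than from whatever the uploader typed. The magic-number checks cover only PNG, JPEG, and GIF. SIGNATURES, EXT_FOR_MIME, KNOWN_EXTS, sniff_mime, and upload_name are all hypothetical; MediaWiki's MimeMagic does far more thorough detection.

```python
# Minimal magic-number sniffing for three formats (illustration only).
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)
# Hypothetical one-extension-per-type table used to name the upload.
EXT_FOR_MIME = {"image/png": "png", "image/jpeg": "jpg", "image/gif": "gif"}
# Extensions a user might type that should be replaced rather than kept.
KNOWN_EXTS = {"png", "jpg", "jpeg", "jpe", "gif"}


def sniff_mime(data: bytes) -> str:
    """Return the MIME type whose signature starts the data, or raise."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    raise ValueError("unrecognized file type")


def upload_name(requested: str, data: bytes) -> str:
    """Drop any known extension the user typed and append the sniffed one."""
    stem, dot, ext = requested.rpartition(".")
    base = stem if dot and ext.lower() in KNOWN_EXTS else requested
    return f"{base}.{EXT_FOR_MIME[sniff_mime(data)]}"
```

Under this scheme a PNG uploaded as "Banana.JPG" would be stored as "Banana.png". As noted in the bug, that is a step in the right direction rather than a fix for extensionless titles.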