Last modified: 2014-10-07 08:14:10 UTC
I use the search-by-hash option to keep my bots from uploading duplicate images. If an image has been changed (for example, rotated), this doesn't work, because the hash of the new version no longer matches. The hash of the old version is now available in the oldimage table. It would be very nice if the oldimage table were searchable through the API, just like the image table.
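For reference, the current search by hash can be driven from a bot roughly as below. This is a minimal sketch, assuming the Wikimedia Commons endpoint and the `list=allimages` module's `aisha1` parameter; it only builds the query URL and does not send it. As described above, this lookup covers only the image table, so a file whose matching version now lives in oldimage will not be found.

```python
import hashlib
from urllib.parse import urlencode

# Assumption: Wikimedia Commons endpoint; any MediaWiki api.php works the same way.
API_URL = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest that aisha1 expects."""
    return hashlib.sha1(data).hexdigest()


def allimages_by_hash_url(data: bytes, api_url: str = API_URL) -> str:
    """Build a list=allimages query that looks up files by SHA-1.

    This searches the image table (current versions) only.
    """
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex(data),
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(allimages_by_hash_url(b"abc"))
```

A bot would fetch this URL and skip the upload if the returned `allimages` list is non-empty.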
This should be relatively simple (commenting so we get some input from Roan). There are a few ways this could be implemented. We could have an option to search only oldimage, or something similar that searches oldimage for a hash only if the image isn't found in image; matches from oldimage could easily be flagged with an attribute such as "old". We could also have one version of the code for oldimage only and one for image (inheritance, etc.), which might not be a bad idea in the long run. People can already get to the old images via the file/image page. Searching oldimage when there is no need to is certainly a bad idea. Maarten, is there any intended behaviour on your part? (Or a preference one way or the other?)
Probably the nicest way is to have one search that only searches the image table by default, with an option to also search oldimage or to search only oldimage. Searching oldimage when an image is not available in the image table would be nice too. In the future, searching the filearchive table can be added in a similar way (but that's another bug).
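The option scheme proposed here could look roughly like the following. This is a hypothetical sketch: the mode names, the `Match` record and the in-memory dictionaries standing in for the image and oldimage tables are all invented for illustration and are not part of any MediaWiki API. Each hit records which table and which version matched, so a caller is never handed information about the current file when it was an older version that matched.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    table: str      # which table the hit came from: "image" or "oldimage"
    name: str       # file name
    timestamp: str  # upload timestamp of the version that matched


# Hypothetical in-memory stand-ins for the two tables, keyed by SHA-1.
HashIndex = Dict[str, List[Match]]


def find_by_hash(sha1: str, image: HashIndex, oldimage: HashIndex,
                 mode: str = "image") -> List[Match]:
    """Look up a SHA-1 using one of four (hypothetical) modes:
    "image"    - current versions only (today's behaviour, the default)
    "oldimage" - superseded versions only
    "both"     - current and superseded versions
    "fallback" - search oldimage only if image has no hit
    """
    current = image.get(sha1, [])
    if mode == "image":
        return list(current)
    if mode == "oldimage":
        return list(oldimage.get(sha1, []))
    if mode == "both":
        return current + oldimage.get(sha1, [])
    if mode == "fallback":
        return list(current) if current else list(oldimage.get(sha1, []))
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    image = {"h1": [Match("image", "Foo.jpg", "20140101000000")]}
    oldimage = {"h0": [Match("oldimage", "Foo.jpg", "20130101000000")]}
    print(find_by_hash("h0", image, oldimage))              # []
    print(find_by_hash("h0", image, oldimage, "fallback"))
    # [Match(table='oldimage', name='Foo.jpg', timestamp='20130101000000')]
```

Keeping "image" as the default means existing callers see no change and oldimage is only touched when explicitly requested.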
Isn't filearchive for deleted files? If so, wouldn't access to it have to be rights-limited?
Reedy, see https://bugzilla.wikimedia.org/show_bug.cgi?id=21346 for the filearchive part
[13:13:57] <RoanKattouw> Currently prop=imageinfo returns information from both image and oldimage, but its hash search only searches image. It could be made to search oldimage as well but that'd probably produce weird results à la using &revids= with prop=templates (see bug 22079)
[13:14:25] <RoanKattouw> i.e. the search would hit an old version of the image but you'd get imageinfo for the current version; that'd be weird and probably needs to be addressed
[13:15:03] <RoanKattouw> Which may require quite a bit of redesign of the imageinfo module, it's a bit of a mess right now
<RoanKattouw> Yeah I guess an action=findfilebyhash or something that searches all 3 tables for a user-provided hash makes sense
<multichill> That would be nice yes

By "all 3 tables" we mean image, oldimage and filearchive. Current solutions such as prop=duplicatefiles and aisha1 only check image and are badly suited to checking the others.
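A single lookup across image, oldimage and filearchive might be sketched as one `UNION ALL` query. This is only an illustration under stated assumptions: the tables are heavily simplified SQLite stand-ins (the column names follow MediaWiki's naming style, but the real schemas have many more columns), `find_file_by_hash` merely echoes the proposed action=findfilebyhash and is not an existing module, and the `include_deleted` flag is a hypothetical placeholder for whatever rights check filearchive would need.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, heavily simplified schemas: the real MediaWiki tables have
# many more columns. Only name, SHA-1 and timestamp are kept here.
SCHEMA = """
CREATE TABLE image       (img_name TEXT, img_sha1 TEXT, img_timestamp TEXT);
CREATE TABLE oldimage    (oi_name  TEXT, oi_sha1  TEXT, oi_timestamp  TEXT);
CREATE TABLE filearchive (fa_name  TEXT, fa_sha1  TEXT, fa_timestamp  TEXT);
"""

# One query over all three tables; the first column records where each hit
# came from, so an old or deleted version is never mistaken for the current file.
LOOKUP_SQL = """
SELECT 'image', img_name, img_timestamp FROM image WHERE img_sha1 = :sha1
UNION ALL
SELECT 'oldimage', oi_name, oi_timestamp FROM oldimage WHERE oi_sha1 = :sha1
UNION ALL
SELECT 'filearchive', fa_name, fa_timestamp FROM filearchive
 WHERE fa_sha1 = :sha1 AND :include_deleted = 1
"""


def find_file_by_hash(conn: sqlite3.Connection, sha1: str,
                      include_deleted: bool = False) -> List[Tuple[str, str, str]]:
    """Return (table, name, timestamp) rows matching sha1.

    include_deleted is a hypothetical stand-in for a user-rights check:
    filearchive holds deleted files, so it is skipped unless allowed.
    """
    params = {"sha1": sha1, "include_deleted": 1 if include_deleted else 0}
    return conn.execute(LOOKUP_SQL, params).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO image VALUES ('Foo.jpg', 'h1', '20140101000000')")
    conn.execute("INSERT INTO oldimage VALUES ('Foo.jpg', 'h0', '20130101000000')")
    conn.execute("INSERT INTO filearchive VALUES ('Bar.jpg', 'h0', '20120101000000')")
    print(find_file_by_hash(conn, "h0"))
    print(find_file_by_hash(conn, "h0", include_deleted=True))
```

Returning the source table with every row also addresses the concern from the IRC log: the caller learns that the hit was an old version rather than receiving information about the current one.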
*** Bug 37376 has been marked as a duplicate of this bug. ***
Possible use cases:
* Bots that check the SHA1 *before* uploading
* UploadWizard could compute the SHA1 with FileReader in compatible browsers *before* uploading (or while it uploads other files)
* Investigation: finding which user previously uploaded the same file (to find sockpuppets of copyvio uploaders)

The servers already search image and filearchive anyway when an image is uploaded (to throw a warning).
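On the bot side, checking the SHA1 before uploading starts with hashing the local file. The sketch below is a minimal Python illustration, assuming the file is read in chunks so large media files are never loaded into memory at once (UploadWizard would do the equivalent in JavaScript). `known_hashes` is a hypothetical set that a bot would fill from a hash lookup; it is not part of any API.

```python
import hashlib
import os
import tempfile


def file_sha1(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-1 incrementally, chunk_size bytes at a time."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_upload(path: str, known_hashes: set) -> bool:
    """Return True if the file's SHA-1 is already in known_hashes
    (a hypothetical set gathered from a hash lookup)."""
    return file_sha1(path) in known_hashes


if __name__ == "__main__":
    # Demo: hash a small temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
    try:
        print(file_sha1(tmp.name))
    finally:
        os.remove(tmp.name)
```

The chunk size only affects memory use; the resulting digest is the same as hashing the whole file in one call.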
*** Bug 58992 has been marked as a duplicate of this bug. ***
For the time being: https://tools.wmflabs.org/rillke/jsonapi.php?action=sha1lookup&sha1=<SHA1>
(In reply to Rainer Rillke @commons.wikimedia from comment #10)
> For the time being:
> https://tools.wmflabs.org/rillke/jsonapi.php?action=sha1lookup&sha1=<SHA1>

@Maarten Dammers, should we use this tool in pywikibot until a MediaWiki solution is deployed?
We might as well do it, but we should probably keep a pywikibot bug open to track this.