Last modified: 2014-10-07 08:14:10 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T23345, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 21345 - Make old images searchable by hash
Status: NEW
Product: MediaWiki
Classification: Unclassified
Component: API
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Duplicates: 37376 58992
Depends on:
Blocks:

Reported: 2009-10-29 11:03 UTC by Maarten Dammers
Modified: 2014-10-07 08:14 UTC (History)
CC: 12 users

Description Maarten Dammers 2009-10-29 11:03:46 UTC
I use the search by hash option to prevent my bots from uploading duplicate images. If an image has been changed (for example, rotated), this no longer works, because the hash of the current version no longer matches the original file. The hash of the old image is now available in the oldimage table. It would be very nice if the oldimage table were searchable in the API just like the image table.
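
The client-side check described above can be sketched roughly as follows. This is an illustration only, not the reporter's bot code: the endpoint URL and the helper names `sha1_hex` and `build_hash_query` are assumptions, and the sketch only builds the request URL without sending it. It relies on the `aisha1` parameter of `list=allimages`, which matches current file versions only.

```python
import hashlib
from urllib.parse import urlencode

# Assumed endpoint for illustration; the report does not name a wiki.
API_URL = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of the file contents."""
    return hashlib.sha1(data).hexdigest()


def build_hash_query(sha1: str) -> str:
    """Build a list=allimages query URL that filters on aisha1."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)
```

A bot would call `build_hash_query(sha1_hex(file_bytes))` before uploading and skip the upload if the response lists any file. A file whose matching version has since been overwritten would not be found this way, which is the gap this report is about.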
Comment 1 Sam Reed (reedy) 2010-01-06 22:25:55 UTC
This should be relatively simple (comment so we get some input from Roan)

There are a few ways this could be implemented.

We could have an option to search only oldimage, or something similar that searches oldimage for a hash only if the image isn't found in image. We could easily set an attribute of "old" or similar on such results.

We could have one version of the code for oldimage only, and one for image (inheritance etc.), which might not actually be a bad idea in the long run. People can get access to the old images via the file/image page.

Certainly, searching oldimage when there is no need to is a bad idea.


Maarten, is there any intended behaviour on your part (or a preference one way or another)?
Comment 2 Maarten Dammers 2010-01-07 09:23:51 UTC
Probably the nicest way is to have one search which only searches the image table by default, but with an option to also search oldimage or only search oldimage. Searching oldimage when an image is not available in the image table sure would be nice too. In the future the searching of the filearchive can be added in a similar way (but that's another bug).
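
The option semantics proposed in this comment could look something like the sketch below. It is a toy model, not MediaWiki code: the two tables are plain dictionaries mapping a hash to file names, and the `mode` parameter and its values are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

# Toy stand-ins for the image and oldimage tables: sha1 -> file names.
Table = Dict[str, List[str]]


def _lookup(table_name: str, table: Table, sha1: str) -> List[Tuple[str, str]]:
    """Return (table_name, file_name) pairs for every match in one table."""
    return [(table_name, name) for name in table.get(sha1, [])]


def find_by_hash(sha1: str, image: Table, oldimage: Table,
                 mode: str = "current") -> List[Tuple[str, str]]:
    """Search by hash according to a hypothetical mode option.

    "current"  - search image only (the default)
    "old"      - search oldimage only
    "all"      - search both tables
    "fallback" - search oldimage only when image has no match
    """
    if mode == "current":
        return _lookup("image", image, sha1)
    if mode == "old":
        return _lookup("oldimage", oldimage, sha1)
    if mode == "all":
        return _lookup("image", image, sha1) + _lookup("oldimage", oldimage, sha1)
    if mode == "fallback":
        hits = _lookup("image", image, sha1)
        # oldimage is only touched when the current table has nothing.
        return hits if hits else _lookup("oldimage", oldimage, sha1)
    raise ValueError(f"unknown mode: {mode}")
```

Each result is tagged with the table it came from, so a caller can tell a hit on an old version apart from a hit on the current one.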
Comment 3 Sam Reed (reedy) 2010-01-07 10:41:57 UTC
Isn't filearchive deleted stuff? If so, it would have to be limited by user rights.
Comment 4 Maarten Dammers 2010-01-07 10:54:01 UTC
Reedy, see https://bugzilla.wikimedia.org/show_bug.cgi?id=21346 for the filearchive part
Comment 5 Sam Reed (reedy) 2010-01-11 13:16:07 UTC
[13:13:57] <RoanKattouw> Currently prop=imageinfo returns information from both image and oldimage, but its hash search only searches image. It could be made to search oldimage as well but that'd probably produce weird results à la using &revids= with prop=templates (see bug 22079)
[13:14:25] <RoanKattouw> i.e. the search would hit an old version of the image but you'd get imageinfo for the current version; that'd be weird and probably needs to be addressed
[13:15:03] <RoanKattouw> Which may require quite a bit of redesign of the imageinfo module, it's a bit of a mess right now
Comment 6 Roan Kattouw 2011-08-13 10:46:12 UTC
RoanKattouw	Yeah I guess an action=findfilebyhash or something that searches all 3 tables for a user-provided hash makes sense
multichill	That would be nice yes

By "all 3 tables" we mean image, oldimage and filearchive. Current solutions such as prop=duplicatefiles and aisha1 only check image and are badly suited to checking the others.
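
A lookup across all three tables might be sketched as below, using an in-memory SQLite database as a stand-in. Everything here is an assumption made for illustration: the function name `find_file_by_hash`, the `include_deleted` flag (standing in for the rights check that filearchive results would need), the heavily simplified table layouts, and the placeholder hash values. No such API module is confirmed by this report.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create simplified stand-ins for image, oldimage and filearchive."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE image (img_name TEXT, img_sha1 TEXT);
        CREATE TABLE oldimage (oi_name TEXT, oi_archive_name TEXT, oi_sha1 TEXT);
        CREATE TABLE filearchive (fa_name TEXT, fa_sha1 TEXT);
        INSERT INTO image VALUES ('Cat.jpg', 'h1');
        INSERT INTO oldimage VALUES ('Cat.jpg', '20090101000000!Cat.jpg', 'h0');
        INSERT INTO filearchive VALUES ('Gone.jpg', 'h0');
    """)
    return conn


def find_file_by_hash(conn: sqlite3.Connection, sha1: str,
                      include_deleted: bool = False) -> list:
    """Return (source_table, file_name, archive_name) rows matching sha1.

    include_deleted is a hypothetical flag: deleted files are only
    searched when the caller is allowed to see them.
    """
    query = """
        SELECT 'image' AS source, img_name AS name, NULL AS archive_name
          FROM image WHERE img_sha1 = ?
        UNION ALL
        SELECT 'oldimage', oi_name, oi_archive_name
          FROM oldimage WHERE oi_sha1 = ?
    """
    params = [sha1, sha1]
    if include_deleted:
        query += """
        UNION ALL
        SELECT 'filearchive', fa_name, NULL
          FROM filearchive WHERE fa_sha1 = ?
        """
        params.append(sha1)
    return conn.execute(query, params).fetchall()
```

Returning the source table and archive name with each row is one way to avoid the mismatch raised in comment 5, where a search hits an old version but the caller receives information about the current one.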
Comment 7 db [inactive,noenotif] 2013-03-17 01:02:22 UTC
*** Bug 37376 has been marked as a duplicate of this bug. ***
Comment 8 Rainer Rillke @commons.wikimedia 2013-04-03 09:34:39 UTC
Possible use cases:
* Bots that check SHA1 *before* uploading
* UploadWizard could compute SHA1 with FileReader in compatible browsers *before* uploading (or while it uploads other files)
* Investigation - Which user previously uploaded the same file (to find socks of copyvio uploaders)

The servers already search image and filearchive anyway when an image is uploaded (in order to show a warning).
Comment 9 Rainer Rillke @commons.wikimedia 2013-12-28 20:21:54 UTC
*** Bug 58992 has been marked as a duplicate of this bug. ***
Comment 10 Rainer Rillke @commons.wikimedia 2014-02-14 20:31:54 UTC
For the time being: https://tools.wmflabs.org/rillke/jsonapi.php?action=sha1lookup&sha1=<SHA1>
Comment 11 John Mark Vandenberg 2014-10-07 06:22:01 UTC
(In reply to Rainer Rillke @commons.wikimedia from comment #10)
> For the time being:
> https://tools.wmflabs.org/rillke/jsonapi.php?action=sha1lookup&sha1=<SHA1>

@Maarten Dammers, should we use this tool in pywikibot, until a mediawiki solution is deployed?
Comment 12 Maarten Dammers 2014-10-07 08:14:10 UTC
We might as well do it, but we should probably keep a pywikibot bug open to keep track of this.
