Last modified: 2008-03-19 21:48:21 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T3459, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 1459 - search for images/files by hash
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Search
Version: unspecified
Hardware: All All
Importance: Normal enhancement with 5 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on: 5763
Blocks:
Reported: 2005-02-03 17:37 UTC by peter green
Modified: 2008-03-19 21:48 UTC (History)
CC List: 3 users

Description peter green 2005-02-03 17:37:06 UTC
It would be very useful to be able to search for images by a hash (the exact
type of hash doesn't matter much to me; MD5 or SHA-1 would be fine).

This hash should also be displayed somewhere on the image description page.

The point of this is that if I see an image on Commons that says "from German
Wikipedia" and the uploader has renamed it, I want to be able to find the image
on the German Wikipedia.
Comment 1 Antti Aaltonen 2006-04-21 13:10:58 UTC
This feature would also help with duplicate files under different names, if
extended a bit. People upload a file not knowing that it's already there,
because the first one wasn't categorized very well or the duplicate uploader
just didn't look thoroughly enough. However, there is no reason people should
have to do this searching manually.

On each upload of a file MediaWiki could:

1) Generate a hash of the uploaded file

2) Check if the generated hash is already known, i.e. if the file is a duplicate
   * This part would be the only necessary database query for a hash search feature.

Depending on configuration, which would be based on an analysis of possible
false hash collisions and the like, it could then:

3a) Display a warning to the user that the file already exists
 or
3b) Display an error to the user that the file already exists, and reject the file

This would require computing a hash, or even multiple hashes with different
methods, for all revisions of all existing files. Duplicate detection would not
work properly while hashes are still being generated and added to the database.
Hashes for deleted files or revisions would also be useful for generating
different warnings when someone uploads a file that was deleted before, but
implementing that might be more complicated.
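
The upload-time steps above can be sketched roughly as follows. This is only an
illustration in Python, not MediaWiki code: FileStore, DuplicatePolicy and
UploadResult are hypothetical names, an in-memory dict stands in for the
database, and SHA-1 is assumed as the hash.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class DuplicatePolicy(Enum):
    WARN = "warn"      # step 3a: accept the upload but warn the user
    REJECT = "reject"  # step 3b: refuse the upload


@dataclass
class UploadResult:
    accepted: bool
    sha1: str
    message: str = ""


@dataclass
class FileStore:
    """Hypothetical in-memory stand-in for the wiki's file table."""
    policy: DuplicatePolicy = DuplicatePolicy.WARN
    # Maps a hex SHA-1 digest to the names of files with that content.
    by_hash: dict = field(default_factory=dict)

    def upload(self, name: str, content: bytes) -> UploadResult:
        # Step 1: generate a hash of the uploaded file.
        digest = hashlib.sha1(content).hexdigest()
        # Step 2: a single lookup tells us whether the content is known.
        existing = self.by_hash.get(digest, [])
        if existing:
            message = f"{name} duplicates {', '.join(existing)}"
            if self.policy is DuplicatePolicy.REJECT:
                # Step 3b: report an error and do not store the file.
                return UploadResult(False, digest, message)
            # Step 3a: store the file anyway, but return a warning.
            self.by_hash[digest].append(name)
            return UploadResult(True, digest, message)
        self.by_hash[digest] = [name]
        return UploadResult(True, digest)
```

With the default policy, a second upload of the same bytes under a new name is
accepted with a warning message; with DuplicatePolicy.REJECT it is refused.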
Comment 2 peter green 2006-04-21 18:45:19 UTC
It would also be useful to generate hashes for all generated thumbnails, since
the kind of people who copy images without proper attribution are often the
kind who copy a thumbnail rather than the full-resolution image.
Comment 3 Rob Church 2006-07-04 11:24:08 UTC

*** This bug has been marked as a duplicate of 5763 ***
Comment 4 Brion Vibber 2008-03-18 23:17:21 UTC
Note there is now a properly-indexed SHA-1 hash field on the image table in recent versions. I have the vague recollection that there's a way to do lookups by hash in the API, but not in the UI at present.

Dupe file warnings are also not currently made.
Comment 5 Roan Kattouw 2008-03-19 12:30:53 UTC
(In reply to comment #4)
> I have the vague recollection that there's a way to do lookups
> by hash in the API, but not in the UI at present.
> 
api.php?action=query&list=allimages&aisha1=123abc
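
A query of that form could be built from a local file's contents along these
lines. This is a minimal Python sketch: the endpoint URL and the helper names
sha1_hex and hash_lookup_url are assumptions for illustration, and format=json
is added here only to request machine-readable output.

```python
import hashlib
from urllib.parse import urlencode

# Assumed endpoint, for illustration only; substitute the wiki being queried.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def sha1_hex(content: bytes) -> str:
    """Return the hexadecimal SHA-1 digest of the file contents."""
    return hashlib.sha1(content).hexdigest()


def hash_lookup_url(sha1: str, endpoint: str = API_ENDPOINT) -> str:
    """Build the list=allimages query that filters by SHA-1 hash."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1,
        "format": "json",  # not in the query above; added for JSON output
    }
    return f"{endpoint}?{urlencode(params)}"
```

For example, hash_lookup_url(sha1_hex(data)) gives the URL to request for a
file whose bytes are in data; the sketch only builds the URL and does not send
the request.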
Comment 6 Raimond Spekking 2008-03-19 17:05:55 UTC
[[Special:FileDuplicateSearch]] was introduced with r32180. A link to Special:FileDuplicateSearch/filename.ext was also added to the image description page.

Bug 11984 filed for dupe file warning at time of upload.
Comment 7 Huji 2008-03-19 21:48:21 UTC
And bug 13434 is also filed :)
