Last modified: 2008-03-19 21:48:21 UTC

Bug 1459 - search for images/files by hash
Product: MediaWiki
Classification: Unclassified
Importance: Normal enhancement with 5 votes
Assigned To: Nobody - You can work on this!
Depends on: 5763
Reported: 2005-02-03 17:37 UTC by peter green
Modified: 2008-03-19 21:48 UTC (History)


Description peter green 2005-02-03 17:37:06 UTC
It would be very useful to be able to search for images by hash (the exact
type of hash doesn't matter much to me; MD5 or SHA-1 would be fine).

This hash should also be displayed somewhere on the image description page.

The point of this is that if I see an image on Commons that says "from German
Wikipedia" and the uploader has renamed it, I want to be able to find the image
on the German Wikipedia.
Comment 1 Antti Aaltonen 2006-04-21 13:10:58 UTC
This feature would also help with duplicate files under different names, if
extended a bit. People upload a file not knowing that it's already there,
because the first one wasn't categorized very well or the duplicate uploader
just didn't look thoroughly enough. There's no reason, however, for people to
do this searching manually.

On each upload of a file MediaWiki could:

1) Generate a hash of the uploaded file

2) Check if the generated hash is already known, i.e. if the file is a duplicate
   * This part would be the only database query a hash search feature needs.

Depending on configuration, based on an analysis of possible false hash
collisions and the like, it could then:

3a) Display a warning to the user that the file already exists
3b) Display an error to the user that the file already exists, and reject the file

This would require computing a hash, or even multiple hashes with different
methods, for all revisions of all existing files. Duplicate detection would not
work properly while hashes are still being generated and added to the database.
Hashes for deleted files or revisions would also be useful for showing a
different warning when someone uploads a file that was deleted before, but
implementing that might be more complicated.
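
The upload steps above can be sketched roughly as follows. This is a minimal
illustration in Python, not MediaWiki code: HashIndex, DuplicatePolicy and
handle_upload are hypothetical names, the in-memory dictionary stands in for
an indexed hash column in the database, and SHA-1 is just one of the hash
types mentioned in this report.

```python
import hashlib
from enum import Enum


class DuplicatePolicy(Enum):
    """Hypothetical configuration switch for steps 3a and 3b."""
    WARN = "warn"      # 3a: accept the file but report the duplicates
    REJECT = "reject"  # 3b: refuse the file if a duplicate exists


class HashIndex:
    """Hypothetical in-memory stand-in for an indexed hash column."""

    def __init__(self):
        self._names_by_hash = {}

    def lookup(self, digest):
        # The single lookup a search-by-hash feature would need (step 2).
        return list(self._names_by_hash.get(digest, []))

    def add(self, digest, name):
        self._names_by_hash.setdefault(digest, []).append(name)


def handle_upload(index, name, data, policy):
    """Hash the upload, check for duplicates, then warn or reject."""
    digest = hashlib.sha1(data).hexdigest()   # step 1
    duplicates = index.lookup(digest)         # step 2
    if duplicates and policy is DuplicatePolicy.REJECT:
        return {"stored": False, "sha1": digest, "duplicates": duplicates}
    index.add(digest, name)
    return {"stored": True, "sha1": digest, "duplicates": duplicates}
```

With the WARN policy a second upload of the same bytes is stored and the
caller gets the list of existing names to show as a warning; with REJECT it
is not stored. Files uploaded before the index was populated would go
undetected, which is the backfill problem described above.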
Comment 2 peter green 2006-04-21 18:45:19 UTC
What would also be useful is to generate hashes for all thumbnails that are
generated, since the kind of people who copy images without proper
attribution are often the kind of people who copy a thumbnail rather than the
full-resolution image.
Comment 3 Rob Church 2006-07-04 11:24:08 UTC

*** This bug has been marked as a duplicate of 5763 ***
Comment 4 Brion Vibber 2008-03-18 23:17:21 UTC
Note there is now a properly-indexed SHA-1 hash field on the image table in recent versions. I have the vague recollection that there's a way to do lookups by hash in the API, but not in the UI at present.

Dupe file warnings are also not currently made.
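
For illustration, a lookup by hash through the API might be built like this.
This is a sketch under assumptions: it uses the list=allimages query module
with its aisha1 parameter as found in later MediaWiki releases, and this
thread does not confirm that this is the mechanism recalled above. The
function name build_hash_lookup_url and the example.org endpoint are
placeholders.

```python
from urllib.parse import urlencode

HEX_DIGITS = set("0123456789abcdef")


def build_hash_lookup_url(api_base, sha1_hex):
    """Build a query URL that asks the API for files with a given SHA-1.

    Assumes the wiki exposes list=allimages with the aisha1 parameter.
    """
    sha1_hex = sha1_hex.strip().lower()
    if len(sha1_hex) != 40 or not set(sha1_hex) <= HEX_DIGITS:
        raise ValueError("expected a 40-character hexadecimal SHA-1 digest")
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


# Placeholder endpoint; substitute the api.php URL of the wiki being queried.
url = build_hash_lookup_url(
    "https://example.org/w/api.php",
    "a9993e364706816aba3e25717850c26c9cd0d89d",
)
```

The function only constructs the URL and makes no network request, so it can
be checked without access to a wiki.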
Comment 5 Roan Kattouw 2008-03-19 12:30:53 UTC
(In reply to comment #4)
> I have the vague recollection that there's a way to do lookups
> by hash in the API, but not in the UI at present.
Comment 6 Raimond Spekking 2008-03-19 17:05:55 UTC
[[Special:FileDuplicateSearch]] was introduced in r32180. A link to Special:FileDuplicateSearch/filename.ext was added on the image description page too.

Bug 11984 filed for dupe file warning at time of upload.
Comment 7 Huji 2008-03-19 21:48:21 UTC
And bug 13434 is also filed :)
