Last modified: 2014-06-14 16:47:37 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T59697, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 57697 - Provide filearchive table with fa_storage_key or, if it exists and is sufficiently indexed and populated, fa_sha1 for commonswiki
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware / OS: All / All
Importance: Unprioritized enhancement
Target Milestone: ---
Assigned To: Marc A. Pelletier
Duplicates: 61813
Depends on:
Blocks: tool-missing-ts-feat
Reported: 2013-11-28 01:01 UTC by Rainer Rillke @commons.wikimedia
Modified: 2014-06-14 16:47 UTC
CC List: 7 users




Description Rainer Rillke @commons.wikimedia 2013-11-28 01:01:18 UTC
I'd (possibly) like to create a JSON/XML API that can be queried before uploading a file to check whether it has been uploaded before.
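A client of such an API would first have to turn the local file into the same key that is stored in the database. The sketch below assumes that MediaWiki stores img_sha1, oi_sha1 and fa_sha1 as the SHA-1 digest written in lowercase base 36 and left-padded with zeros to 31 characters, which is consistent with the 31-character img_sha1 value quoted in comment 10. The helper names are hypothetical and are not part of any MediaWiki or Toolforge API.

```python
import hashlib

# Alphabet for base-36 digits, lowercase as in the hashes quoted in this bug.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int, width: int = 31) -> str:
    """Write a non-negative integer in base 36, left-padded with zeros."""
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(DIGITS[r])
    return "".join(reversed(digits)).rjust(width, "0")


def sha1_base36(data: bytes) -> str:
    """SHA-1 of `data` in the assumed base-36, 31-character format."""
    return to_base36(int(hashlib.sha1(data).hexdigest(), 16))


def sha1_base36_of_file(path: str) -> str:
    """Same as sha1_base36, but hashes a file in 1 MiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return to_base36(int(h.hexdigest(), 16))
```

For example, `sha1_base36_of_file("upload-candidate.jpg")` (a hypothetical local file) would give the string to compare against img_sha1, oi_sha1 and fa_sha1. The width of 31 is enough because a 160-bit digest is below 36^31 but can exceed 36^30.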
Comment 1 Marc A. Pelletier 2014-05-06 01:44:10 UTC
I can see several issues with this, not least of which is the ability to identify whether an arbitrary file has been uploaded in the past, which may have legal implications.

Since this table contains data not normally available to non-administrators (or to any user, where the sha1 is concerned), this will need evaluation from Legal.
Comment 2 Rainer Rillke @commons.wikimedia 2014-05-06 06:33:50 UTC
(In reply to Marc A. Pelletier from comment #1)
https://bugzilla.wikimedia.org/show_bug.cgi?id=58993
Comment 3 Luis Villa (personal-for work use lvilla@wikimedia.org) 2014-05-07 18:06:40 UTC
What's the difference between this one and 58993? I feel like I'm missing something.

And note that with 58993, I have given legal signoff if that wasn't entirely clear :)
Comment 4 Bawolff (Brian Wolff) 2014-05-23 22:37:08 UTC
(In reply to Luis Villa (personal-for work use lvilla@wikimedia.org) from comment #3)
> What's the difference between this one and 58993? I feel like I'm missing
> something.
> 
> And note that with 58993, I have given legal signoff if that wasn't entirely
> clear :)

Bug 58993 asks for the information to be available to everyone via http://commons.wikimedia.org/w/api.php

This bug asks for it to be available in the database replicas at tools.wmflabs.org.

From a legal perspective, probably not much different. From a technical perspective, these are very different areas of Wikimedia, with different groups working on them.
Comment 5 Marc A. Pelletier 2014-05-23 22:41:55 UTC
Handing off to Sean, as this now has Legal's okay for the filearchive table.
Comment 6 Sean Pringle 2014-06-05 10:13:25 UTC
Table is replicated.
Comment 7 Gerrit Notification Bot 2014-06-06 15:06:29 UTC
Change 137938 had a related patch set uploaded by coren:
Labs: new replication views

https://gerrit.wikimedia.org/r/137938
Comment 8 Gerrit Notification Bot 2014-06-06 15:36:46 UTC
Change 137938 merged by coren:
Labs: new replication views

https://gerrit.wikimedia.org/r/137938
Comment 9 Marc A. Pelletier 2014-06-11 17:33:31 UTC
*** Bug 61813 has been marked as a duplicate of this bug. ***
Comment 10 Rainer Rillke @commons.wikimedia 2014-06-14 16:47:37 UTC
Thank you. However, I observe that fa_sha1 queries are notably slower than img_sha1 and oi_sha1 queries:

18:38:38	SELECT * FROM commonswiki_p.image WHERE img_sha1="qtexhtbcwt0tnkuxb2wf3xs7d7j761u" LIMIT 0, 1000	1 row(s) returned	0.172 sec / 0.000 sec

18:36:31	SELECT * FROM commonswiki_p.oldimage WHERE oi_sha1="0mpoldytyxspxrdbf44r1kc7m8vtbq67" LIMIT 0, 1000	0 row(s) returned	0.156 sec / 0.000 sec

18:36:07	SELECT * FROM commonswiki_p.filearchive WHERE fa_sha1="0mpoldytyxspxrdbf44r1kc7m8vtbq67" LIMIT 0, 1000	1 row(s) returned	5.990 sec / 0.000 sec


5.990 sec vs. 0.172 sec is a huge difference. Is something broken with indexing?
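Whether indexing is the problem is usually checked from the query plan; in MySQL, prefixing the slow query with EXPLAIN shows whether the fa_sha1 lookup uses an index or scans the whole table. The Labs replicas cannot be reproduced here, so the sketch below uses Python's built-in sqlite3 module with a hypothetical, heavily reduced filearchive table (the index name fa_sha1_idx is also made up) to show what an unindexed and an indexed lookup look like in a plan.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list:
    # The human-readable plan text is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for filearchive; the real table has many more columns.
conn.execute(
    "CREATE TABLE filearchive (fa_id INTEGER PRIMARY KEY, fa_name TEXT, fa_sha1 TEXT)"
)

LOOKUP = "SELECT fa_name FROM filearchive WHERE fa_sha1 = ?"
key = ("qtexhtbcwt0tnkuxb2wf3xs7d7j761u",)

before = query_plan(conn, LOOKUP, key)  # no index on fa_sha1: full table scan
conn.execute("CREATE INDEX fa_sha1_idx ON filearchive (fa_sha1)")
after = query_plan(conn, LOOKUP, key)  # equality lookup through the index

print(before)
print(after)
```

If the plan for the filearchive query showed a full scan while the image and oldimage queries showed index lookups, that would support the suspicion above; it would not by itself explain why the index is unused.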
