Last modified: 2013-08-13 08:39:36 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from bug reports and their history, links might be broken. See T42331, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 40331 - Replicate gerrit database somewhere to allow free querying
Status: REOPENED
Product: Wikimedia
Classification: Unclassified
Component: Git/Gerrit
Version: unspecified
Hardware: All All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Reported: 2012-09-18 15:45 UTC by Nemo
Modified: 2013-08-13 08:39 UTC (History)


Description Nemo 2012-09-18 15:45:24 UTC
Chad says that until the Lucene stuff is done, we won't have things like full text search (https://code.google.com/p/gerrit/issues/detail?id=866) or search by commenter (https://code.google.com/p/gerrit/issues/detail?id=1567).
It would be useful for everyone in the community to be able to query such information and more (from a safe environment in the meanwhile).

The best solution seems to be replicating the DB so that it's accessible to all users on Labs or (IMHO better) on Toolserver, which would also allow all sorts of cool web tools.
Chad: «I'm pretty sure we could replicate the DB. Only table I'd want to exclude is probably account_ssh_keys. Other than that, there's nothing really private».
If Toolserver is considered best on the WMF end, I'll open a TS ticket too.
Comment 1 Ryan Lane 2012-09-18 18:08:59 UTC
Toolserver is going away. It's always best to do new things on Labs.
Comment 2 Chad H. 2013-04-17 22:30:11 UTC
The more I've thought about this, the less I think it's a good idea. It's not just having to sanitize a good amount of data (ssh keys, external id table, draft patches/comments), but with each release, Gerrit is deleting more and more tables. Given that, I'd rather not encourage tools relying on data that's being deprecated from day 1.

Instead, I'd recommend writing tools that use stable APIs. The REST API[0] and the `query` SSH command[1] both provide stable machine-readable data sources that can provide nearly all (if not all) of the data that the database can. And given that the API is being actively expanded, it's not difficult to add new endpoints if there's other data we're after.

So this is a WONTFIX, but with an open invitation for people to use other ways to consume Gerrit's data :)

[0] https://gerrit.wikimedia.org/r/Documentation/rest-api.html
[1] https://gerrit.wikimedia.org/r/Documentation/cmd-query.html
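
To illustrate the REST route suggested above, here is a minimal Python sketch. It assumes the `/changes/` endpoint with its `q` (query) and `n` (limit) parameters, and the `)]}'` prefix line that Gerrit prepends to JSON responses. The helper names `build_query_url` and `parse_gerrit_json` are hypothetical, and the sample payload is invented for the demonstration; no network request is made.

```python
import json
from urllib.parse import urlencode

# Base URL taken from the documentation links in this comment.
GERRIT_BASE = "https://gerrit.wikimedia.org/r"
# Gerrit prefixes JSON responses with this line to guard against XSSI.
XSSI_PREFIX = ")]}'"


def build_query_url(query, limit=25, base=GERRIT_BASE):
    """Build a change-query URL (hypothetical helper, not part of Gerrit)."""
    params = urlencode({"q": query, "n": limit})
    return f"{base}/changes/?{params}"


def parse_gerrit_json(body):
    """Strip the XSSI prefix, if present, and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


if __name__ == "__main__":
    print(build_query_url("status:open project:mediawiki/core", limit=5))
    # Invented sample response body, for illustration only.
    sample = ")]}'\n[{\"_number\": 1, \"subject\": \"Example change\"}]"
    print(parse_gerrit_json(sample))
```

A real tool would fetch the URL with an HTTP client and pass the response text to `parse_gerrit_json`.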
Comment 3 Nemo 2013-04-22 17:59:21 UTC
(In reply to comment #2)
> And given that
> the API is being actively expanded and added to, it's not difficult to add
> new endpoints if there's other data we're after.

Thank you, Chad. So what's the way forward to get

(from comment #0)
> things like full
> text search (https://code.google.com/p/gerrit/issues/detail?id=866 ) or
> search
> by commenter (https://code.google.com/p/gerrit/issues/detail?id=1567 )

?
Should those bugs be somehow prioritized upstream (and how), or should they be repurposed to affect only the API/queries, or what else?
Comment 4 Nemo 2013-07-08 08:05:32 UTC
As for reviewers, I have now used the Toolserver clone of the mediawiki/* repos to access refs/notes/review for one of the use cases I had in mind, working around this and bug 46452: https://www.mediawiki.org/?diff=726041&oldid=726025

The API <https://gerrit.wikimedia.org/r/Documentation/rest-api-changes.html#list-reviewers> would give the same data, as far as I understand (just much more slowly), so code review that doesn't get "merged" is still out of reach. https://code.google.com/p/gerrit/issues/detail?id=1861
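
For readers curious about the notes-based workaround: Gerrit records the outcome of submitted changes as git notes under `refs/notes/review`. Below is a rough Python sketch for pulling label votes out of one note's text. It assumes vote lines of the form `Code-Review+2: Name <email>`; the exact layout may differ between Gerrit versions, and the sample note and the function name `parse_votes` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a vote line: "<Label><+N or -N>: <Name> <<email>>".
VOTE_LINE = re.compile(
    r"^(?P<label>[A-Za-z-]+)(?P<value>[+-]\d+): (?P<name>.+?) <(?P<email>[^>]+)>$"
)


def parse_votes(note_text):
    """Return {label: [(value, name, email), ...]} for vote lines in a note."""
    votes = defaultdict(list)
    for line in note_text.splitlines():
        match = VOTE_LINE.match(line.strip())
        if match:
            votes[match["label"]].append(
                (int(match["value"]), match["name"], match["email"])
            )
    return dict(votes)


if __name__ == "__main__":
    # Invented note text, for illustration only.
    sample_note = """\
Code-Review+2: Jane Doe <jane@example.org>
Verified+2: jenkins-bot <bot@example.org>
Submitted-by: Jane Doe <jane@example.org>
Project: mediawiki/core
Branch: refs/heads/master
"""
    print(parse_votes(sample_note))
```

Lines without a numeric vote, such as `Submitted-by:` or `Project:`, are skipped. Since notes are written only when a change is submitted, this approach shares the limitation mentioned above for changes that never get merged.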
Comment 5 Nemo 2013-08-13 04:27:24 UTC
I hope it's not unkind to quote this from #wikimedia-dev:

02.47 < ^d> I've changed my mind, I'm willing to replicate the gerrit db provided 2 things.
02.47 < ^d> A) Anything security that's not public should either be made public or removed, and gerrit stop being used for security patches.
02.48 < ^d> B) I can sanitize one column. There's one column I'm not ok with exposing.
02.50 < ^d> account_external_ids.password
02.51 < ^d> That's used if you generate a password to access gerrit over https :)

Related to bug 52329 being harder than expected.
Comment 6 MZMcBride 2013-08-13 04:35:16 UTC
Re-opening this bug for further consideration.

There's substantial and substantive evidence that providing replicated copies of databases is an enormous benefit to developers and end-users (cf. the Wikimedia Toolserver and Wikimedia Labs). We already have the infrastructure in place to replicate database tables, with filtering of specific columns as necessary.

Unless it is an absolute impossibility, we should open Gerrit's data to the masses and allow others to build exciting and wonderful tools on top of it. :-)
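
As a rough illustration of column filtering, the sketch below uses an in-memory SQLite database to expose a table through a view that omits one column. Only the `account_external_ids.password` column name comes from this thread; the other columns, the `_public` view suffix, and the helper name `make_sanitized_view` are assumptions, and a production replica would use the real database's own tooling rather than SQLite.

```python
import sqlite3


def make_sanitized_view(conn, table, excluded):
    """Create a view `<table>_public` exposing every column except `excluded`.

    Table and column names are interpolated into SQL, so they must come
    from trusted configuration, never from user input.
    """
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    kept = [c for c in columns if c not in excluded]
    conn.execute(
        f"CREATE VIEW {table}_public AS SELECT {', '.join(kept)} FROM {table}"
    )
    return kept


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed table layout; only the `password` column is named in the thread.
    conn.execute(
        "CREATE TABLE account_external_ids ("
        "account_id INTEGER, email_address TEXT, password TEXT, external_id TEXT)"
    )
    conn.execute(
        "INSERT INTO account_external_ids VALUES "
        "(1, 'user@example.org', 'secret', 'username:user')"
    )
    print(make_sanitized_view(conn, "account_external_ids", {"password"}))
    print(conn.execute("SELECT * FROM account_external_ids_public").fetchall())
```

Tools would then be granted access to the `_public` view only, never to the underlying table.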
