Last modified: 2013-12-24 13:03:23 UTC

Bug 58802 - Document how to use federated commonswiki and wikidatawiki databases
Status: RESOLVED DUPLICATE of bug 57876
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Unprioritized normal
Target Milestone: ---
Assigned To: Marc A. Pelletier
Blocks: labs-replication tool-missing-ts-feat
Reported: 2013-12-21 15:50 UTC by Maarten Dammers
Modified: 2013-12-24 13:03 UTC

Description Maarten Dammers 2013-12-21 15:50:35 UTC
It's currently impossible to do joins between Wikipedia and Commons/Wikidata. The labs database servers should have copies of Commons and Wikidata just like at the Toolserver.
Comment 1 Marc A. Pelletier 2013-12-21 17:02:26 UTC
Actually, what's missing here is documentation and not functionality:

every shard that does not have the original database has a federated database link to commonswiki and wikidatawiki; but they are named 'commonswiki_f_p' and 'wikidatawiki_f_p' respectively.

Using those is functionally identical to using an actual local view, except that performance can be severely impacted if you do joins on non-indexed columns (which you should never do anyways).
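As an illustration of the point above, here is a minimal sketch of a join from a local wiki database into the federated Commons database. The query lists files used on nlwiki articles that are hosted on Commons, joining on `img_name` (the primary key of `image`, so the join stays on an indexed column). The table and column names follow the MediaWiki schema; the presence of `imagelinks` in `nlwiki_p` is assumed, and the helper names and sample rows are hypothetical. SQLite only stands in for the replica here so the sketch can run anywhere; on Labs the same SELECT would be issued over a connection to the nlwiki shard.

```python
import sqlite3

# The cross-database join itself. On the replicas, commonswiki_f_p is the
# federated link described above; here it is just an attached SQLite schema.
QUERY = """
SELECT p.page_title, i.img_name, i.img_size
FROM nlwiki_p.page AS p
JOIN nlwiki_p.imagelinks AS il ON il.il_from = p.page_id
JOIN commonswiki_f_p.image AS i ON i.img_name = il.il_to
WHERE p.page_namespace = 0
ORDER BY p.page_title, i.img_name
"""


def build_demo_connection():
    """Create in-memory stand-ins for nlwiki_p and commonswiki_f_p (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS nlwiki_p")
    conn.execute("ATTACH DATABASE ':memory:' AS commonswiki_f_p")
    conn.executescript("""
        CREATE TABLE nlwiki_p.page (
            page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE nlwiki_p.imagelinks (il_from INTEGER, il_to TEXT);
        CREATE TABLE commonswiki_f_p.image (
            img_name TEXT PRIMARY KEY, img_size INTEGER);
        INSERT INTO nlwiki_p.page VALUES (1, 0, 'Amsterdam'), (2, 0, 'Utrecht');
        INSERT INTO nlwiki_p.imagelinks VALUES
            (1, 'Amsterdam_canal.jpg'), (2, 'Local_only.png');
        INSERT INTO commonswiki_f_p.image VALUES ('Amsterdam_canal.jpg', 123456);
    """)
    return conn


def files_hosted_on_commons(conn):
    """Return (page_title, img_name, img_size) for linked files found on Commons."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in files_hosted_on_commons(build_demo_connection()):
        print(row)
```

Only the file that exists in the stand-in Commons `image` table comes back; the locally hosted file drops out of the inner join.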

Keeping the bug open but renaming it to track the need for documentation instead.
Comment 2 Maarten Dammers 2013-12-21 17:14:15 UTC
You need to do more than document.

MariaDB [nlwiki_p]> connect nlwiki_p nlwiki.labsdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Connection id:    1008694
Current database: nlwiki_p

MariaDB [nlwiki_p]> use commonswiki_f_p;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [commonswiki_f_p]> show tables;
+---------------------------+
| Tables_in_commonswiki_f_p |
+---------------------------+
| image                     |
| logging                   |
| logging_userindex         |
| page                      |
| revision                  |
| revision_userindex        |
| user                      |
+---------------------------+
7 rows in set (0.04 sec)

Should be 64 tables.
Comment 3 Marc A. Pelletier 2013-12-21 17:18:11 UTC
Those are the only tables for which queries are being used in joins in practice.  Adding more is not an overly complicated operation, but will only be done as needed.
Comment 4 Merlijn van Deen (test) 2013-12-21 17:21:34 UTC
If it's not overly complicated, why not just do it? Having the tables in-place makes sure anyone who *does* want to use them *can*, without having to jump through hoops.
Comment 5 Maarten Dammers 2013-12-21 17:23:53 UTC
(In reply to comment #3)
> Those are the only tables for which queries are being used in joins in
> practice.  Adding more is not an overly complicated operation, but will only
> be done as needed.

Are you serious? Based on what practice? Just add the missing tables.
Comment 6 Marc A. Pelletier 2013-12-21 17:27:10 UTC
There is a maintenance cost associated with maintaining federated tables (inter alia, it requires review at any schema change).  The fewer such tables in place, the less likely it is that a problem is introduced.

Also, joins between databases living in different slices should be rare and are almost always better done differently; requiring a request to add new tables ensures that there is an opportunity to review the proposed use case before it's implemented (to avoid having to force people to change their tools later).
Comment 7 Maarten Dammers 2013-12-21 17:42:55 UTC
The main reason why I have a Toolserver account is to do complicated cross database joins. 

For example: Give all items on Wikidata that don't have a claim which link to an article on the Dutch Wikipedia using https://nl.wikipedia.org/wiki/Sjabloon:Taxobox .

Of course these are not run very often, let alone part of a tool. These results serve as input for a bot to do the actual work. 

If not all tables are available I can't do this anymore. Then I've lost the most important functionality of Toolserver/Toollabs.

The fact that it's hard to maintain doesn't impress me. When the WMF started the whole database endeavor and made a choice to not have copies but federation this implication should have been taken into account.
Comment 8 Marc A. Pelletier 2013-12-21 17:47:34 UTC
Copies would have been even worse.  The vast majority of downtimes of the replication in toolserver were caused /by/ the multiple replication.

That said, could you give me the queries which you /do/ run?  Like I said, if there are use cases that need support I will support them; but I'm not going to create maintenance overhead for hypothetical joins with commonswiki.module_deps just for the fun of completing a list of checkmarks.
Comment 9 Marc A. Pelletier 2013-12-21 17:49:31 UTC
(in particular, I am certain that many tables in wikidatawiki would be useful to add to federation; but I cannot guess which nor would it be reasonable to preemptively include all of them).
Comment 10 Tim Landscheidt 2013-12-22 11:02:46 UTC
(In reply to comment #6)
> There is a maintenance cost associated with maintaining federated tables
> (inter alia, it requires review at any schema change).  The fewer such tables
> in place, the less likely it is that a problem is introduced.
> [...]

But the review is already necessary for the change to the source, so there is no extra cost.

As this bug is a subset of bug #57876, closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 57876 ***
