Last modified: 2013-12-24 13:03:23 UTC
It's currently impossible to do joins between Wikipedia and Commons/Wikidata. The labs database servers should have copies of Commons and Wikidata, just as the Toolserver does.
Actually, what's missing here is documentation, not functionality: every shard that does not host the original database has a federated database link to commonswiki and wikidatawiki, but these are named 'commonswiki_f_p' and 'wikidatawiki_f_p' respectively. Using them is functionally identical to using an actual local view, except that performance can be severely impacted if you join on non-indexed columns (which you should never do anyway). I'm keeping the bug open but renaming it to track the need for documentation instead.
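As an illustration, here is a minimal sketch of such a join run from the nlwiki shard. It assumes the standard MediaWiki columns imagelinks.il_from/il_to and image.img_name/img_size/img_timestamp, and that commonswiki_f_p.image is federated as described above; the page ID 12345 is a hypothetical placeholder. The join is on img_name, the primary key of the image table, so it stays on an indexed column:

```sql
-- Files used on one nlwiki page that are hosted on Commons.
-- 12345 is a hypothetical page_id; substitute a real one.
SELECT il.il_to AS file_name,
       img.img_size,
       img.img_timestamp
FROM nlwiki_p.imagelinks AS il          -- local replica view
JOIN commonswiki_f_p.image AS img       -- federated link to commonswiki
  ON img.img_name = il.il_to            -- img_name is indexed (primary key)
WHERE il.il_from = 12345;
```

Joining on a non-indexed column of the federated table instead would force far more rows across the link, which is the performance trap mentioned above.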
You need to do more than document.

MariaDB [nlwiki_p]> connect nlwiki_p nlwiki.labsdb;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Connection id:    1008694
Current database: nlwiki_p

MariaDB [nlwiki_p]> use commonswiki_f_p;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [commonswiki_f_p]> show tables;
+---------------------------+
| Tables_in_commonswiki_f_p |
+---------------------------+
| image                     |
| logging                   |
| logging_userindex         |
| page                      |
| revision                  |
| revision_userindex        |
| user                      |
+---------------------------+
7 rows in set (0.04 sec)

Should be 64 tables.
Those are the only tables for which queries are being used in joins in practice. Adding more is not an overly complicated operation, but will only be done as needed.
If it's not overly complicated, why not just do it? Having the tables in place makes sure anyone who *does* want to use them *can*, without having to jump through hoops.
(In reply to comment #3)
> Those are the only tables for which queries are being used in joins in
> practice. Adding more is not an overly complicated operation, but will only be
> done as needed.

Are you serious? Based on what practice? Just add the missing tables.
There is a maintenance cost associated with maintaining federated tables (inter alia, it requires review at any schema change). The fewer such tables in place, the less likely it is that a problem is introduced. Also, joins between databases living in different slices should be rare and are almost always better done another way; requiring a request to add new tables ensures that there is an opportunity to review the proposed use case before it is implemented (so that people are not forced to change their tools later).
The main reason I have a Toolserver account is to do complicated cross-database joins. For example: list all Wikidata items that link to an article on the Dutch Wikipedia using https://nl.wikipedia.org/wiki/Sjabloon:Taxobox and that have no claims. Of course these queries are not run very often, let alone as part of a tool; the results serve as input for a bot that does the actual work. If not all tables are available, I can't do this anymore, and then I've lost the most important functionality of Toolserver/Toollabs. The fact that it's hard to maintain doesn't impress me. When the WMF started the whole database endeavor and chose federation over copies, this implication should have been taken into account.
Copies would have been even worse. The vast majority of replication downtime on the Toolserver was caused /by/ the multiple replication. That said, could you give me the queries that you /do/ run? As I said, if there are use cases that need support I will support them; but I'm not going to create maintenance overhead for hypothetical joins with commonswiki.module_deps just for the fun of completing a checklist.
(In particular, I am certain that many wikidatawiki tables would be useful to add to the federation, but I cannot guess which ones, nor would it be reasonable to include all of them preemptively.)
(In reply to comment #6)
> There is a maintenance cost associated with maintaining federated tables (inter
> alia, it requires review at any schema change). The fewer such tables in
> place, the less likely it is that a problem is introduced.
> [...]

But the review is already necessary for the change to the source, so there is no extra cost.

As this bug is a subset of bug #57876, closing this one as a duplicate.

*** This bug has been marked as a duplicate of bug 57876 ***