Last modified: 2012-11-29 13:18:41 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and except for displaying bug reports and their history, links might be broken. See T40822, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 38822 - Review Wikibase Repo extension for deployment
Status: VERIFIED FIXED
Product: MediaWiki extensions
Classification: Unclassified
Component: WikidataRepo
Version: master
Hardware: All (OS: All)
Importance: High normal
Target Milestone: ---
Assigned To: Chris Steipp
Depends on: 37683
Blocks: 40000
Reported: 2012-07-30 08:08 UTC by Daniel Kinzler
Modified: 2012-11-29 13:18 UTC
CC List: 4 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Daniel Kinzler 2012-07-30 08:08:04 UTC
Review the Wikibase repo extension for deployment. 

It can be found on git in the project mediawiki/extensions/Wikibase, in the repo directory.
Comment 1 Daniel Kinzler 2012-07-30 08:09:32 UTC
Note: this code will only be used on the Wikidata site itself, not on client wikis like Wikipedias.
Comment 2 Rob Lanphier (RobLa) 2012-08-09 14:48:35 UTC
Assigning to Tim for now.  We have some ideas of how to split the review work up a little differently, so we may change these around before reassigning.
Comment 3 Tim Starling 2012-09-27 01:25:59 UTC
* {
	/* defining Arial as default font working around problematic font metrics of Helvetica applied
	in Firefox and Opera on Mac cutting off high characters like "Å" in some cases */
	font-family: Arial, sans-serif;
}

You should find some other way to fix this; global font choice is a matter for the skin, not the Wikibase extension.
Comment 4 Tim Starling 2012-09-27 05:07:36 UTC
57KB of site and language data is too much to embed in a <script> tag on every page view. It should be split off into a separate request with an Expires header, using either the API or ResourceLoader (RL).
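A minimal PHP sketch of what "a separate request with an Expires header" amounts to, assuming plain PHP rather than the actual API or ResourceLoader machinery. buildSiteDataResponse() is a hypothetical name, and the data shape is illustrative only. The point is that the browser fetches the data once and reuses its cached copy on later page views.

```php
<?php
// Hypothetical sketch: serve site/language data as its own cacheable response
// instead of inlining it into every page. Not a MediaWiki API.

/**
 * @param array $siteData Site and language data to send to the browser.
 * @param int   $maxAge   Cache lifetime in seconds.
 * @param int   $now      Current Unix timestamp (passed in so the function is testable).
 * @return array 'headers' (name => value) and 'body' (JSON string).
 */
function buildSiteDataResponse( array $siteData, $maxAge, $now ) {
	return array(
		'headers' => array(
			'Content-Type' => 'application/json; charset=utf-8',
			// HTTP/1.1 caches honour max-age; Expires is kept for older HTTP/1.0 caches.
			'Cache-Control' => 'public, max-age=' . $maxAge,
			'Expires' => gmdate( 'D, d M Y H:i:s', $now + $maxAge ) . ' GMT',
		),
		'body' => json_encode( $siteData ),
	);
}

// Usage: one hour of caching, computed from the Unix epoch for a predictable result.
$response = buildSiteDataResponse(
	array( 'enwiki' => array( 'languageCode' => 'en' ) ),
	3600,
	0
);
echo $response['headers']['Expires'], "\n"; // Thu, 01 Jan 1970 01:00:00 GMT
```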
Comment 5 Tim Starling 2012-09-27 06:14:25 UTC
It would be nice if it worked for sites other than Wikipedia. It seems a bit funny to implement so many layers of abstraction and then to hard-code the site name.

Global variable names should be prefixed with "wg", including configuration globals. For example, $wbStores should be $wgWBStores.

The performance issue I identified during core review, i.e. O(N^2) write queries for N link updates, seems to still be present. WikibaseCache.sql says "this cache is a shared table, so exists only once per master", but I couldn't find any actual implementation of that mechanism. EntityCacheTable does not override the getReadDb() method or use a database selector in getName(). 

ChangesTable also appears to lack any remote DB support, so it's hard to see how a change could propagate from one wiki to another. When I run pollForChanges.php on a client wiki, it just gives me an error. No doubt you would have tested that, so I'm probably just doing it wrong. But grepping for wfGetLB() doesn't give any hits, and that is the obvious way to connect to a remote DB.
Comment 6 Daniel Kinzler 2012-09-27 15:06:19 UTC
Thanks for the feedback, Tim. You are mentioning 5 issues:

1) 57KB of site and language data: yes, this has been on our todo list forever. I hope we get this moved to a separate resource next week.

2) "Wikipedia" should not be hardcoded anywhere - where did you find this? Maybe as a default setting, or some such?

3) global variables: will do. The convention changed a couple of times it seems, causing confusion. Is our main settings array acceptable as $egWBSettings, or would it become $wgWBSettings?

The last two points are really only relevant to the client, even though the code is in Wikibase/lib. These issues shouldn't block the deployment of the repo, some of the code, like the pollForChanges script, should probably be moved. Anyway: 

4) That we have to (potentially) update all Wikipedias after a single edit on Wikidata lies in the nature of the project, I would think. We are thinking about how to make this more efficient by batching updates. I'll try to prepare a writeup explaining how we currently envision the percolation of the changes.

5) I have worked on remote DB support for ORMTable (and by extension ChangesTable) yesterday, see I261a2a31. I have not yet figured out though how to correctly set up a LBFactory_multi to test this. Can you help me with that? What would a simple setup for two masters (and no slaves) look like?
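One way to read "batching updates" in point 4 is sketched below, assuming each change record lists the client wikis it affects. The 'item' and 'wikis' keys and batchChangesByWiki() are hypothetical and do not reflect the actual wb_changes schema. Instead of one update per (change, wiki) pair, each client wiki receives a single deduplicated batch, so the number of dispatches grows with the number of affected wikis rather than the number of changes.

```php
<?php
// Hypothetical sketch: group pending changes into one batch per client wiki.
// The 'item' and 'wikis' keys are illustrative, not the real wb_changes columns.

function batchChangesByWiki( array $changes ) {
	$batches = array();
	foreach ( $changes as $change ) {
		foreach ( $change['wikis'] as $wiki ) {
			// Using the item ID as a key deduplicates repeated edits to the same item.
			$batches[$wiki][$change['item']] = true;
		}
	}
	// Turn each per-wiki set back into a plain list of item IDs.
	foreach ( $batches as $wiki => $items ) {
		$batches[$wiki] = array_keys( $items );
	}
	ksort( $batches );
	return $batches;
}

$changes = array(
	array( 'item' => 'q1', 'wikis' => array( 'enwiki', 'dewiki' ) ),
	array( 'item' => 'q2', 'wikis' => array( 'enwiki' ) ),
	array( 'item' => 'q1', 'wikis' => array( 'enwiki' ) ),
);
print_r( batchChangesByWiki( $changes ) );
// dewiki gets [q1], enwiki gets [q1, q2]: two dispatches instead of four.
```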
Comment 7 Tim Starling 2012-09-27 23:46:01 UTC
(In reply to comment #6)
> Thanks for the feedback, Tim. You are mentioning 5 issues:
> 
> 1) 57KB of site and language data: yes, this has been on our todo list forever.
> I hope we get this moved to a separate resource next week.
> 
> 2) "Wikipedia" should not be hardcoded anywhere - where did you find this?
> Maybe as a default setting, or some such?

In ItemView.php:

/**
 * Returns a list of all the sites that can be used as a target for a site link.
 *
 * @static
 * @return array
 */
public static function getSiteDetails() {
...
	if ( $site->getType() === Site::TYPE_MEDIAWIKI && $site->getGroup() === 'wikipedia' ) {

The autocomplete feature that this function services also references Wikipedia in a message (wikibase-error-autocomplete-connection). 

There doesn't seem to be any way to populate the sites table with data other than what comes from meta.wikimedia.org; I had to patch Utils::insertDefaultSites() to set up my test instance.

populateInterwiki.php also unconditionally references Wikipedia.
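A minimal sketch of how the hard-coded 'wikipedia' check in getSiteDetails() could become a setting, assuming sites are represented as plain arrays rather than Site objects. $wgWBSiteLinkGroups and filterLinkTargetSites() are hypothetical names, chosen to follow the $wg prefix convention discussed below.

```php
<?php
// Hypothetical sketch: a configurable list of site groups replaces the
// hard-coded 'wikipedia' comparison. Sites are plain arrays here, not Site objects.

$wgWBSiteLinkGroups = array( 'wikipedia' );

function filterLinkTargetSites( array $sites, array $allowedGroups ) {
	$result = array();
	foreach ( $sites as $globalId => $site ) {
		// Same test as the original, but against the configured groups.
		if ( $site['type'] === 'mediawiki' && in_array( $site['group'], $allowedGroups, true ) ) {
			$result[$globalId] = $site;
		}
	}
	return $result;
}

$sites = array(
	'enwiki' => array( 'type' => 'mediawiki', 'group' => 'wikipedia' ),
	'enwikisource' => array( 'type' => 'mediawiki', 'group' => 'wikisource' ),
	'example' => array( 'type' => 'unknown', 'group' => 'wikipedia' ),
);
print_r( array_keys( filterLinkTargetSites( $sites, $wgWBSiteLinkGroups ) ) );
```

A non-Wikipedia deployment would then only need to change the setting, e.g. array( 'wikipedia', 'wikisource' ).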

> 3) global variables: will do. The convention changed a couple of times it
> seems, causing confusion. Is our main settings array acceptable as
> $egWBSettings, or would it become $wgWBSettings?

I think $wg is the best convention, since if everything uses it, a configuration UI can drop the prefix. It's almost universal in extensions deployed to WMF; the only exception is the Contest extension, which is another one of Jeroen's projects.

> The last two points are really only relevant to the client, even though the
> code is in Wikibase/lib. These issues shouldn't block the deployment of the
> repo, some of the code, like the pollForChanges script, should probably be
> moved. Anyway: 
> 
> 4) That we have to (potentially) update all Wikipedias after a single edit on
> Wikidata lies in the nature of the project, I would think. We are thinking
> about how to make this more efficient by batching updates. I'll try to prepare
> a writeup explaining how we currently envision the percolation of the changes.

Aren't we talking about a deployment in October? It seems like a pretty basic feature to be starting so late.

> 5) I have worked on remote DB support for ORMTable (and by extension
> ChangesTable) yesterday, see I261a2a31. I have not yet figured out though how
> to correctly set up a LBFactory_multi to test this. Can you help me with that?
> What would a simple setup for two masters (and no slaves) look like?

Here is my LocalSettings.php, if it helps:

http://paste.tstarling.com/p/drrHMe.html

Apologies for the accumulated cruft. It has configuration for various multi-wiki features. For multiple masters, it would be basically the same, except with $wgLBFactoryConf having:

'sectionsByDB' => array(
   'enwiki' => 's1',
),
'sectionLoads' => array(
   's1' => array( 'local1' => 1 ),
   'DEFAULT' => array( 'local2' => 1 ),
),

It's possible to run multiple MySQL servers on the same host. There's a helper script for it called mysqld_multi:

http://dev.mysql.com/doc/refman/5.1/en/mysqld-multi.html

For MediaWiki, it's necessary to use different IP addresses rather than different ports to separate the instances.
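For reference, a fuller version of the two-master fragment above might look like the sketch below, assuming the LBFactory_multi configuration keys of this MediaWiki era (class, sectionsByDB, sectionLoads, serverTemplate, hostsByName). The user, password and IP addresses are placeholders. The two mysqld instances are told apart by IP, as noted above; on Linux the whole 127.0.0.0/8 range is loopback, so 127.0.0.2 works without extra setup.

```php
<?php
// Sketch of a two-master LBFactory_multi setup for a local test install.
// User, password and IPs are placeholders for your own mysqld_multi instances.
$wgLBFactoryConf = array(
	'class' => 'LBFactory_multi',
	// enwiki lives in section s1; any wiki not listed here falls into DEFAULT.
	'sectionsByDB' => array(
		'enwiki' => 's1',
	),
	// One server per section, so each server is the master of its section.
	'sectionLoads' => array(
		's1' => array( 'local1' => 1 ),
		'DEFAULT' => array( 'local2' => 1 ),
	),
	// Connection settings shared by every server; the host is filled in per server.
	'serverTemplate' => array(
		'type' => 'mysql',
		'user' => 'wikiuser',
		'password' => 'secret',
	),
	// Map the server names used above to the two instances' addresses.
	'hostsByName' => array(
		'local1' => '127.0.0.1',
		'local2' => '127.0.0.2',
	),
);
```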
Comment 8 Daniel A. R. Werner 2012-09-28 11:29:32 UTC
> I think $wg is the best convention, since if everything uses it,
> a configuration UI can drop the prefix.
Not that it's a big thing, but I want to mention that Jeroen and I alone maintain about 25 different extensions that don't use the 'wg' prefix, and I am sure there are a few more out there.
So I don't think a configuration UI could or should ever easily work based on the 'wg' prefix.
Comment 9 Daniel Kinzler 2012-09-28 12:48:48 UTC
(In reply to comment #7)
> > 2) "Wikipedia" should not be hardcoded anywhere - where did you find this?
> > Maybe as a default setting, or some such?

Ok, I filed what you mentioned as Bug 40594.

> > 4) That we have to (potentially) update all Wikipedias after a single edit on
> > Wikidata lies in the nature of the project, I would think. We are thinking
> > about how to make this more efficient by batching updates. I'll try to prepare
> > a writeup explaining how we currently envision the percolation of the changes.
> 
> Aren't we talking about a deployment in October? It seems like a pretty basic
> feature to be starting so late.

Our current implementation works fine if you have one poll script per wiki. It uses $wgSharedTables for accessing the repo's wb_changes table, which only works if that's on the same server. So I'm now changing this to use the foreign wiki stuff.

This should be sufficient for a deployment with a handful of client wikis. A better solution is needed if we want to deploy the client stuff to all Wikipedias. That's what the writeup is about.
 
> Here is my LocalSettings.php, if it helps:
> 
> http://paste.tstarling.com/p/drrHMe.html

Cool, thanks for the link!

Reedy also pointed me to https://noc.wikimedia.org/conf/db.php.txt, which gave me some idea of how this works.

One question about terminology though: Can you explain to me what are "sections" and "groups", and how they relate to "clusters"?
Comment 10 Tim Starling 2012-10-08 04:35:50 UTC
(In reply to comment #9)
> Our current implementation works fine if you have one poll script per wiki. It
> uses $wgSharedTables for accessing the repo's wb_changes table, which only
> works if that's on the same server. So I'm now changing this to use the foreign
> wiki stuff.

$wgSharedTables is outdated. I will add a deprecation warning.

> One question about terminology though: Can you explain to me what are
> "sections" and "groups", and how they relate to "clusters"?

A section is a collection of wiki databases. A shared database like centralauth is treated like a wiki in that it can be in a section.

A query group (sometimes abbreviated to "group") is the set of queries which come from a particular caller or a related set of callers, for example user contributions queries. Query group configuration allows such queries to be directed to particular slaves, to make efficient use of the RAM cache or to avoid having one feature overload the server used by another feature.

A cluster is a master DB server and its associated slaves, which are used by ExternalStoreDB for reading and writing article text data.
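To make the notion of a section concrete, here is an illustrative re-implementation (not MediaWiki's own code) of how a wiki maps to a section and how that section's master is chosen, assuming LBFactory_multi's behaviour as I understand it: unlisted wikis go to DEFAULT, and the first server listed for a section is its master. sectionForWiki() and masterForSection() are hypothetical helpers; the config mirrors the fragment from comment 7.

```php
<?php
// Illustrative only: resolve wiki -> section -> master from a
// $wgLBFactoryConf-style array. Helper names are hypothetical.

function sectionForWiki( array $conf, $dbName ) {
	// A wiki that is not listed in sectionsByDB belongs to the DEFAULT section.
	return isset( $conf['sectionsByDB'][$dbName] ) ? $conf['sectionsByDB'][$dbName] : 'DEFAULT';
}

function masterForSection( array $conf, $section ) {
	// The first server listed for a section is its master; any others are slaves.
	$servers = array_keys( $conf['sectionLoads'][$section] );
	return $servers[0];
}

$conf = array(
	'sectionsByDB' => array( 'enwiki' => 's1' ),
	'sectionLoads' => array(
		's1' => array( 'local1' => 1 ),
		'DEFAULT' => array( 'local2' => 1 ),
	),
);

foreach ( array( 'enwiki', 'wikidatawiki' ) as $wiki ) {
	$section = sectionForWiki( $conf, $wiki );
	echo "$wiki -> section $section, master " . masterForSection( $conf, $section ) . "\n";
}
// enwiki -> section s1, master local1
// wikidatawiki -> section DEFAULT, master local2
```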
Comment 11 Rob Lanphier 2012-11-01 01:05:13 UTC
I believe the security+architecture review that Chris did, plus all of the architecture discussions we've had, are sufficient for a deployment.  Please reopen if you feel we need additional review on any of these.
