Last modified: 2014-07-03 11:11:47 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T50626, the corresponding Phabricator task for complete and up-to-date bug report information.
Bug 48626 - Provide wiki metadata in the databases similar to toolserver.wiki
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Marc A. Pelletier
Depends on:
Blocks: labs-replication 67476
Reported: 2013-05-20 04:35 UTC by Tim Landscheidt
Modified: 2014-07-03 11:11 UTC (History)


Description Tim Landscheidt 2013-05-20 04:35:01 UTC
Toolserver has the local table toolserver.wiki on all databases that provides metadata about the wikis including the server the wiki's database is kept on:

| mysql> SELECT * FROM toolserver.wiki LIMIT 5;
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| | dbname         | lang | family     | domain           | size | is_meta | is_closed | is_multilang | is_sensitive | root_category | server | script_path |
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| | aawikibooks_p  | aa   | wikibooks  | NULL             |    3 |       0 |         1 |            0 |            0 | NULL          |      3 | /w/         |
| | aawiki_p       | aa   | wikipedia  | NULL             |    6 |       0 |         1 |            0 |            0 | NULL          |      3 | /w/         |
| | aawiktionary_p | aa   | wiktionary | NULL             |    1 |       0 |         1 |            0 |            1 | NULL          |      3 | /w/         |
| | abwiki_p       | ab   | wikipedia  | ab.wikipedia.org |  807 |       0 |         0 |            0 |            0 | NULL          |      3 | /w/         |
| | abwiktionary_p | ab   | wiktionary | NULL             |    0 |       0 |         1 |            0 |            1 | NULL          |      3 | /w/         |
| +----------------+------+------------+------------------+------+---------+-----------+--------------+--------------+---------------+--------+-------------+
| 5 rows in set (0.00 sec)

| mysql>

Most of the information can probably be extracted from operations/mediawiki-config, but I don't know which sources there are authoritative.
Comment 1 Tim Landscheidt 2013-05-25 02:58:26 UTC
Played around with:

| include ($MediaWikiRepoPath . "/includes/Defines.php");
| include ($WmfConfigRepoPath . "/wmf-config/InitialiseSettings.php");
| var_dump ($wgConf->settings);

but it doesn't yield, for example, any information about de.wikipedia.org.
Comment 2 Liangent 2013-05-25 07:46:25 UTC
(In reply to comment #1)
> Played around with:
> 
> | include ($MediaWikiRepoPath . "/includes/Defines.php");
> | include ($WmfConfigRepoPath . "/wmf-config/InitialiseSettings.php");
> | var_dump ($wgConf->settings);
> 
> but it doesn't yield for example information about de.wikipedia.org.

Some experiments:

$ php maintenance/eval.php 
> $wgDBname='zhwiki';

> $wmfRealm='production';

> $mwConfigDir="$IP/../operations/mediawiki-config";

> $wmfConfigDir="$mwConfigDir/wmf-config";

> function getRealmSpecificFilename($p){global $IP,$wmfConfigDir;return str_replace($p,$IP,$wmfConfigDir);}

> function wmfLoadInitialiseSettings($c){global $wmfConfigDir;require("$wmfConfigDir/InitialiseSettings.php");}

> require("$wmfConfigDir/wgConf.php");

> list($site,$lang)=$wgConf->siteFromDB($wgDBname);

> $wikiTags=array();

> $mwConfigDirHandle=opendir($mwConfigDir);

> while(($f=readdir($mwConfigDirHandle))!==false){if(pathinfo($f,PATHINFO_EXTENSION)==='dblist'&&in_array($wgDBname,array_map('trim',file("$mwConfigDir/$f")))){$wikiTags[]=pathinfo($f,PATHINFO_FILENAME);}}

> $dbSuffix = ( $site === 'wikipedia' ) ? 'wiki' : $site;

> $wgConf->loadFullData();

> $globals = $wgConf->getAll( $wgDBname, $dbSuffix,array('lang'    => $lang,'site'    => $site,'stdlogo' => "//upload.wikimedia.org/$site/$lang/b/bc/Wiki.png"), $wikiTags );

> print_r($globals);
Array
(
    [wgLegacyEncoding] => 
    [wgCapitalLinks] => 1
    ...
)

>
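
The dblist scan in the while loop above can also be tried without a MediaWiki checkout. The following is only a rough Python sketch of the same idea, not part of the original experiment: the function name wiki_tags is made up for illustration, and it assumes, as the PHP loop does, that each *.dblist file holds one database name per line.

```python
from pathlib import Path


def wiki_tags(config_dir, dbname):
    """Return the names of all *.dblist files in config_dir that list dbname.

    Hypothetical helper: assumes one database name per line in each file,
    which is how the PHP loop reads them (file() plus trim()).
    """
    tags = []
    # sorted() only makes the result order predictable; readdir() in the
    # PHP version returns entries in directory order instead.
    for path in sorted(Path(config_dir).glob("*.dblist")):
        names = {line.strip() for line in path.read_text().splitlines()}
        if dbname in names:
            # The file name without ".dblist" is used as the tag.
            tags.append(path.stem)
    return tags
```

Calling wiki_tags(path_to_mediawiki_config, "zhwiki") would then give the list that the session above passes to getAll() as $wikiTags.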
Comment 3 Liangent 2013-06-02 17:41:58 UTC
Do we want a database table consisting of three columns: wiki, config_variable_name, and config_variable_value (as a serialized blob)?
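
To make the question concrete, here is a minimal sketch of such a three-column table. Everything beyond the three column names is an assumption for illustration: the table name wiki_config, the use of an in-memory SQLite database as a stand-in for the real server, JSON as the serialization format (PHP's serialize() would be the more likely choice in practice), and the sample values.

```python
import json
import sqlite3

# In-memory SQLite stands in for the real database server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE wiki_config (
        wiki                  TEXT NOT NULL,
        config_variable_name  TEXT NOT NULL,
        config_variable_value BLOB NOT NULL,
        PRIMARY KEY (wiki, config_variable_name)
    )
    """
)


def set_config(wiki, name, value):
    """Store one setting for one wiki, serialized to JSON bytes."""
    blob = json.dumps(value).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO wiki_config VALUES (?, ?, ?)",
        (wiki, name, blob),
    )


def get_config(wiki, name):
    """Return the deserialized setting, or None if it is not stored."""
    row = conn.execute(
        "SELECT config_variable_value FROM wiki_config"
        " WHERE wiki = ? AND config_variable_name = ?",
        (wiki, name),
    ).fetchone()
    return None if row is None else json.loads(row[0].decode("utf-8"))


# Illustrative values only, not taken from the real configuration.
set_config("zhwiki", "wgCapitalLinks", True)
set_config("zhwiki", "wgLegacyEncoding", False)
print(get_config("zhwiki", "wgCapitalLinks"))
```

One consequence of the blob column is that the values cannot be filtered or joined on in SQL; every consumer has to deserialize them first.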
Comment 4 MZMcBride 2013-06-02 17:48:30 UTC
I think we should have a discussion about what the current "toolserver" database is, what we want in the future, and whether we care about breaking backward compatibility.

Some of the design decisions in some of the database tables could probably be re-thought, but only if we're willing to break the current interfaces.

In addition, I think we should only rely on MediaWiki's API for this information (with user authentication, as necessary). This is the cleanest and sanest way to accurately get this information, as far as I know.
Comment 5 Marc A. Pelletier 2013-06-02 19:50:28 UTC
(In reply to comment #4)
> In addition, I think we should only rely on MediaWiki's API for this
> information (with user authentication, as necessary).

This is particularly important in that some extensions may have hard-to-evaluate effects on some configuration values (namespaces and usergroups being the more obvious cases).

I should say that any necessary configuration value that cannot be fetched through the API should be /added/ to the API rather than fetched through an alternative scheme.

-- Marc
Comment 6 Platonides 2013-07-12 17:53:32 UTC
API is per wiki. toolserver.wiki is a meta table.
Comment 7 Marc A. Pelletier 2013-07-12 21:33:15 UTC
Yes, but you need to populate that table from /somewhere/.  :-)
Comment 8 Marc A. Pelletier 2013-08-27 20:40:00 UTC
I've added a table with automatically maintained meta information
about the replicated databases: meta_p.wiki (which is available on every
shard).

+------------------+--------------+------+-----+---------+-------+
| Field            | Type         | Null | Key | Default | Extra |
+------------------+--------------+------+-----+---------+-------+
| dbname           | varchar(32)  | NO   | PRI | NULL    |       |
| lang             | varchar(12)  | NO   |     | en      |       |
| name             | text         | YES  |     | NULL    |       |
| family           | text         | YES  |     | NULL    |       |
| url              | text         | YES  |     | NULL    |       |
| size             | decimal(1,0) | NO   |     | 1       |       |
| slice            | text         | NO   |     | NULL    |       |
| is_closed        | decimal(1,0) | NO   |     | 0       |       |
| has_echo         | decimal(1,0) | NO   |     | 0       |       |
| has_flaggedrevs  | decimal(1,0) | NO   |     | 0       |       |
| has_visualeditor | decimal(1,0) | NO   |     | 0       |       |
| has_wikidata     | decimal(1,0) | NO   |     | 0       |       |
+------------------+--------------+------+-----+---------+-------+

There is a lingering issue with the 'name' column, which seems to
improperly encode the wiki name when non-ASCII characters are involved;
that will get fixed once I manage to beat some sense into mysql.

Most columns are self-explanatory, and I can add a few more depending on
demand.  In the meantime, (dbname, slice) provides the much requested
mapping between databases and slices.
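
As a rough illustration of how a tool might use the (dbname, slice) mapping, here is a Python sketch. It runs against an in-memory SQLite stand-in rather than a real replica, only recreates three of the columns, and uses invented rows and host names; the helper names slice_for and open_wikis_by_slice are likewise made up. Only the query text is meant to carry over to meta_p.wiki.

```python
import sqlite3

# SQLite stand-in for a replica shard (assumption); attaching a schema
# named meta_p lets the queries read the same as against meta_p.wiki.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS meta_p")
conn.execute(
    "CREATE TABLE meta_p.wiki ("
    " dbname TEXT PRIMARY KEY,"
    " slice TEXT NOT NULL,"
    " is_closed INTEGER NOT NULL DEFAULT 0)"
)
# Placeholder rows; the database and host names are invented.
conn.executemany(
    "INSERT INTO meta_p.wiki (dbname, slice, is_closed) VALUES (?, ?, ?)",
    [
        ("examplewiki", "s1.db.example", 0),
        ("otherwiki", "s2.db.example", 0),
        ("closedwiki", "s2.db.example", 1),
    ],
)


def slice_for(dbname):
    """Return the slice host name for dbname, or None if it is unknown."""
    row = conn.execute(
        "SELECT slice FROM meta_p.wiki WHERE dbname = ?", (dbname,)
    ).fetchone()
    return None if row is None else row[0]


def open_wikis_by_slice():
    """Group the dbnames of non-closed wikis by the slice hosting them."""
    groups = {}
    query = (
        "SELECT slice, dbname FROM meta_p.wiki"
        " WHERE is_closed = 0 ORDER BY dbname"
    )
    for slice_name, dbname in conn.execute(query):
        groups.setdefault(slice_name, []).append(dbname)
    return groups
```

Grouping by slice is useful for tools that walk many wikis, since they can then reuse one connection per slice instead of opening one per database.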
Comment 9 Platonides 2013-08-27 20:44:51 UTC
decimal(1,0) ? This seems strange. Shouldn't those is_* and has_* be BOOL aka. TINYINT(1) ?
Comment 10 Marc A. Pelletier 2013-08-27 20:47:12 UTC
I did not want to rely on the existence of bool, which isn't ANSI; mysql "helpfully" translated my numeric(1) to decimal(1,0).
Comment 11 Platonides 2013-08-27 20:48:10 UTC
Would it be a problem to rename slice to server, in order to match the column name of toolserver?

The name column looks good to me from a quick look, btw.
Comment 12 Marc A. Pelletier 2013-08-27 20:50:05 UTC
It would be possible, but probably unhelpful: from what I understand, the server column is numeric whereas I provide actual host names.  Keeping the column named the same with changed semantics seems to be asking for trouble IMO (i.e.: better that a select fails than that it returns a string which is misinterpreted as an integer by code with poor error checking).
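
For illustration, this is roughly the kind of check that code written against a numeric server column would need in order to fail loudly rather than misread a host name. It is a hypothetical Python sketch; the function name parse_server_number and the sample host name are made up and do not come from any existing tool.

```python
def parse_server_number(value):
    """Return a toolserver-style server number, or raise ValueError.

    Anything that is not a plain non-negative integer is rejected, so a
    host name can never be silently coerced into a number.
    """
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool):
        raise ValueError(f"not a server number: {value!r}")
    if isinstance(value, int):
        if value < 0:
            raise ValueError(f"not a server number: {value!r}")
        return value
    text = str(value).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a server number: {value!r}")
    return int(text)
```

With this, parse_server_number(3) and parse_server_number("3") both return 3, while a value such as "s3.db.example" raises ValueError instead of being turned into some arbitrary number.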
Comment 13 Marc A. Pelletier 2013-08-28 19:44:07 UTC
Added a meta_p.legacy view that has the same column name and order as toolserver.wiki for legacy purposes.

Please note that the semantics of the 'server' column differ, and there may be other subtle differences from the toolserver's table that are not immediately evident.  Unless the same code base has to run on both labs and the toolserver for the interval while the latter still has replication, transitioning to meta_p.wiki is preferable.
