Last modified: 2014-09-23 20:59:26 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 15441 - Some tables lack unique or primary keys, may allow confusing duplicate data
Some tables lack unique or primary keys, may allow confusing duplicate data
Status: NEW
Product: MediaWiki
Classification: Unclassified
Database (Other open bugs)
unspecified
All All
: Normal normal (vote)
: ---
Assigned To: Sean Pringle
: schema-change
: 33228 (view as bug list)
Depends on: 71198 33638
Blocks: postgres 65264
  Show dependency treegraph
 
Reported: 2008-09-02 15:21 UTC by Fabien Coelho
Modified: 2014-09-23 20:59 UTC (History)
10 users (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Fabien Coelho 2008-09-02 15:21:28 UTC
It seems that some tables do not have primary keys nor any unique attributes declared:

 - 18 with PostgreSQL [categorylinks externallinks hitcounter imagelinks langlinks mediawiki_version oldimage pagelinks profiling protected_titles querycache querycachetwo redirect templatelinks user_groups user_newtalk watchlist]

 -  7 with MySQL [archive externallinks hitcounter oldimage querycache querycachetwo user_newtalk]).

I'm pretty sure that they must exist, and declaring them could help trigger
latent bugs in the software, as the database would check for the constraints.
Comment 1 Brion Vibber 2008-10-27 06:25:20 UTC
Clarifying summary.
Comment 2 Sam Reed (reedy) 2012-01-10 14:23:49 UTC
No PK (but has unique index):
categorylinks
change_tag
imagelinks
interwiki
iwlinks
langlinks
log_search
module_deps
msg_resource
msg_resource_links
page_props
oagelinks
protected_titles
querycache_info
searchindex
site_stats
tag_summary
templatelinks
transcache
user_former_groups
user_groups

No PK or unique index:
archive
externallinks
hitcounter
l10n_cache
oldimage
querycache
querycachetwo
user_newtalk
Comment 3 Sam Reed (reedy) 2012-01-10 15:23:17 UTC
*** Bug 33228 has been marked as a duplicate of this bug. ***
Comment 4 Joerg 2012-01-12 15:43:31 UTC
It is important that really Primary Keys are used and not just any index.
Otherwise you might get into trouble trying to edit table data. Programmes like MySQL Workbench do not allow you to edit table data, if the table does not have a Primary Key.

I spoke to domas yesterday and as far as missing PKs cause problems he proposed to convert keys to Primary Keys to fix this issue.
Comment 5 Sam Reed (reedy) 2012-01-12 17:47:23 UTC
(In reply to comment #4)
> It is important that really Primary Keys are used and not just any index.
> Otherwise you might get into trouble trying to edit table data. Programmes like
> MySQL Workbench do not allow you to edit table data, if the table does not have
> a Primary Key.
> 
> I spoke to domas yesterday and as far as missing PKs cause problems he proposed
> to convert keys to Primary Keys to fix this issue.

Cool, thanks for checking. Did he advise if there was a nice way to convert between them? Other than dropping the index, and recreating it as a PK?

Obviously fixing the ones with no PK or unique key is more easily done
Comment 6 Joerg 2012-01-12 19:36:55 UTC
No, he did not. Basically what I wrote above is everything he told me.
Comment 7 Sam Reed (reedy) 2012-01-12 23:51:47 UTC
Fancy commenting Domas?

Is there a nice way to convert a unique key to a PK without having to drop and re-add it?
Comment 8 Fabien Coelho 2012-01-13 10:11:49 UTC
(In reply to comment #5)
> Obviously fixing the ones with no PK or unique key is more easily done

ISTM that you are a little bit optimistic here. If the data do not really comply to a key (say there is a bug in the application which did not respect an implicit key, and the key was not checked by the database because the constraint was not declared), then the addition of the constraint will fail.
Comment 9 Fabien Coelho 2012-01-13 10:20:33 UTC
(In reply to comment #7)

> Is there a nice way to convert a unique key to a PK without having to drop and
> re-add it?

AFAIK, not for PostgreSQL.

For the order, it seems safer to add first the PK constraint and then to drop the UNIQUE, rather than the other way around.

A PK is a UNIQUE + NOT NULL. You must hope that the data do not use a NULL for some reason.
Comment 10 Joerg 2012-01-13 13:41:58 UTC
I could not find anything on changing an existing index into a PK in the MySQL docs.

Also professional DB Tools like MySQL GUI Tools, MySQL Workbench (and also phpMyAdmin) just DROP the old index and ADD the new Primary Key (also when you do both at the same time).

This is an according example output:
ALTER TABLE `x` 
ADD PRIMARY KEY (`field`), 
DROP INDEX `field`;
Comment 11 Sam Reed (reedy) 2012-01-13 17:02:36 UTC
(In reply to comment #10)
> 
> This is an according example output:
> ALTER TABLE `x` 
> ADD PRIMARY KEY (`field`), 
> DROP INDEX `field`;

On that note, someone was going to log a bug about that.

When we do them as individual statements (ie alter table drop, then alter table add), we actually are creating more overhead as mysql rebuilds the table twice.

It's done for SQLite comparability, but the "overhead" of having a file each is probably worth it

(In reply to comment #8)
> (In reply to comment #5)
> > Obviously fixing the ones with no PK or unique key is more easily done
> 
> ISTM that you are a little bit optimistic here. If the data do not really
> comply to a key (say there is a bug in the application which did not respect an
> implicit key, and the key was not checked by the database because the
> constraint was not declared), then the addition of the constraint will fail.

The generic fallback is then to just add a PK of an int per row, which MySQL has already done internally anyway (or something along those lines)
Comment 12 Tim Landscheidt 2012-01-14 00:33:33 UTC
(In reply to comment #9)
> > Is there a nice way to convert a unique key to a PK without having to drop and
> > re-add it?

> AFAIK, not for PostgreSQL.

There seems to be a way described at http://archives.postgresql.org/pgsql-general/2004-12/msg01161.php.  On any database smaller than WMF's, I would opt for DROP + ADD, though.

> For the order, it seems safer to add first the PK constraint and then to drop
> the UNIQUE, rather than the other way around.

> A PK is a UNIQUE + NOT NULL. You must hope that the data do not use a NULL for
> some reason.

Well, you can also test that :-).
Comment 13 Fabien Coelho 2012-01-14 06:52:36 UTC
(In reply to comment #12)

> There seems to be a way described at
> http://archives.postgresql.org/pgsql-general/2004-12/msg01161.php.  On any
> database smaller than WMF's, I would opt for DROP + ADD, though.

Indeed, that is a fun one, just updating the metadata!

  CREATE TABLE foo(id INT UNIQUE NOT NULL, data TEXT);
  -- ...
  UPDATE pg_index SET indisprimary=TRUE WHERE indexrelid=106702;
  UPDATE pg_constraint SET contype='p' WHERE conname='foo_id_key';
  # \d+ foo
    "foo_id_key" PRIMARY KEY, btree (id)

However, you may need an admin to do that, while the add/drop would only require to own the table.

> Well, you can also test that :-).

Sure.
Comment 14 Domas Mituzas 2012-01-15 14:15:41 UTC
small note: we don't use workbench to edit data in WMF deployment, and we shouldn't ;-)
Comment 15 Joerg 2012-01-15 23:01:52 UTC
Workbench was just an example. I know that you prefer the command line. ;-)
Comment 16 Asher Feldman 2012-05-16 18:00:41 UTC
(In reply to comment #2)
The following tables in enwiki don't have a primary or unique key defined:

mysql> select table_schema,table_name from information_schema.columns where table_schema='enwiki' group by table_name having sum(if(column_key in ('PRI','UNI'), 1,0)) = 0;
+--------------+--------------------------------+
| table_schema | table_name                     |
+--------------+--------------------------------+
| enwiki       | archive                        |
| enwiki       | archive_old                    |
| enwiki       | change_tag                     |
| enwiki       | click_tracking                 |
| enwiki       | click_tracking_user_properties |
| enwiki       | edit_page_tracking             |
| enwiki       | exarchive                      |
| enwiki       | exlogging                      |
| enwiki       | externallinks                  |
| enwiki       | hashs                          |
| enwiki       | hidden                         |
| enwiki       | hitcounter                     |
| enwiki       | l10n_cache                     |
| enwiki       | logging_old                    |
| enwiki       | long_run_profiling             |
| enwiki       | namespaces                     |
| enwiki       | oldimage                       |
| enwiki       | oldimage_old                   |
| enwiki       | old_growth                     |
| enwiki       | querycache                     |
| enwiki       | querycachetwo                  |
| enwiki       | querycache_old                 |
| enwiki       | securepoll_lists               |
| enwiki       | user_newtalk                   |
| enwiki       | user_properties                |
| enwiki       | validate                       |
+--------------+--------------------------------+
26 rows in set (0.03 sec)

This is especially vexing for the large tables (archive, externallinks) which I'd like to be able to perform online schema migrations against.
Comment 17 Nathan Larson 2012-11-01 00:10:25 UTC
Gerrit change #Iafdee06e will add a PK for archive and externallinks. Is there any reason not to also add PKs for:
* change_tag
* hitcounter
* l10n_cache
* oldimage
* querycache
* querycachetwo
* user_newtalk
* user_properties
Comment 18 Tyler Romeo 2013-08-17 17:50:27 UTC
If this is being actively worked on, I'd like to note one other thing that should probably be changed along with this: a lot of our tables have primary keys, but also don't at the same time. Basically, the maintenance/tables.sql file doesn't list the table as having a primary key, but MySQL behavior is to choose the first non-null unique key as use it as a primary key anyway. In other words, the tables don't *look* like they have primary keys in their definitions, but they actually do when created. It'd be simpler and less misleading to just explicitly declare the primary keys.

Also, I'd like to note one thing about primary keys with InnoDB: all tables have primary keys. If you don't specify one, as I said, MySQL will try and choose a unique non-null index. But even if there is none of those, it will create it's own "hidden primary key" using a custom 6-byte row id. Unfortunately, this is really slow because the value for the next row id is not per-table, but per-database. In other words, when inserting into any table without an actual primary key, MySQL has to lock out all other similar tables first in order to get a row id.
Comment 19 Domas Mituzas 2013-08-20 17:27:30 UTC
There are no global/database/table locking implications with hidden row key. It does not have to be synchronized with any other subsystem, so the cost of row-id is zero. Adding auto-increment key would increase locking contention, not decrease it.
Comment 20 Tyler Romeo 2013-08-20 17:29:05 UTC
(In reply to comment #19)
> There are no global/database/table locking implications with hidden row key.
> It
> does not have to be synchronized with any other subsystem, so the cost of
> row-id is zero. Adding auto-increment key would increase locking contention,
> not decrease it.

http://blog.jcole.us/2013/05/02/how-does-innodb-behave-without-a-primary-key/

Have things changed since May 2?
Comment 21 Domas Mituzas 2013-08-20 17:38:47 UTC
Oh, with that thinking I can point out fifty other global locks that are touched when executing queries. The question is for what duration and are there truly any locking implications there or do they apply to a running system.
Comment 22 Gerrit Notification Bot 2013-10-17 17:50:26 UTC
Change 51675 merged by jenkins-bot:
Add archive, externallinks PK

https://gerrit.wikimedia.org/r/51675
Comment 23 Gerrit Notification Bot 2013-10-28 22:11:11 UTC
Change 92433 had a related patch set uploaded by saper:
Add ar_id and el_id sequences for PostgreSQL

https://gerrit.wikimedia.org/r/92433
Comment 24 Gerrit Notification Bot 2013-11-17 21:36:19 UTC
Change 92433 merged by jenkins-bot:
Add ar_id and el_id sequences for PostgreSQL

https://gerrit.wikimedia.org/r/92433
Comment 25 Andre Klapper 2014-02-14 13:13:02 UTC
[No patches to merge left - resetting bug status]

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links