Last modified: 2014-05-28 13:29:51 UTC
Replication for enwiki stopped about two days ago:

| MariaDB [enwiki_p]> SELECT MAX(rc_timestamp) FROM recentchanges;
| +-------------------+
| | MAX(rc_timestamp) |
| +-------------------+
| | 20140418081351    |
| +-------------------+
| 1 row in set (0.01 sec)

| MariaDB [enwiki_p]>

Coren wrote in http://permalink.gmane.org/gmane.org.wikimedia.labs/2336:

| > Taking a look enwiki_p is at 1 day, 8:06:02 lag. I think its probably
| > due to someone having a broken request.

| > I know Coren will end up killing it, but it would be useful to know who
| > is causing these issues.

| Not this time; there were some system control statements issued in prod
| that cannot work on the replicas that have stalled the replication
| timeline. This will need a bit of tender loving care from our DBA.
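For anyone who wants to turn the MAX(rc_timestamp) check above into a lag figure, here is a minimal sketch. It assumes the value is a UTC timestamp in MediaWiki's 14-digit YYYYMMDDHHMMSS form, and that on a wiki as busy as enwiki the age of the newest recentchanges row is a fair approximation of replication lag. The function name `replica_lag` and the observation time are made up for illustration.

```python
from datetime import datetime, timezone


def replica_lag(max_rc_timestamp, now=None):
    """Return the age of the newest replicated change as a timedelta.

    max_rc_timestamp: 14-digit string, assumed to be UTC.
    now: optional aware datetime; defaults to the current UTC time.
    """
    newest = datetime.strptime(max_rc_timestamp, "%Y%m%d%H%M%S")
    newest = newest.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - newest


# Timestamp from the query above; the observation time is hypothetical.
observed = datetime(2014, 4, 20, 8, 13, 51, tzinfo=timezone.utc)
print(replica_lag("20140418081351", now=observed))
```

On a quiet wiki this overestimates lag, since the newest row can be old even when replication is healthy.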
labsdb1001 was stopped on a DROP USER statement where the upstream user did not exist locally. The statement has been skipped and replication is catching up.

Two related issues:

1. labsdb* replication is not using --replicate-wild-ignore-table=mysql.% and probably should.

2. The /usr/lib/nagios/percona/check_mysql_slave_running script is broken on labsdb* because it's passed a mysql socket argument that is ignored, making the connection fail (and for some reason that outcome doesn't count as critical...wtf)
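To illustrate the behaviour the second issue argues for, here is a rough sketch of a slave check that treats a failed connection as critical. This is not the Percona script; the function name and the example row are hypothetical. It assumes the caller passes one SHOW SLAVE STATUS row as a dict, or None when the connection or query failed, and uses the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_slave_running(status):
    """Map one SHOW SLAVE STATUS row to (exit_code, message).

    status: dict keyed by column name, or None if the connection failed.
    Treating a failed connection as CRITICAL is an assumption; some sites
    may prefer UNKNOWN.
    """
    if status is None:
        return CRITICAL, "CRITICAL: could not connect to MySQL"
    io = status.get("Slave_IO_Running")
    sql = status.get("Slave_SQL_Running")
    if io == "Yes" and sql == "Yes":
        return OK, "OK: replication threads running"
    err = status.get("Last_SQL_Error") or status.get("Last_IO_Error") or "no error text"
    return CRITICAL, f"CRITICAL: Slave_IO_Running={io}, Slave_SQL_Running={sql} ({err})"


# Hypothetical row for a replica whose SQL thread has stopped.
stalled = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "No",
    "Last_SQL_Error": "example error text (made up)",
}
print(check_slave_running(stalled))
print(check_slave_running(None))
```

The point of the sketch is only that "could not connect" and "SQL thread stopped" both end up as exit code 2, so neither case goes unnoticed.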
Replication for enwiki seems to have stopped again:

| MariaDB [enwiki_p]> SELECT MAX(rc_timestamp) FROM recentchanges;
| +-------------------+
| | MAX(rc_timestamp) |
| +-------------------+
| | 20140505180410    |
| +-------------------+
| 1 row in set (0.00 sec)

| MariaDB [enwiki_p]>

Is this related or a different issue?
This blocked replication:

---TRANSACTION D3668010, ACTIVE 28563 sec fetching rows
mysql tables in use 3, locked 3
132316 lock struct(s), heap size 13384120, 1331972 row lock(s), undo log entries 29621
MySQL thread id 61759507, OS thread handle 0x7f698bf66700, query id 1194334863 10.68.1
DELETE FROM temp WHERE pid IN ( SELECT /* SLOW_OK LIMIT:2000 NM */ /* CATSCAN2 */ DIST ...
TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS

Coren killed it.
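A transaction like the one above could be spotted earlier by scanning the InnoDB status text for long-running entries. The following is a small sketch under assumptions: it only looks at the `---TRANSACTION <id>, ACTIVE <n> sec` header line as it appears in the excerpt, and the function name and one-hour threshold are arbitrary choices for illustration.

```python
import re

# Matches the transaction header line as it appears in the excerpt above.
TRX_HEADER = re.compile(r"---TRANSACTION (?P<id>[0-9A-F]+), ACTIVE (?P<secs>\d+) sec")


def long_transactions(innodb_status, threshold_secs=3600):
    """Yield (transaction_id, active_seconds) for transactions at or over the threshold."""
    for match in TRX_HEADER.finditer(innodb_status):
        secs = int(match.group("secs"))
        if secs >= threshold_secs:
            yield match.group("id"), secs


# Header line copied from the excerpt above.
sample = "---TRANSACTION D3668010, ACTIVE 28563 sec fetching rows"
for trx_id, secs in long_transactions(sample):
    print(trx_id, f"{secs / 3600:.1f} h")
```

For the excerpt, 28563 seconds works out to just under eight hours of holding row locks.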