Last modified: 2014-04-20 15:35:42 UTC
max(rc_timestamp) is around 20131003170000 for most wikis in the s2 group ( https://noc.wikimedia.org/conf/s2.dblist ). Other wikis seem fine.
Has spread to at least enwiki
Confirmed the issue:

MariaDB [zhwiki_p]> select max(rc_timestamp) from recentchanges\G
*************************** 1. row ***************************
max(rc_timestamp): 20131003170159
1 row in set (0.04 sec)

MariaDB [enwiki_p]> select max(rc_timestamp) from recentchanges\G
*************************** 1. row ***************************
max(rc_timestamp): 20131004074947
1 row in set (0.03 sec)

It seems database replication is broken. Is replication lag logged/graphed anywhere? Copying Sean and Ryan L. here. I think Asher previously worked on Labs' database replication, but he's gone. I'm not sure who the new maintainer is.
The relevant sanitarium (upstream) replication had stopped due to a lock wait timeout caused by a slow audit process. The issue has been fixed and labsdbs should catch up quickly. Also found the icinga replication check for our mysql_multi_instance class in puppet is unreliable. Switching it over to the pt-heartbeat method used by the core dbs...
jeremyb pointed out in IRC that I missed the question on replag. The replag graph mysql_slave_lag is not set up for the sanitarium hosts. It can be done as part of the same general fix I mentioned in comment #3. I don't know the ganglia situation on labsdb. Marc might. FWIW a replag graph on labs in this case would not have shown anything, as the problem was upstream. Something graphing replication rate, rather than lag, would have been useful.
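To illustrate the "rate rather than lag" idea, here is a minimal sketch. It assumes that "replication rate" can be approximated by how fast the newest replicated timestamp advances between two samples; that interpretation, the function names, and the sample format are all hypothetical and not taken from any existing monitoring check.

```python
from datetime import datetime


def parse_mw_timestamp(ts):
    """Parse a MediaWiki-style YYYYMMDDHHMMSS timestamp string."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


def replication_rate(sample_a, sample_b):
    """Seconds of replicated data applied per second of wall-clock time.

    Each sample is a (wall_clock, newest_replicated_ts) pair, where
    wall_clock is a datetime and newest_replicated_ts is a string such as
    the result of max(rc_timestamp). This pairing is an assumption made
    for illustration only.
    """
    (wall_a, repl_a), (wall_b, repl_b) = sample_a, sample_b
    wall_delta = (wall_b - wall_a).total_seconds()
    if wall_delta <= 0:
        raise ValueError("samples must be taken at increasing wall-clock times")
    repl_delta = (parse_mw_timestamp(repl_b) - parse_mw_timestamp(repl_a)).total_seconds()
    return repl_delta / wall_delta
```

Under these assumptions, a rate near 1.0 means the replica is keeping pace, a rate above 1.0 means it is catching up, and a rate of 0.0 means nothing new arrived between the samples, which is what a stall anywhere upstream would look like.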
(In reply to comment #4)
> FWIW a replag graph on
> labs in this case would not have shown anything as the problem was upstream.
> Something graphing replication rate, rather than lag, would have been useful.

From a DBA's point of view, this is true. From a practical point of view, a graph of the difference between the latest recentchanges entry's timestamp and the current timestamp would be useful enough, assuming there are always edits happening on the wiki.
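The measurement suggested here can be sketched as follows. This is only an illustration: the function names are hypothetical, and it assumes rc_timestamp values are UTC strings in the YYYYMMDDHHMMSS form shown in the query output earlier in this thread.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_mw_timestamp(ts: str) -> datetime:
    """Parse a YYYYMMDDHHMMSS string as a UTC datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def rc_lag_seconds(max_rc_timestamp: str, now: Optional[datetime] = None) -> float:
    """Seconds between the newest recentchanges entry and `now`.

    `max_rc_timestamp` is assumed to be the value returned by
    `select max(rc_timestamp) from recentchanges` on the replica.
    `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parse_mw_timestamp(max_rc_timestamp)).total_seconds()
```

For example, passing the zhwiki value 20131003170159 with `now` set to the enwiki value 20131004074947 gives 53268 seconds, a gap of 14 hours 47 minutes 48 seconds between the two replicas at the time of the queries. On a quiet wiki the number also grows when nobody is editing, which is the caveat already noted above.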
(In reply to comment #5)
> a graph of the difference
> between the latest recentchange entry's timestamp and the current timestamp
> would be useful enough, assuming there are always edits happening on the wiki.

We can probably do better than that. There's a heartbeat DB visible (at least on enwiki.labsdb), and we can probably open that up for everyone to read and graph it.
enwiki replication is over two days behind
(In reply to Betacommand from comment #7)
> enwiki replication is over two days behind

As this is a different issue, I've filed bug #64154 for that.