Last modified: 2014-04-20 15:35:42 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in and except for displaying bug reports and their history, links might be broken. See T56934, the corresponding Phabricator task for complete and up-to-date bug report information.

Bug 54934 - Wikimedia Labs database replication has seemingly stopped (s1 and s2?)


Summary:	Wikimedia Labs database replication has seemingly stopped (s1 and s2?)

Status:	RESOLVED FIXED

Product:	Wikimedia Labs
Classification:	Unclassified
Component:	tools
Version:	unspecified
Hardware:	All All

Importance:	Normal major
Target Milestone:	---
Assigned To:	Marc A. Pelletier

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-10-03 20:11 UTC by Liangent
Modified:	2014-04-20 15:35 UTC
CC List:	7 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---

Description Liangent 2013-10-03 20:11:43 UTC

max(rc_timestamp) is around 20131003170000 for most of the wikis listed in https://noc.wikimedia.org/conf/s2.dblist .

Other wikis seem fine.

Comment 1 Betacommand 2013-10-06 02:19:40 UTC

Has spread to at least enwiki

Comment 2 MZMcBride 2013-10-06 03:00:03 UTC

Confirmed the issue:

MariaDB [zhwiki_p]> select max(rc_timestamp) from recentchanges\G
*************************** 1. row ***************************
max(rc_timestamp): 20131003170159
1 row in set (0.04 sec)

MariaDB [enwiki_p]> select max(rc_timestamp) from recentchanges\G
*************************** 1. row ***************************
max(rc_timestamp): 20131004074947
1 row in set (0.03 sec)

It seems database replication is broken. Is replication lag logged/graphed anywhere?

Copying Sean and Ryan L. here. I think Asher previously worked on Labs' database replication, but he's gone. I'm not sure who the new maintainer is.

Comment 3 Sean Pringle 2013-10-07 01:41:08 UTC

The relevant sanitarium (upstream) replication had stopped due to a lock wait timeout caused by a slow audit process. The issue has been fixed and labsdbs should catch up quickly.

Also found the icinga replication check for our mysql_multi_instance class in puppet is unreliable. Switching it over to the pt-heartbeat method used by the core dbs...

Comment 4 Sean Pringle 2013-10-07 02:46:26 UTC

jeremyb pointed out in IRC that I missed the question on replag.

The replag graph mysql_slave_lag is not set up for the sanitarium hosts. It can be done as part of the same general fix I mentioned in comment #3.

Don't know the ganglia situation on labsdb. Marc might. FWIW a replag graph on labs in this case would not have shown anything as the problem was upstream. Something graphing replication rate, rather than lag, would have been useful.
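As a rough illustration of graphing a rate rather than a lag, the sketch below takes periodic samples of max(rc_timestamp) and reports how many seconds of upstream time were applied per second of wall-clock time. The function names and the sampling scheme are assumptions made for this example, not an existing Wikimedia tool, and rc_timestamp is only a proxy for replication progress.

```python
from datetime import datetime, timezone


def parse_mw_timestamp(ts):
    """Parse a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, UTC)."""
    if isinstance(ts, bytes):  # binary columns may come back as bytes
        ts = ts.decode("ascii")
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def replication_rates(samples):
    """Return the progress rate between consecutive samples.

    `samples` is a hypothetical list of (sample_time, max_rc_timestamp)
    pairs ordered by sample_time. Each rate is the number of seconds the
    replicated rc_timestamp advanced per wall-clock second elapsed.
    """
    rates = []
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        elapsed = (t1 - t0).total_seconds()
        if elapsed <= 0:
            raise ValueError("samples must be strictly ordered by time")
        advanced = (parse_mw_timestamp(r1) - parse_mw_timestamp(r0)).total_seconds()
        rates.append(advanced / elapsed)
    return rates


if __name__ == "__main__":
    # Illustrative samples taken one minute apart (values are made up).
    samples = [
        (datetime(2013, 10, 3, 17, 0, 0, tzinfo=timezone.utc), "20131003170000"),
        (datetime(2013, 10, 3, 17, 1, 0, tzinfo=timezone.utc), "20131003170100"),
        (datetime(2013, 10, 3, 17, 2, 0, tzinfo=timezone.utc), "20131003170100"),
        (datetime(2013, 10, 3, 17, 3, 0, tzinfo=timezone.utc), "20131003170400"),
    ]
    print(replication_rates(samples))
```

A rate near 1 suggests the replica is keeping pace, a rate of 0 suggests it has stalled (or that the wiki had no edits in that interval), and a rate above 1 suggests it is catching up.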

Comment 5 Liangent 2013-10-07 02:52:46 UTC

(In reply to comment #4)
> FWIW a replag graph on
> labs in this case would not have shown anything as the problem was upstream.
> Something graphing replication rate, rather than lag, would have been useful.

From a DBA's point of view this is true; from a practical point of view, a graph of the difference between the latest recentchange entry's timestamp and the current timestamp would be useful enough, assuming there're always edits happening on the wiki.
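The difference described above could be computed along these lines. This is a minimal sketch that assumes the value of max(rc_timestamp) has already been fetched from a replica; the helper names are hypothetical and no database connection is shown.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def parse_mw_timestamp(ts: Union[str, bytes]) -> datetime:
    """Parse a 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, UTC)."""
    if isinstance(ts, bytes):  # binary columns may come back as bytes
        ts = ts.decode("ascii")
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def apparent_lag_seconds(max_rc_timestamp: Union[str, bytes],
                         now: Optional[datetime] = None) -> float:
    """Seconds between the newest recentchanges row and `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - parse_mw_timestamp(max_rc_timestamp)).total_seconds()


if __name__ == "__main__":
    # zhwiki value from comment 2, measured at the time that comment was posted.
    posted = datetime(2013, 10, 6, 3, 0, 3, tzinfo=timezone.utc)
    print(apparent_lag_seconds("20131003170159", now=posted))
```

With the zhwiki value from comment 2 and that comment's own timestamp, this gives 208684 seconds, or roughly 2.4 days. As noted above, the figure is only meaningful on wikis that are edited continuously.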

Comment 6 jeremyb 2013-10-07 03:01:23 UTC

(In reply to comment #5)
> a graph of the difference
> between the latest recentchange entry's timestamp and the current timestamp
> would be useful enough, assuming there're always edits happening on the wiki.

We can probably do better than that. There's a heartbeat DB visible (at least on enwiki.labsdb) and we can probably open that up for everyone to read and graph it.

Comment 7 Betacommand 2014-04-20 12:10:12 UTC

enwiki replication is over two days behind

Comment 8 Tim Landscheidt 2014-04-20 14:01:50 UTC

(In reply to Betacommand from comment #7)
> enwiki replication is over two days behind

As this is a different issue, I've filed bug #64154 for that.
