Last modified: 2013-11-12 15:17:50 UTC
https://gerrit.wikimedia.org/r/#/c/90739/ was merged, but it isn't showing up in https://git.wikimedia.org/summary/mediawiki%2Fextensions%2FMassMessage.git or https://github.com/wikimedia/mediawiki-extensions-MassMessage/commits/master . Also, https://git.wikimedia.org/ says "there has been no activity today" (false), and the active repositories sidebar is empty.
Hmm, no problems replicating to lanthanum, just everything else :\ We're getting rejected host key errors in the logs. All the broken sites have valid fingerprints in known_hosts, and ssh'ing manually to the boxes works fine.
A side effect in Jenkins is that we use the local replication for extension jobs. It is used to install mediawiki/core@master as well as potential extension dependencies. This can lead to some crazy build failures.
In the auth log on gallium, I see rejected connections from ytterbium.wikimedia.org [208.80.154.80] since Oct 19 20:55 UTC.

The last one working:

Oct 19 20:46:27
Set /proc/self/oom_score_adj to 0
Connection from 208.80.154.80 port 44711
Found matching RSA key: ///
Postponed publickey for gerritslave from 208.80.154.80 port 44711 ssh2 [preauth]
Found matching RSA key: ///
Accepted publickey for gerritslave from 208.80.154.80 port 44711 ssh2
pam_unix(sshd:session): session opened for user gerritslave by (uid=0)
User child is on pid 30532
pam_unix(sshd:session): session closed for user gerritslave

The first one failing:

Oct 19 20:56:30
Connection from 208.80.154.80 port 45384
Received disconnect from 208.80.154.80: 3: com.jcraft.jsch.JSchException: reject HostKey: gallium.wikimedia.org [preauth]

The rest of the auth log is filled with such errors.
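The boundary between the last accepted login and the first rejected host key can be located with a small script instead of scrolling through the log. The following is only a sketch: it assumes each auth log line carries the usual syslog prefix ("Mon DD HH:MM:SS host sshd[pid]: message"), and the function name `find_boundary`, the sample lines, and the pid values in them are hypothetical, built from the messages quoted above.

```python
import re

# Assumed syslog-style prefix: "Mon DD HH:MM:SS host sshd[pid]: message".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+sshd\[\d+\]:\s+(?P<msg>.*)$"
)

def find_boundary(lines, client_ip):
    """Return (last_accepted_ts, first_rejected_ts) for one client IP.

    last_accepted_ts is the last "Accepted publickey" seen before the
    first "reject HostKey" disconnect; either value may be None.
    """
    last_accepted = None
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an sshd line in the assumed format
        msg = m.group("msg")
        if client_ip not in msg:
            continue
        if msg.startswith("Accepted publickey"):
            last_accepted = m.group("ts")
        elif "reject HostKey" in msg:
            return last_accepted, m.group("ts")
    return last_accepted, None

# Illustrative lines only: the messages come from the comment above,
# the hostname/pid prefix is an assumed format.
SAMPLE = [
    "Oct 19 20:46:27 gallium sshd[1]: Accepted publickey for gerritslave "
    "from 208.80.154.80 port 44711 ssh2",
    "Oct 19 20:56:30 gallium sshd[2]: Received disconnect from 208.80.154.80: 3: "
    "com.jcraft.jsch.JSchException: reject HostKey: gallium.wikimedia.org [preauth]",
]

if __name__ == "__main__":
    print(find_boundary(SAMPLE, "208.80.154.80"))
```

Comparing the two timestamps against the deployment log is what narrows the breakage down to a specific change window.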
October 19th:

20:54 ^d: gerrit: installed 2.7-rc2-507-g1e7090b, service back up

Seems the upgrade did not go well and broke something. Maybe replication is run by a different username that does not have gallium.wikimedia.org added to known_hosts.
The same issue appears on lanthanum.eqiad.wmnet and might be happening on antimony.wikimedia.org as well.
(In reply to comment #4)
> October 19th:
>
> 20:54 ^d: gerrit: installed 2.7-rc2-507-g1e7090b, service back up
>
> Seems the upgrade did not go well and broke something. Maybe replication is
> run by a different username that does not have gallium.wikimedia.org added to
> known_hosts.

The upgrade didn't touch replication; it only added a minor change to the output format of `gerrit query`. Gerrit has always read /var/lib/gerrit2/.ssh/known_hosts, which hasn't changed since the move to ytterbium.

(In reply to comment #5)
> The same issue appears on lanthanum.eqiad.wmnet and might be happening on
> antimony.wikimedia.org as well.

lanthanum is replicating fine; it's antimony/gallium/github that are funky, like I mentioned above.
This turned out to be an installation issue. For some reason the gerrit user's homedir was at /home/gerrit2 instead of /var/lib/gerrit2. For now I just copied the files and restarted gerrit2, but I will fix it cleanly by moving the homedir to /var/lib/gerrit2 and deleting /home/gerrit2.
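A quick way to catch this kind of mismatch is to compare the account's registered home directory with the one the service is expected to use, since a known_hosts file under the wrong homedir would not be the one that gets read. This is a rough sketch, not the fix that was applied: the account name `gerrit2` is an assumption (the comment only says "gerrit user"), and `home_mismatch` and `ssh_known_hosts` are hypothetical helper names.

```python
import os
import pwd  # Unix-only; reads the system password database

def ssh_known_hosts(home):
    """Path of the per-user known_hosts file under a given home directory."""
    return os.path.join(home, ".ssh", "known_hosts")

def home_mismatch(user, expected_home, lookup=pwd.getpwnam):
    """Return the actual homedir if it differs from expected_home, else None.

    `lookup` is injectable so the check can be exercised without a real account.
    """
    actual = lookup(user).pw_dir
    if os.path.normpath(actual) != os.path.normpath(expected_home):
        return actual
    return None

if __name__ == "__main__":
    user, expected = "gerrit2", "/var/lib/gerrit2"  # account name is assumed
    try:
        actual = home_mismatch(user, expected)
    except KeyError:
        print(f"no such user: {user}")
    else:
        if actual is None:
            print(f"{user}: homedir is {expected}, as expected")
        else:
            print(f"{user}: homedir is {actual}, expected {expected}")
            print(f"known_hosts would be looked up at {ssh_known_hosts(actual)}")
```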
Bah, this is my fault. I'll clean it up.