Last modified: 2013-08-29 20:13:20 UTC
This is caused by the NFS outages, which are killing IRC bots. Mark this bug as resolved once all NFS-related issues are fixed and the servers can stay up for at least a month without random outages.
Why/how would the filesystem stalling for brief periods make IRC bots die? Reports of what was in the logs show connections to IRC /servers/ timing out or being denied. The problem seems to be on Freenode's side.
I don't know whether it's on Freenode's side or not, but when the servers become unusable (thanks to NFS, for example), some of the following happens: a) labs-morebots gets disconnected (wm-bot doesn't); b) you typically reboot the servers after fixing NFS, which kills the bots anyway; c) the system becomes unstable or crashes; d) an IRC bot may need to touch the disk, and because every tool must be hosted on NFS, that access blocks. Binaries hosted on NFS, at least CLI ones, are typically read on demand rather than loaded into memory in full, so merely executing a program may require a read from disk. So this is a problem for IRC bots as well, even if it doesn't look like it at first sight.
This could probably be avoided if the bot used a network bouncer that was copied to the local filesystem and then started from there. That sounds a bit complex, though, and doesn't fix the other issues mentioned (such as the reboot after a fix).
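For what it's worth, the copy-then-start idea could look roughly like the sketch below. This is only an illustration, not anything deployed on Labs: the function names and the `local_dir` parameter are made up, and it assumes the default temp directory lives on local disk rather than on NFS.

```python
import os
import shutil
import stat
import subprocess
import tempfile


def stage_locally(src, local_dir=None):
    """Copy an executable from shared storage to local disk; return the new path."""
    # local_dir is a hypothetical parameter. By default a fresh temp directory
    # is used, which is ASSUMED to be on a local filesystem, not on NFS.
    if local_dir is None:
        local_dir = tempfile.mkdtemp(prefix="bouncer-")
    dst = os.path.join(local_dir, os.path.basename(src))
    shutil.copy2(src, dst)
    # Make sure the owner can execute the local copy.
    os.chmod(dst, os.stat(dst).st_mode | stat.S_IXUSR)
    return dst


def start_from_local_copy(src, args=()):
    """Stage the binary locally, then start it from the local copy."""
    local_path = stage_locally(src)
    # Later page-ins of the executable now come from local disk, not NFS.
    return subprocess.Popen([local_path, *args])
```

This only covers the binary itself. Anything the bouncer reads or writes afterwards (config, logs) would still block if it lives on NFS, and a reboot still kills the process.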
This has been rendered moot by the NFS server no longer stalling.