Last modified: 2013-08-21 22:25:26 UTC
I don't know if this is related to the NFS stalls, but tools-webserver-01 runs out of memory from time to time.

exim paniclog:
| 2013-08-03 03:13:23 daemon: fork of queue-runner process failed: Cannot allocate memory

Daily anacron, Sun, 04 Aug 2013 06:30:12 +0000:
| /etc/cron.daily/apt:
| FATAL -> Failed to fork.

Weekly anacron, Sun, 04 Aug 2013 06:47:25 +0000:
| /etc/cron.weekly/apt-xapian-index:
| FATAL -> Failed to fork.
| run-parts: /etc/cron.weekly/apt-xapian-index exited with return code 100

Ganglia graphs (http://ganglia.wmflabs.org/latest/graph_all_periods.php?h=tools-webserver-01&m=load_one&r=hour&s=by%20name&hc=4&mc=2&st=1375632437&g=mem_report&z=large&c=tools) look rather peaceful, with most of the memory only being used for buffers/cache. But the webserver should never impede the system jobs from running, so this needs to be looked into. Setting up tools-webserver-03 is certainly an option, but may only defer the problem.
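Failed forks while most memory sits in buffers/cache could point at commit accounting rather than at physical memory actually running out. This is only a guess: the report does not state the vm.overcommit_memory setting on tools-webserver-01, and the commit limit is only strictly enforced in some modes. As a rough sketch, the helpers below (the names parse_meminfo and commit_headroom_kb are made up for illustration) read /proc/meminfo-style text and compare the kernel's Committed_AS against CommitLimit:

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        # Lines look like "MemTotal:  4000000 kB"; names with parentheses are skipped.
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def commit_headroom_kb(fields):
    """Return CommitLimit minus Committed_AS in kB.

    A negative value means more memory has been promised to processes than
    the commit limit allows, which is one (assumed, not confirmed) way for
    fork() to fail even though the cache could still be reclaimed.
    """
    return fields["CommitLimit"] - fields["Committed_AS"]
```

On the affected host this could be fed with the contents of /proc/meminfo, for example `commit_headroom_kb(parse_meminfo(open("/proc/meminfo").read()))`, ideally logged periodically so that the value at the time of a failed fork is available.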
exim paniclog again (deleted afterwards by me):
| 2013-08-13 12:53:23 daemon: fork of queue-runner process failed: Cannot allocate memory
| 2013-08-16 12:52:33 daemon: fork of queue-runner process failed: Cannot allocate memory
Added a new webserver to the rotation; this should ease the pressure.