Last modified: 2014-07-23 20:01:22 UTC
Since around 2014-07-22 20:00 UTC, Labs URLs like http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page have been failing for me. Before that, I could see Flow pages but got the database locked error when trying to add content. http://en.wikipedia.beta.wmflabs.org/w/api.php either times out or takes a long time. A basic load.php request works fine, but a complex one such as http://en.wikipedia.beta.wmflabs.org/w/load.php?debug=true&lang=en&modules=ext.flow.new&skin=vector&version=20140524T012822Z&* does not. bd808 reported some successes on Special:Version, so the failures are intermittent.
The apache error logs (/data/project/logs/apache-error.log) show quite a few errors from proxy_fcgi:

[Tue Jul 22 21:59:53.156019 2014] [proxy_fcgi:error] [pid 21637] [client 10.68.16.16:50309] AH01067: Failed to read FastCGI header
[Tue Jul 22 21:59:53.156796 2014] [proxy_fcgi:error] [pid 21637] (70014)End of file found: [client 10.68.16.16:50309] AH01075: Error dispatching request to :
[Tue Jul 22 21:59:53.194949 2014] [proxy:error] [pid 21626] (111)Connection refused: AH00957: FCGI: attempt to connect to 127.0.0.1:9000 (*) failed
[Tue Jul 22 21:59:53.195038 2014] [proxy_fcgi:error] [pid 21626] [client 10.68.16.12:7820] AH01079: failed to make connection to backend: 127.0.0.1

The /tmp directories on deployment-mediawiki01 and deployment-mediawiki02 are full of hhvm crash stack traces.
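For anyone triaging similar logs, a rough sketch of how the entries quoted above could be tallied by module and AH code, to see which failure dominates. The function name count_error_codes and the regular expression are illustrative assumptions, not part of any existing tooling, and the pattern is only known to match the four entries quoted in this comment.

```python
import re
from collections import Counter

# Assumed shape of an entry, based only on the lines quoted in this report:
# [timestamp] [module:error] [pid N] ... AHnnnnn: message
ENTRY = re.compile(
    r"\[(?P<module>[a-z_]+):error\] \[pid \d+\].*?(?P<code>AH\d{5}):"
)

def count_error_codes(lines):
    """Count (module, AH code) pairs across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ENTRY.search(line)
        if match:  # lines that do not look like error entries are skipped
            counts[(match.group("module"), match.group("code"))] += 1
    return counts

# The four entries quoted in the comment above, one per line.
SAMPLE = [
    "[Tue Jul 22 21:59:53.156019 2014] [proxy_fcgi:error] [pid 21637] "
    "[client 10.68.16.16:50309] AH01067: Failed to read FastCGI header",
    "[Tue Jul 22 21:59:53.156796 2014] [proxy_fcgi:error] [pid 21637] "
    "(70014)End of file found: [client 10.68.16.16:50309] AH01075: "
    "Error dispatching request to :",
    "[Tue Jul 22 21:59:53.194949 2014] [proxy:error] [pid 21626] "
    "(111)Connection refused: AH00957: FCGI: attempt to connect to "
    "127.0.0.1:9000 (*) failed",
    "[Tue Jul 22 21:59:53.195038 2014] [proxy_fcgi:error] [pid 21626] "
    "[client 10.68.16.12:7820] AH01079: failed to make connection to "
    "backend: 127.0.0.1",
]

if __name__ == "__main__":
    for (module, code), n in count_error_codes(SAMPLE).most_common():
        print(module, code, n)
```

To run it against the real file, pass an open file handle for /data/project/logs/apache-error.log instead of SAMPLE; the function only needs an iterable of lines.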
Most of the crashes I'm seeing right now are for bug 68413.
Resolved by https://gerrit.wikimedia.org/r/#/c/148743/ and https://gerrit.wikimedia.org/r/#/c/148754/. (That isn't to say that we've resolved all availability or performance issues on Labs.)