Last modified: 2013-06-18 15:22:31 UTC
Currently, the job queues for Wikimedia wikis can become heavily backlogged without anyone noticing. This is bad. Sometimes the cause is too few job runners being assigned; other times it is a software problem. The job queue is important to MediaWiki, so it needs to keep running, and someone needs to be notified when it becomes too backlogged or breaks. A better monitoring and notification system (using mailing lists, IRC, Nagios, whatever) needs to be implemented for the job queue. This may relate to bug 27724, though adding a timestamp column is only one way to implement better monitoring.
Raising the priority of this bug. This is a real issue.
This is fixed now. There is a Nagios check that monitors job queue length on all wikis (and as of today, the check actually works); see http://nagios.wikimedia.org/nagios/cgi-bin/extinfo.cgi?type=2&host=spence&service=check_job_queue . Ganglia also measures the enwiki job queue length: http://ganglia.wikimedia.org/?m=cpu_report&r=hour&s=descending&c=Miscellaneous+pmtpa&h=spence.wikimedia.org&sh=1
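For anyone wondering what a check like this might look like, here is a rough sketch of the threshold logic. It is not the actual check_job_queue script. The function name `check_job_queues`, the thresholds, and the message format are all made up for illustration, and it assumes the per-wiki queue lengths have already been collected somehow. The only thing taken from Nagios itself is the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
from typing import Dict, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_job_queues(
    lengths: Dict[str, int],
    warn: int = 10000,    # hypothetical warning threshold (jobs per wiki)
    crit: int = 100000,   # hypothetical critical threshold (jobs per wiki)
) -> Tuple[int, str]:
    """Return (exit_code, status_line) for a mapping of wiki name -> queue length."""
    if not lengths:
        # No data at all is reported as UNKNOWN rather than OK, so a broken
        # collector does not look like a healthy queue.
        return UNKNOWN, "JOBQUEUE UNKNOWN - no queue lengths available"

    critical = sorted(w for w, n in lengths.items() if n >= crit)
    warning = sorted(w for w, n in lengths.items() if warn <= n < crit)

    def describe(wikis):
        # Format as "enwiki (250000), dewiki (12000)".
        return ", ".join(f"{w} ({lengths[w]})" for w in wikis)

    if critical:
        return CRITICAL, "JOBQUEUE CRITICAL - backlog on: " + describe(critical)
    if warning:
        return WARNING, "JOBQUEUE WARNING - backlog on: " + describe(warning)
    return OK, f"JOBQUEUE OK - {len(lengths)} queues below {warn} jobs"
```

A wrapper script would print the status line and exit with the returned code, which is how Nagios picks up the result. The worst state across all wikis wins, so one badly backlogged wiki is enough to page someone.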