Last modified: 2014-07-23 12:58:48 UTC
I have a daily cronjob that pipes a status report to /usr/sbin/exim -v -odf -i. This evening, I noticed that the job was submitted (job 2268567) but the email didn't go through. Seeing that it was scheduled on tools-exec-11, I did a little testing and found these results on -11, -12, and -13:

tools.anomiebot@tools-exec-13:~$ (echo "Subject: Test 13"; echo ""; echo "Testing email") | /usr/sbin/exim -v -odf -i redacted@email.address
LOG: MAIN
  <= tools.anomiebot@tools.wmflabs.org U=tools.anomiebot P=local S=414
LOG: PANIC DIE
  Cannot open main log file "/var/log/exim4/mainlog": Permission denied: euid=107 egid=110
2014-07-12 01:49:22 1X5mRG-0006xN-6q <= tools.anomiebot@tools.wmflabs.org U=tools.anomiebot P=local S=414
2014-07-12 01:49:22 1X5mRG-0006xN-6q Cannot open main log file "/var/log/exim4/mainlog": Permission denied: euid=107 egid=110
exim: could not open panic log - aborting: see message(s) above
My testing showed that mails do get through, but I created /var/log/exim4 with mode 2750 and ownership Debian-exim:adm anyhow on -11, -12 and -13. I'll leave this bug open to investigate *why* the directory wasn't created in the first place. On hosts where the directory existed, "dpkg -S /var/log/exim4" said that it didn't belong to any package, which feels odd to a Fedora user :-).
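For anyone who wants to verify the other exec hosts, here is a rough sketch of the check and fix described above. The function name check_logdir is made up for illustration, and it assumes GNU stat (the -c option) is available; the mode and ownership are the ones mentioned in this comment.

```shell
#!/bin/sh
# check_logdir DIR MODE OWNER:GROUP   (hypothetical helper, not part of exim4)
# Returns 0 if DIR exists with the given octal mode and ownership, 1 otherwise.
# Assumes GNU stat, which supports -c with the %a, %U and %G format sequences.
check_logdir() {
    dir=$1
    want_mode=$2
    want_owner=$3
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    have_mode=$(stat -c '%a' "$dir")
    have_owner=$(stat -c '%U:%G' "$dir")
    if [ "$have_mode" != "$want_mode" ] || [ "$have_owner" != "$want_owner" ]; then
        echo "mismatch: $dir is $have_mode $have_owner, want $want_mode $want_owner"
        return 1
    fi
    echo "ok: $dir"
}

# Example (as root): recreate the directory only if the check fails.
# check_logdir /var/log/exim4 2750 Debian-exim:adm ||
#     install -d -m 2750 -o Debian-exim -g adm /var/log/exim4
```

The example call is left commented out because it needs root and changes the host; the check itself is read-only.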
Okay, the culprit has been found: role::labs::lvm::biglogs hid the "original" /var/log where, inter alia, exim4 had created its directory. This causes several other packages to croak as well. I'll grid-disable -11 and -13, reschedule the jobs, move the files around and reboot to get to a consistent state again.
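A quick way to confirm this kind of problem on a host, sketched under the assumption that the role mounts a separate volume on /var/log (which is how I read "hid" here). report_mount is a made-up helper name and /mnt/rootfs is a hypothetical scratch directory; mountpoint(1) comes from util-linux.

```shell
#!/bin/sh
# report_mount DIR   (hypothetical helper)
# Prints whether DIR is a mount point, using mountpoint(1) from util-linux.
report_mount() {
    if mountpoint -q "$1"; then
        echo "$1 is a mount point; files created there before the mount are hidden"
    else
        echo "$1 is not a mount point"
    fi
}

# Example (as root): look underneath the mount without unmounting it.
# A non-recursive bind mount of / exposes the root filesystem's own
# /var/log, i.e. whatever existed before the volume was mounted over it.
# report_mount /var/log
# mkdir -p /mnt/rootfs && mount --bind / /mnt/rootfs
# ls -ld /mnt/rootfs/var/log/exim4
# umount /mnt/rootfs
```

The bind-mount steps are commented out since they need root; only the report is safe to run as a tool user.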
-11 is done; -13 is still running a job (2281968) that, based on past runs, could take up to another 12 hours to complete.