Last modified: 2014-05-23 18:54:46 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T66683, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 64683 - /var is full on tools-webgrid-01 due to me spamming /var/log/auth.log with sudo
Status: RESOLVED FIXED
Product: Wikimedia Labs
Classification: Unclassified
Component: tools
Version: unspecified
Hardware: All All
Importance: Lowest blocker
Target Milestone: ---
Assigned To: Tim Landscheidt
Reported: 2014-04-30 21:34 UTC by Tim Landscheidt
Modified: 2014-05-23 18:54 UTC (History)


Description Tim Landscheidt 2014-04-30 21:34:24 UTC

    
Comment 1 Tim Landscheidt 2014-04-30 22:14:54 UTC
Until bug #61102 is fixed, I have installed a script, /home/scfc/bin/cleanup-php-cgis, via crontab on tools-login to kill orphaned php-cgi processes on tools-webgrid-01 and tools-webgrid-02.

During its development on April 27th, I started a faulty version of it that called "sudo kill -HUP" ad infinitum on the webnodes even when there were no php-cgi processes to kill. Each call was logged, adding about 4 KByte/s to /var/log/auth.log and thus filling up /var.
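
The missing check described above can be sketched roughly as follows. This is only an illustration and not the actual cleanup-php-cgis script: the function name build_kill_command is hypothetical, and how the real script detects orphaned php-cgi processes is not shown here.

```python
from typing import List, Optional


def build_kill_command(pids: List[int]) -> Optional[List[str]]:
    """Return a "sudo kill -HUP" command for the given PIDs.

    Returns None when the list is empty, so the caller never invokes
    sudo (and never writes an auth.log entry) when there is nothing to kill.
    """
    if not pids:
        return None
    return ["sudo", "kill", "-HUP"] + [str(pid) for pid in pids]


if __name__ == "__main__":
    # Hypothetical PIDs, for illustration only.
    print(build_kill_command([]))
    print(build_kill_command([4711, 4712]))
```

A caller would skip the sudo invocation entirely whenever the function returns None.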

The correct version installed via crontab only logs about 1 KByte every 5 minutes (the ssh connection from tools-login to tools-webgrid-01/tools-webgrid-02).
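
To put the two logging rates into perspective, here is a small back-of-the-envelope calculation. It assumes that 1 KByte means 1024 bytes and that both rates are constant. The size of /var is not stated in this report, so no time-to-full is computed.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Faulty script: about 4 KByte per second.
faulty_bytes_per_day = 4 * 1024 * SECONDS_PER_DAY

# Correct script: about 1 KByte per 5-minute cron interval.
intervals_per_day = SECONDS_PER_DAY // (5 * 60)
normal_bytes_per_day = 1024 * intervals_per_day

faulty_mib_per_day = faulty_bytes_per_day / 2**20
normal_kib_per_day = normal_bytes_per_day / 2**10
ratio = faulty_bytes_per_day // normal_bytes_per_day

if __name__ == "__main__":
    print(f"faulty: {faulty_mib_per_day} MiB/day")
    print(f"normal: {normal_kib_per_day} KiB/day")
    print(f"ratio:  {ratio}x")
```

Under these assumptions the faulty version writes roughly 337.5 MiB per day, against 288 KiB per day for the correct one, a factor of 1200.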

There was a hint that could have alerted me to the error: my installed script sometimes complained about processes disappearing between detection and killing. I assumed this was the occasional correct php-cgi shutdown, but in reality it was apparently just a race condition between the competing scripts.
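
A process vanishing between detection and killing can be handled explicitly instead of surfacing as an unexplained complaint. The following is a local sketch only: it uses os.kill on the same host, whereas the real script runs "sudo kill" over ssh, and the function name hup_if_alive is hypothetical.

```python
import os
import signal


def hup_if_alive(pid: int) -> bool:
    """Send SIGHUP to pid; return False if the process no longer exists.

    ProcessLookupError (ESRCH) means the process disappeared between
    detection and signalling, whether it shut down on its own or was
    killed by a competing script. Other errors, such as PermissionError,
    are deliberately not caught here.
    """
    try:
        os.kill(pid, signal.SIGHUP)
    except ProcessLookupError:
        return False
    return True
```

Counting or logging the False results would show whether such disappearances are occasional, as expected from normal shutdowns, or frequent, which would point to a second script competing for the same processes.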

I've inspected tools-login, tools-webgrid-01 and tools-webgrid-02 for any leftover processes, and there are now none. I also moved /var/log/auth.log to /data/project/admin/auth.log.scfc.bz2 and ran "stop rsyslogd && start rsyslogd" to get tools-webgrid-01 going again.

/var/log/auth.log would normally be kept for about four weeks, so I'll leave this bug open to either remove /data/project/admin/auth.log.scfc.bz2 in a month or merge it back into the logrotate rotation in two weeks, when it would normally be compressed as well.
Comment 2 Tim Landscheidt 2014-05-23 18:54:46 UTC
I've now moved auth.log.4.gz to auth.log.5.gz and /data/project/admin/auth.log.scfc.bz2 (re-compressed) to auth.log.4.gz.
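
For reference, the re-compression step could be sketched as below. This is an assumption about one possible way to do it, not a record of the commands actually run. The function name recompress_bz2_to_gz is hypothetical.

```python
import bz2
import gzip
import shutil


def recompress_bz2_to_gz(src: str, dst: str) -> None:
    """Stream a .bz2 file into a .gz file without loading it into memory."""
    with bz2.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)


# Possible use, after auth.log.4.gz has been renamed to auth.log.5.gz:
# recompress_bz2_to_gz("/data/project/admin/auth.log.scfc.bz2",
#                      "/var/log/auth.log.4.gz")
```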


