Last modified: 2014-11-11 16:38:53 UTC
On deployment-mediawiki02:/tmp
-rw------- 1 apache apache 625M Aug 22 22:37 hhvm.29585.core
-rw------- 1 apache apache 641M Aug 22 22:37 hhvm.3112.core
-rw------- 1 apache apache 2.1G Aug 22 21:45 hhvm.25555.core
-rw------- 1 apache apache 2.3G Aug 22 20:56 hhvm.3314.core
These fill the / partition completely, causing a variety of interesting side effects. I have deleted them all.
The instance has some local disk space allocated under /srv/ (via puppet class role::labs::lvm::srv ). That would be a nice destination for core files: it is local to the instance and avoids filling the NFS shared disk space.
The latest production puppet code for setting up hhvm moves the cores to /var/log/hhvm. We need to get deployment-mediawiki02 running puppet again and then probably make /var/log/hhvm a symlink to /data/project/logs/hhvm to ensure that we have lots of space for cores. This is of course only useful if we have someone watching for hhvm crashes and doing something to triage the bugs that cause them.
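For illustration, the symlink idea could look roughly like the sketch below. The helper name link_dir is made up for this example, and it deliberately refuses to touch an existing real directory, so moving any existing /var/log/hhvm aside is left as a manual step. This is a sketch under those assumptions, not the puppet change itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make LINK a symlink pointing at TARGET.
# Usage: link_dir TARGET LINK
link_dir() {
    local target=$1 link=$2
    mkdir -p "$target" || return 1
    # Do not clobber a real file or directory; only replace an old symlink.
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        echo "refusing to replace non-symlink: $link" >&2
        return 1
    fi
    # -n keeps ln from descending into an existing symlinked directory.
    ln -sfn "$target" "$link"
}

# Intended use on an app server (needs root), paths as discussed above:
#   link_dir /data/project/logs/hhvm /var/log/hhvm
```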
multiple HHVM cores per day seems like a real problem
(In reply to Chris McMahon from comment #3)
> multiple HHVM cores per day seems like a real problem
Likely some new-to-us hhvm bug. Unfortunately we'll need to wait for it to happen again if the cores are gone now.
I have deleted the 2GB+ core files on mediawiki02:/tmp/
Since Bryan and Giuseppe were working on this issue this morning, I'm assigning to Bryan. :)
Bandaid solution:
$ cat cleanup-hhvm-cores
#!/usr/bin/env bash
sudo mv /tmp/hhvm.*.core /data/project/hhvm-cores
sudo mv /var/log/hhvm/stacktrace* /data/project/hhvm-cores
$ crontab -l
*/2 * * * * /home/bd808/cleanup-hhvm-cores
Applied on both deployment-mediawiki01 and deployment-mediawiki02.
Unlicking this cookie. The core (ha punny) problem remains but hopefully someone on the hhvm team can start triaging from the cores in /data/project/hhvm-cores
Resetting assignee and priority as this is no longer an OMG! situation. For the record:
gjg@deployment-bastion:/data/project/hhvm-cores$ ls -al *core | wc -l
12
(between Aug 26 05:13 and Aug 27 22:34 UTC)
Still at 12 after last night. I wonder if the core dumps were caused by the relatively mega high load due to the automated security audit?
(In reply to Greg Grossmeier from comment #10)
> Still at 12 since over night. I wonder if the core dumps were caused by the
> relatively mega high load due to the automated security audit?
Or the fuzzing hit legit bugs in our php code that trigger hhvm segfaults.
Note that I originally created the bug because hhvm sends cores to /tmp/. The path should be set in the hhvm conf file to point somewhere else, or be made configurable if it is not (I think it is hardcoded in hhvm).
(In reply to Antoine "hashar" Musso from comment #12)
> Note I originally created the bug because hhvm send cores to /tmp/ which
> should be configured in the hhvm conf file to point somewhere else, or
> become configurable (I think the path is hardcoded in hhvm).
--> https://gerrit.wikimedia.org/r/#/c/157294/2
Change 157294 had a related patch set uploaded by Hashar: hhvm - make debug path configurable https://gerrit.wikimedia.org/r/157294
Change 157294 abandoned by Dzahn: hhvm - make debug path configurable https://gerrit.wikimedia.org/r/157294
Maybe we should simply disable core dumps by default (by setting the proper limit / sysctl params)? We certainly could on labs.
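As a rough illustration of the "limit" half of that suggestion: a core size limit of 0 stops the kernel from writing a core file for the process and its children, assuming cores are written as plain files rather than piped to a handler. The wrapper name run_without_cores is invented for this sketch; it is not how the hhvm job is actually started.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with core dumps disabled.
# The subshell keeps the lowered limit from leaking into the caller.
run_without_cores() {
    (
        ulimit -c 0   # maximum core file size of 0: no core file is written
        "$@"
    )
}

# Example: print the core size limit as seen inside the wrapper.
run_without_cores bash -c 'ulimit -c'
```

A service started by an init system would normally set the same limit in its job definition instead of a wrapper script.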
FYI, by default, the linux kernel creates core files in the process's CWD. If you want to retain the core files just not in /tmp, you can give a file pattern (including path) in /proc/sys/kernel/core_pattern
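A minimal sketch of what setting such a pattern could look like. The directory /srv/hhvm-cores and the helper names are assumptions for illustration only; the %e, %p and %t specifiers are the kernel's standard ones for executable name, PID and dump time.

```shell
#!/usr/bin/env bash
# Build a core_pattern value for a given directory.
# %e = executable name, %p = PID of the dumped process,
# %t = time of the dump (seconds since the epoch).
build_core_pattern() {
    local core_dir=$1
    printf '%s/%%e.%%p.%%t.core\n' "$core_dir"
}

# Print the pattern currently in effect, if the kernel exposes it.
show_core_pattern() {
    if [ -r /proc/sys/kernel/core_pattern ]; then
        cat /proc/sys/kernel/core_pattern
    fi
}

# To apply (needs root; /srv/hhvm-cores is a hypothetical directory that
# must already exist and be writable by the crashing process):
#   sysctl -w kernel.core_pattern="$(build_core_pattern /srv/hhvm-cores)"
```

The setting is system-wide, so it affects every process on the host, not only hhvm.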
(In reply to Marc A. Pelletier from comment #17)
> FYI, by default, the linux kernel creates core files in the process's CWD.
> If you want to retain the core files just not in /tmp, you can give a file
> pattern (including path) in /proc/sys/kernel/core_pattern
Right. That can easily be done in the pre-start stanza of the upstart job.
But I don't agree with Antoine and Bryan that we should, in fact, do this. If it's important to retain core files, then let's keep them in /tmp. If the beta cluster app servers don't have enough space in /tmp, then that's the actual bug, and we should fix it by making sure they do.
Gratuitous and unprincipled divergence from production compromises both the beta cluster and production: the beta cluster because its fidelity to production is its very value and purpose, and production because the Puppet change needed to make the divergence possible means adding a useless knob to the manifests. Sometimes it's unavoidable, but I don't think this is one of those times.
While I can think of a number of good reasons why you'd want to keep cores in a development environment, would you even /want/ to have core dumps in prod at all in the first place?
(In reply to Marc A. Pelletier from comment #19)
> While I can think of a number of good reasons why you'd want to keep cores
> in a development environment, would you even /want/ to have core dumps in
> prod at all in the first place?
I think I'd be fine with disabling them. We've had bugs before that we couldn't reproduce in our dev environments but we can re-enable core dumps if such a problem manifests again.
hhvm on beta cluster now dumps files to /var/tmp/hhvm which is a 2GB partition. Noticed on deployment-mediawiki01.eqiad.wmflabs. I have deleted the core file.
(In reply to Antoine "hashar" Musso (WMF) from comment #21)
> hhvm on beta cluster now dumps files to /var/tmp/hhvm which is a 2GB
> partition. Noticed on deployment-mediawiki01.eqiad.wmflabs. I have deleted
> the core file.
I have updated my core file sweeping script for the new location:
#!/usr/bin/env bash
sudo mv /tmp/hhvm.*.core /data/project/hhvm-cores &>/dev/null
sudo mv /var/tmp/hhvm/*.core /data/project/hhvm-cores &>/dev/null
sudo mv /var/log/hhvm/stacktrace* /data/project/hhvm-cores &>/dev/null
This is ~bd808/cleanup-hhvm-cores on any deployment-prep host and runs from cron as my user on deployment-mediawiki0[12].
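For comparison, a hedged variant of the same sweep that skips patterns with no matches instead of relying on &>/dev/null to hide the resulting mv errors. The function name sweep_cores is invented here, and the sketch assumes the paths contain no whitespace. It is an illustration, not the script that is actually deployed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move files matching glob patterns into DEST.
# Usage: sweep_cores DEST PATTERN...
sweep_cores() {
    local dest=$1
    shift
    local pattern file
    mkdir -p "$dest" || return 1
    for pattern in "$@"; do
        # Left unquoted on purpose so the shell expands the glob;
        # this assumes the paths contain no whitespace.
        for file in $pattern; do
            # An unmatched glob stays literal; skip it quietly.
            [ -e "$file" ] || continue
            mv -- "$file" "$dest"/
        done
    done
}

# Equivalent of the sweep above (run as a user allowed to move the files):
#   sweep_cores /data/project/hhvm-cores \
#       '/tmp/hhvm.*.core' '/var/tmp/hhvm/*.core' '/var/log/hhvm/stacktrace*'
```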