Last modified: 2014-11-11 16:38:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T71979, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 69979 - hhvm creates core file in /tmp/ filling mediawiki02 labs instance root partition
Status: NEW
Product: Wikimedia Labs
Classification: Unclassified
Component: deployment-prep (beta)
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assigned To: Nobody - You can work on this!
: hhvm
Depends on:
Blocks: 69601
Reported: 2014-08-25 15:05 UTC by Antoine "hashar" Musso (WMF)
Modified: 2014-11-11 16:38 UTC (History)
CC List: 14 users

Description Antoine "hashar" Musso (WMF) 2014-08-25 15:05:34 UTC
On deployment-mediawiki02:/tmp

-rw------- 1 apache apache 625M Aug 22 22:37 hhvm.29585.core
-rw------- 1 apache apache 641M Aug 22 22:37 hhvm.3112.core
-rw------- 1 apache apache 2.1G Aug 22 21:45 hhvm.25555.core
-rw------- 1 apache apache 2.3G Aug 22 20:56 hhvm.3314.core


That fills the / partition completely, causing various interesting side effects.

I have deleted them all.
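
A minimal sketch for spotting which core files are eating the space, largest first. It assumes GNU find (for -printf); the function name list_cores is made up for illustration and the hhvm.*.core pattern is taken from the listing above.

```shell
#!/usr/bin/env bash
# Sketch only: list hhvm core files directly under a directory, biggest first.
# Output is "<size in bytes><TAB><path>", one file per line.
# list_cores is a hypothetical helper name, not an existing tool.
list_cores() {
    local dir="$1"
    # -printf is a GNU find extension: %s = size in bytes, %p = path
    find "$dir" -maxdepth 1 -type f -name 'hhvm.*.core' -printf '%s\t%p\n' |
        sort -rn
}
```

On the instance this would be called as: list_cores /tmp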
Comment 1 Antoine "hashar" Musso (WMF) 2014-08-25 15:15:14 UTC
The instance has some local disk space allocated under /srv/ (via puppet class role::labs::lvm::srv ).  Would be a nice destination for core files which would be local to the instance and avoid filling the NFS shared disk space.
Comment 2 Bryan Davis 2014-08-25 15:45:31 UTC
The latest production puppet code for setting up hhvm moves the cores to /var/log/hhvm. We need to get deployment-mediawiki02 running puppet again and then probably make /var/log/hhvm a symlink to /data/project/logs/hhvm to ensure that we have lots of space for cores. This is of course only useful if we have someone watching for hhvm crashes and doing something to triage the bugs that cause them.
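
The symlink idea could be sketched roughly as follows. This is an illustration, not what puppet does: link_dir is a hypothetical helper, it assumes GNU find and mv (for -mindepth and mv -t), and on the real hosts it would have to run as root.

```shell
#!/usr/bin/env bash
# Sketch only: replace a local directory with a symlink to a roomier one,
# keeping whatever files were already there.
# link_dir is a hypothetical helper name.
link_dir() {
    local src="$1" target="$2"
    mkdir -p "$target" || return 1
    # Already a symlink: leave it alone (its destination is not checked).
    if [ -L "$src" ]; then
        return 0
    fi
    if [ -d "$src" ]; then
        # Move existing entries (dotfiles included) to the new location.
        find "$src" -mindepth 1 -maxdepth 1 -exec mv -t "$target" {} + || return 1
        rmdir "$src" || return 1
    fi
    ln -s "$target" "$src"
}
```

With the paths from this comment the call would be: link_dir /var/log/hhvm /data/project/logs/hhvm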
Comment 3 Chris McMahon 2014-08-25 16:07:36 UTC
multiple HHVM cores per day seems like a real problem
Comment 4 Bryan Davis 2014-08-25 16:16:32 UTC
(In reply to Chris McMahon from comment #3)
> multiple HHVM cores per day seems like a real problem

Likely some new-to-us hhvm bug. Unfortunately we'll need to wait for it to happen again if the cores are gone now.
Comment 5 Antoine "hashar" Musso (WMF) 2014-08-27 21:39:58 UTC
I have deleted the 2GB+ core files on mediawiki02:/tmp/
Comment 6 Greg Grossmeier 2014-08-27 22:04:59 UTC
Since Bryan and Giuseppe were working on this issue this morning, I'm assigning to Bryan. :)
Comment 7 Bryan Davis 2014-08-27 22:43:45 UTC
Bandaid solution:

$ cat cleanup-hhvm-cores
#!/usr/bin/env bash

sudo mv /tmp/hhvm.*.core /data/project/hhvm-cores
sudo mv /var/log/hhvm/stacktrace* /data/project/hhvm-cores

$ crontab -l
*/2 * * * * /home/bd808/cleanup-hhvm-cores

Applied on both deployment-mediawiki01 and deployment-mediawiki02
Comment 8 Bryan Davis 2014-08-27 23:07:09 UTC
Unlicking this cookie. The core (ha punny) problem remains but hopefully someone on the hhvm team can start triaging from the cores in /data/project/hhvm-cores
Comment 9 Greg Grossmeier 2014-08-27 23:12:18 UTC
Resetting assignee and priority as this is no longer an OMG! situation.

For the record:
gjg@deployment-bastion:/data/project/hhvm-cores$ ls -al *core | wc -l
12

(between Aug 26 05:13 and Aug 27 22:34 UTC)
Comment 10 Greg Grossmeier 2014-08-28 17:14:25 UTC
Still at 12 since last night. I wonder if the core dumps were caused by the relatively high load from the automated security audit?
Comment 11 Bryan Davis 2014-08-28 17:25:47 UTC
(In reply to Greg Grossmeier from comment #10)
> Still at 12 since over night. I wonder if the core dumps were caused by the
> relatively mega high load due to the automated security audit?

Or the fuzzing hit legit bugs in our php code that trigger hhvm segfaults.
Comment 12 Antoine "hashar" Musso (WMF) 2014-08-29 21:58:21 UTC
Note I originally created the bug because hhvm sends cores to /tmp/. The path should be set in the hhvm conf file to point somewhere else, or be made configurable (I think the path is hardcoded in hhvm).
Comment 13 Daniel Zahn 2014-08-29 22:14:56 UTC
(In reply to Antoine "hashar" Musso from comment #12)
> Note I originally created the bug because hhvm send cores to /tmp/ which
> should be configured in the hhvm conf file to point somewhere else, or
> become configurable (I think the path is hardcoded in hhvm).

-->  https://gerrit.wikimedia.org/r/#/c/157294/2
Comment 14 Gerrit Notification Bot 2014-08-30 11:49:05 UTC
Change 157294 had a related patch set uploaded by Hashar:
hhvm - make debug path configurable

https://gerrit.wikimedia.org/r/157294
Comment 15 Gerrit Notification Bot 2014-09-03 01:38:24 UTC
Change 157294 abandoned by Dzahn:
hhvm - make debug path configurable

https://gerrit.wikimedia.org/r/157294
Comment 16 Ori Livneh 2014-09-28 00:39:54 UTC
Maybe we should simply disable core dumps by default (by setting the proper limit / sysctl params)? We certainly could on labs.
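
A minimal sketch of the limit-based approach: lower the core size limit to 0 for one process tree only. run_without_cores is a hypothetical wrapper name; how hhvm is actually started on these hosts is not shown in this report, so the wrapper is only an illustration of the mechanism.

```shell
#!/usr/bin/env bash
# Sketch only: run a command with core dumps disabled for that command
# (and its children), without touching the caller's own limits.
# run_without_cores is a hypothetical helper name.
run_without_cores() {
    (
        # Without -S or -H, ulimit sets both the soft and hard limit.
        ulimit -c 0 || exit 1
        exec "$@"
    )
}
```

Example: run_without_cores sh -c 'ulimit -c' prints the limit seen by the child. An Upstart job can express the same thing with its "limit core 0 0" stanza.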
Comment 17 Marc A. Pelletier 2014-09-28 02:19:46 UTC
FYI, by default the Linux kernel creates core files in the process's CWD.  If you want to retain the core files, just not in /tmp, you can give a file pattern (including path) in /proc/sys/kernel/core_pattern
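
A sketch of building such a pattern. The directory /srv/hhvm-cores and the helper name core_pattern_for are assumptions for illustration; %e, %p and %t are the kernel's specifiers for executable name, PID and dump time. This only helps if the kernel is what writes the core, which this report does not settle for hhvm.

```shell
#!/usr/bin/env bash
# Sketch only: print a core_pattern value that puts cores in a given directory.
#   %e = executable name, %p = PID, %t = time of dump (seconds since the epoch)
# core_pattern_for is a hypothetical helper name.
core_pattern_for() {
    local dir="${1%/}"    # drop one trailing slash, if any
    printf '%s/%%e.%%p.%%t.core\n' "$dir"
}

# Applying the pattern needs root, for example:
#   core_pattern_for /srv/hhvm-cores | sudo tee /proc/sys/kernel/core_pattern
# The target directory must exist and be writable by the crashing process.
```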
Comment 18 Ori Livneh 2014-09-28 02:32:42 UTC
(In reply to Marc A. Pelletier from comment #17)
> FYI, by default, the linux kernel creates core files in the process's CWD. 
> If you want to retain the core files just not in /tmp, you can give a file
> pattern (including path) in /proc/sys/kernel/core_pattern

Right. That can easily be done in the pre-start stanza of the upstart job. But I don't agree with Antoine and Bryan that we should, in fact, do this. If it's important to retain core files, then let's keep them in /tmp. If the beta cluster app servers don't have enough space in /tmp, then that's the actual bug, and we should fix it by making sure they do.

Gratuitous and unprincipled divergence from production compromises both the beta cluster and production: the beta cluster because its fidelity to production is its very value and purpose, and production because the Puppet change needed to make the divergence possible means adding a useless knob to the manifests. Sometimes it's unavoidable, but I don't think this is one of those times.
Comment 19 Marc A. Pelletier 2014-09-29 13:32:04 UTC
While I can think of a number of good reasons why you'd want to keep cores in a development environment, would you even /want/ to have core dumps in prod at all in the first place?
Comment 20 Ori Livneh 2014-09-29 13:38:51 UTC
(In reply to Marc A. Pelletier from comment #19)
> While I can think of a number of good reasons why you'd want to keep cores
> in a development environment, would you even /want/ to have core dumps in
> prod at all in the first place?

I think I'd be fine with disabling them. We've had bugs before that we couldn't reproduce in our dev environments, but we can re-enable core dumps if such a problem manifests again.
Comment 21 Antoine "hashar" Musso (WMF) 2014-11-11 13:15:09 UTC
hhvm on beta cluster now dumps files to /var/tmp/hhvm which is a 2GB partition.  Noticed on deployment-mediawiki01.eqiad.wmflabs.  I have deleted the core file.
Comment 22 Bryan Davis 2014-11-11 16:38:53 UTC
(In reply to Antoine "hashar" Musso (WMF) from comment #21)
> hhvm on beta cluster now dumps files to /var/tmp/hhvm which is a 2GB
> partition.  Noticed on deployment-mediawiki01.eqiad.wmflabs.  I have deleted
> the core file.

I have updated my core file sweeping script for the new location:

  #!/usr/bin/env bash
  
  sudo mv /tmp/hhvm.*.core /data/project/hhvm-cores &>/dev/null
  sudo mv /var/tmp/hhvm/*.core /data/project/hhvm-cores &>/dev/null
  sudo mv /var/log/hhvm/stacktrace* /data/project/hhvm-cores &>/dev/null

This is ~bd808/cleanup-hhvm-cores on any deployment-prep host, and it runs from my user's crontab on deployment-mediawiki0[12].
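
For comparison, a sketch of the same sweep that skips patterns with no matches instead of silencing mv. It is a variant for illustration, not the script deployed above: sweep_cores is a hypothetical name, the sudo calls are left out, and the patterns must not contain spaces because they are expanded unquoted.

```shell
#!/usr/bin/env bash
# Sketch only: move files matching the given glob patterns into DEST.
# Usage: sweep_cores DEST PATTERN...
# sweep_cores is a hypothetical helper name.
sweep_cores() {
    local dest="$1" pattern f
    shift
    mkdir -p "$dest" || return 1
    for pattern in "$@"; do
        # Deliberately unquoted so the shell expands the glob.
        for f in $pattern; do
            # An unmatched glob is left as literal text: skip it.
            [ -e "$f" ] || continue
            mv -- "$f" "$dest"/ || return 1
        done
    done
}
```

With the locations from this comment: sweep_cores /data/project/hhvm-cores '/tmp/hhvm.*.core' '/var/tmp/hhvm/*.core' '/var/log/hhvm/stacktrace*'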
