Last modified: 2014-11-11 16:38:53 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T71979, the corresponding Phabricator task, for complete and up-to-date bug report information.

Bug 69979 - hhvm creates core file in /tmp/ filling mediawiki02 labs instance root partition


Summary:	hhvm creates core file in /tmp/ filling mediawiki02 labs instance root partition

Status:	NEW

Product:	Wikimedia Labs
Classification:	Unclassified
Component:	deployment-prep (beta)
Version:	unspecified
Hardware:	All All

Importance:	Normal normal
Target Milestone:	---
Assigned To:	Nobody - You can work on this!

URL:
Whiteboard:
Keywords:	hhvm

Depends on:
Blocks:	69601

Reported:	2014-08-25 15:05 UTC by Antoine "hashar" Musso (WMF)
Modified:	2014-11-11 16:38 UTC
CC List:	14 users

See Also:
Web browser:	---
Mobile Platform:	---
Assignee Huggle Beta Tester:	---


Description Antoine "hashar" Musso (WMF) 2014-08-25 15:05:34 UTC

On deployment-mediawiki02:/tmp

-rw------- 1 apache apache 625M Aug 22 22:37 hhvm.29585.core
-rw------- 1 apache apache 641M Aug 22 22:37 hhvm.3112.core
-rw------- 1 apache apache 2.1G Aug 22 21:45 hhvm.25555.core
-rw------- 1 apache apache 2.3G Aug 22 20:56 hhvm.3314.core


That fills up the / partition completely, causing a variety of interesting side effects.

I have deleted them all.

Comment 1 Antoine "hashar" Musso (WMF) 2014-08-25 15:15:14 UTC

The instance has some local disk space allocated under /srv/ (via the puppet class role::labs::lvm::srv). That would be a good destination for core files: it is local to the instance and would avoid filling the NFS shared disk space.

Comment 2 Bryan Davis 2014-08-25 15:45:31 UTC

The latest production puppet code for setting up hhvm moves the cores to /var/log/hhvm. We need to get deployment-mediawiki02 running puppet again and then probably make /var/log/hhvm a symlink to /data/project/logs/hhvm to ensure that we have lots of space for cores. This is of course only useful if we have someone watching for hhvm crashes and doing something to triage the bugs that cause them.
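
A minimal sketch of the symlink step suggested above, not the actual puppet change. The helper name link_core_dir is hypothetical. It assumes GNU find and mv, that hhvm is stopped while the directory is swapped, and that the target is writable by the user hhvm runs as.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace a local directory with a symlink to a
# directory on a larger volume. Existing files are moved to the target
# first so nothing is lost.
link_core_dir() {
  local src="$1" dst="$2"
  mkdir -p "$dst" || return 1
  if [ -L "$src" ]; then
    # Already a symlink: leave it as it is.
    return 0
  fi
  if [ -d "$src" ]; then
    # Move existing entries (dotfiles included), then remove the
    # now-empty directory so the symlink can take its place.
    find "$src" -mindepth 1 -maxdepth 1 -exec mv -t "$dst" {} + || return 1
    rmdir "$src" || return 1
  fi
  ln -s "$dst" "$src"
}

# Example (needs root on a real host):
#   link_core_dir /var/log/hhvm /data/project/logs/hhvm
```

The function only takes paths as arguments, so it can be tried against scratch directories before touching /var/log/hhvm.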

Comment 3 Chris McMahon 2014-08-25 16:07:36 UTC

multiple HHVM cores per day seems like a real problem

Comment 4 Bryan Davis 2014-08-25 16:16:32 UTC

(In reply to Chris McMahon from comment #3)
> multiple HHVM cores per day seems like a real problem

Likely some new-to-us hhvm bug. Unfortunately, we'll need to wait for it to happen again if the cores are gone now.

Comment 5 Antoine "hashar" Musso (WMF) 2014-08-27 21:39:58 UTC

I have deleted the 2GB+ core files on mediawiki02:/tmp/

Comment 6 Greg Grossmeier 2014-08-27 22:04:59 UTC

Since Bryan and Giuseppe were working on this issue this morning, I'm assigning to Bryan. :)

Comment 7 Bryan Davis 2014-08-27 22:43:45 UTC

Bandaid solution:

$ cat cleanup-hhvm-cores
#!/usr/bin/env bash

sudo mv /tmp/hhvm.*.core /data/project/hhvm-cores
sudo mv /var/log/hhvm/stacktrace* /data/project/hhvm-cores

$ crontab -l
*/2 * * * * /home/bd808/cleanup-hhvm-cores

Applied on both deployment-mediawiki01 and deployment-mediawiki02

Comment 8 Bryan Davis 2014-08-27 23:07:09 UTC

Unlicking this cookie. The core (ha punny) problem remains but hopefully someone on the hhvm team can start triaging from the cores in /data/project/hhvm-cores

Comment 9 Greg Grossmeier 2014-08-27 23:12:18 UTC

Resetting assignee and priority as this is no longer an OMG! situation.

For the record:
gjg@deployment-bastion:/data/project/hhvm-cores$ ls -al *core | wc -l
12

(between Aug 26 05:13 and Aug 27 22:34 UTC)

Comment 10 Greg Grossmeier 2014-08-28 17:14:25 UTC

Still at 12 after a night. I wonder if the core dumps were caused by the relatively mega high load due to the automated security audit?

Comment 11 Bryan Davis 2014-08-28 17:25:47 UTC

(In reply to Greg Grossmeier from comment #10)
> Still at 12 since over night. I wonder if the core dumps were caused by the
> relatively mega high load due to the automated security audit?

Or the fuzzing hit legit bugs in our php code that trigger hhvm segfaults.

Comment 12 Antoine "hashar" Musso (WMF) 2014-08-29 21:58:21 UTC

Note I originally created the bug because hhvm sends cores to /tmp/, which should be configured in the hhvm conf file to point somewhere else, or become configurable (I think the path is hardcoded in hhvm).

Comment 13 Daniel Zahn 2014-08-29 22:14:56 UTC

(In reply to Antoine "hashar" Musso from comment #12)
> Note I originally created the bug because hhvm sends cores to /tmp/, which
> should be configured in the hhvm conf file to point somewhere else, or
> become configurable (I think the path is hardcoded in hhvm).

-->  https://gerrit.wikimedia.org/r/#/c/157294/2

Comment 14 Gerrit Notification Bot 2014-08-30 11:49:05 UTC

Change 157294 had a related patch set uploaded by Hashar:
hhvm - make debug path configurable

https://gerrit.wikimedia.org/r/157294

Comment 15 Gerrit Notification Bot 2014-09-03 01:38:24 UTC

Change 157294 abandoned by Dzahn:
hhvm - make debug path configurable

https://gerrit.wikimedia.org/r/157294

Comment 16 Ori Livneh 2014-09-28 00:39:54 UTC

Maybe we should simply disable core dumps by default (by setting the proper limit / sysctl params)? We certainly could on labs.
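
A minimal sketch of the "limit" half of this suggestion, assuming the limit is applied in the shell that launches the process. The wrapper name run_without_cores is hypothetical. For a daemon such as hhvm, the limit would have to be set wherever the service is actually started, which this sketch does not cover.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with core dumps disabled for that
# process only. The soft limit is lowered inside a subshell, so the
# calling shell keeps its own limit.
run_without_cores() {
  (
    ulimit -S -c 0 || exit 1
    "$@"
  )
}

# Example: the child process sees a core size limit of 0.
#   run_without_cores bash -c 'ulimit -c'    # prints 0
```

Lowering only the soft limit keeps the change reversible: a process that needs cores again can raise it back up to the hard limit.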

Comment 17 Marc A. Pelletier 2014-09-28 02:19:46 UTC

FYI, by default, the linux kernel creates core files in the process's CWD.  If you want to retain the core files just not in /tmp, you can give a file pattern (including path) in /proc/sys/kernel/core_pattern
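
A minimal sketch of setting such a pattern. The function name set_core_pattern and the directory /srv/hhvm-cores are hypothetical. The second argument exists only so the function can be tried against an ordinary file, because writing to the real /proc file requires root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a core_pattern that places cores in a
# chosen directory.
# %e = executable name, %p = PID, %t = dump time (seconds since epoch).
set_core_pattern() {
  local dir="$1"
  local target="${2:-/proc/sys/kernel/core_pattern}"
  case "$dir" in
    /*) ;;  # absolute path, as needed to escape the process's CWD
    *) echo "core directory must be absolute: $dir" >&2; return 1 ;;
  esac
  # Strip one trailing slash so the pattern has no doubled separator.
  printf '%s/%%e.%%p.%%t.core\n' "${dir%/}" > "$target"
}

# Example (as root; /srv/hhvm-cores is an assumed directory):
#   set_core_pattern /srv/hhvm-cores
```

A value written to /proc this way does not survive a reboot. The same setting is exposed as the sysctl kernel.core_pattern, and the directory must already exist and be writable by the crashing process.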

Comment 18 Ori Livneh 2014-09-28 02:32:42 UTC

(In reply to Marc A. Pelletier from comment #17)
> FYI, by default, the linux kernel creates core files in the process's CWD. 
> If you want to retain the core files just not in /tmp, you can give a file
> pattern (including path) in /proc/sys/kernel/core_pattern

Right. That can easily be done in the pre-start stanza of the upstart job. But I don't agree with Antoine and Bryan that we should, in fact, do this. If it's important to retain core files, then let's keep them in /tmp. If the beta cluster app servers don't have enough space in /tmp, then that's the actual bug, and we should fix it by making sure they do.

Gratuitous and unprincipled divergence from production compromises both the beta cluster and production: the beta cluster because its fidelity to production is its very value and purpose, and production because the Puppet change needed to make the divergence possible means adding a useless knob to the manifests. Sometimes it's unavoidable, but I don't think this is one of those times.

Comment 19 Marc A. Pelletier 2014-09-29 13:32:04 UTC

While I can think of a number of good reasons why you'd want to keep cores in a development environment, would you even /want/ to have core dumps in prod at all in the first place?

Comment 20 Ori Livneh 2014-09-29 13:38:51 UTC

(In reply to Marc A. Pelletier from comment #19)
> While I can think of a number of good reasons why you'd want to keep cores
> in a development environment, would you even /want/ to have core dumps in
> prod at all in the first place?

I think I'd be fine with disabling them. We've had bugs before that we couldn't reproduce in our dev environments but we can re-enable core dumps if such a problem manifests again.

Comment 21 Antoine "hashar" Musso (WMF) 2014-11-11 13:15:09 UTC

hhvm on beta cluster now dumps files to /var/tmp/hhvm which is a 2GB partition.  Noticed on deployment-mediawiki01.eqiad.wmflabs.  I have deleted the core file.

Comment 22 Bryan Davis 2014-11-11 16:38:53 UTC

(In reply to Antoine "hashar" Musso (WMF) from comment #21)
> hhvm on beta cluster now dumps files to /var/tmp/hhvm which is a 2GB
> partition.  Noticed on deployment-mediawiki01.eqiad.wmflabs.  I have deleted
> the core file.

I have updated my core file sweeping script for the new location:

  #!/usr/bin/env bash
  
  sudo mv /tmp/hhvm.*.core /data/project/hhvm-cores &>/dev/null
  sudo mv /var/tmp/hhvm/*.core /data/project/hhvm-cores &>/dev/null
  sudo mv /var/log/hhvm/stacktrace* /data/project/hhvm-cores &>/dev/null

This is ~bd808/cleanup-hhvm-cores on any deployment-prep host and runs from cron as my user on deployment-mediawiki0[12].
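
A possible variant of the sweeping script, offered as a sketch and not as the script actually deployed. It uses bash's nullglob option so that a pattern with no matches is skipped, instead of sending every mv error to /dev/null. The function name sweep_cores is hypothetical, and patterns containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move files matching the given glob patterns into
# a destination directory.
# Usage: sweep_cores DEST PATTERN...
sweep_cores() {
  local dest="$1"; shift
  local pattern files
  mkdir -p "$dest" || return 1
  (
    # nullglob makes a pattern with no matches expand to nothing.
    # It is set in a subshell so the caller's options are unchanged.
    shopt -s nullglob
    for pattern in "$@"; do
      files=( $pattern )  # unquoted on purpose so the glob expands
      if [ "${#files[@]}" -gt 0 ]; then
        mv -- "${files[@]}" "$dest"/ || exit 1
      fi
    done
  )
}

# Example (run the whole script with sudo on a real host):
#   sweep_cores /data/project/hhvm-cores \
#     '/tmp/hhvm.*.core' '/var/tmp/hhvm/*.core' '/var/log/hhvm/stacktrace*'
```

With this shape, a real failure such as a full destination volume still produces an error that cron can mail, while an empty sweep stays silent.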
