Last modified: 2014-05-06 15:40:19 UTC
Change https://gerrit.wikimedia.org/r/#/c/43775/ made against mediawiki/core.git on branch 1.21wmf7 caused our PHPUnit tests to segfault (exit code 139). Under the misc tests, https://integration.mediawiki.org/ci/job/mediawiki-core-phpunit-misc/1244/console :

phpunit-misc:
     [echo] Builddir: /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace
     [echo] Logdir..: /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace/logs/
     [echo] Indir...: /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace/tests/phpunit
     [echo] Opts....: --group Database --exclude-group API,Dump,Parser,Broken,ParserFuzz,Stub --
     [exec] PHPUnit 3.7.10 by Sebastian Bergmann.
     [exec]
     [exec] Configuration read from /var/lib/jenkins/jobs/mediawiki-core-phpunit-misc/workspace/tests/phpunit/suite.xml
     [exec]
     [exec] .........................................
     [exec] .................... 61 / 5298 ( 1%)

BUILD FAILED
/var/lib/jenkins/jobs/_shared/build.xml:452: The following error occurred while executing this line:
/var/lib/jenkins/jobs/_shared/build.xml:473: exec returned: 139

Tim ran the test under gdb, and it showed a segfault in preg_match_all() in PHPUnit_Util_Test::getRequirements(), while matching self::REGEX_REQUIRES. Since we don't seem to use @requires, I just replaced the body of getRequirements() with "return array()", and then my changeset passed all tests.

Here's the full backtrace:

Program received signal SIGSEGV, Segmentation fault.
zval_mark_grey (pz=0xa7f82a0) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:368
368 /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c: No such file or directory.
(gdb) bt
#0  zval_mark_grey (pz=0xa7f82a0) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:368
#1  0x00000000006b73ac in zval_mark_grey (pz=<optimized out>) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:379
#2  0x00000000006b7e75 in gc_mark_roots () at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:435
#3  gc_collect_cycles () at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:664
#4  0x00000000006b8174 in gc_zval_possible_root (zv=<optimized out>) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:166
#5  0x00000000006a7e30 in zend_hash_destroy (ht=0xa7f80f0) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_hash.c:729
#6  0x00000000006994df in _zval_dtor_func (zvalue=0xa7e7598) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_variables.c:46
#7  0x0000000000473c08 in _zval_dtor (zvalue=0xa7e7598) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_variables.h:35
#8  php_pcre_match_impl (pce=0x8fbfcb0, subject=0xa7aba48 "/**\n * These tests should work regardless of $wgCapitalLinks\n * @group Database\n */\n/**\n\t * Make sure MediaWikiTestCase extending classes have called their\n\t * parent setUp method\n\t */", subject_len=184, return_value=0xa7ec5e0, subpats=0xa7e7598, global=1, use_flags=0, flags=0, start_offset=0) at /root/wikimedia/php5/php5-5.3.10/ext/pcre/php_pcre.c:549
#9  0x0000000000473e6b in php_do_pcre_match (ht=3, return_value=0xa7ec5e0, global=1, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>) at /root/wikimedia/php5/php5-5.3.10/ext/pcre/php_pcre.c:519
#10 0x000000000070f80d in zend_do_fcall_common_helper_SPEC (execute_data=0x7ffff7ee1f00) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:320
#11 0x00000000006c037b in execute (op_array=0x1d5f6c0) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:107
#12 0x000000000068d8bc in zend_call_function (fci=0x7fffffffba60, fci_cache=<optimized out>) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_execute_API.c:969
#13 0x00000000005d0178 in zif_call_user_func_array (ht=<optimized out>, return_value=0xa722870, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>) at /root/wikimedia/php5/php5-5.3.10/ext/standard/basic_functions.c:4803
#14 0x000000000070f80d in zend_do_fcall_common_helper_SPEC (execute_data=0x7ffff7edf5c0) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:320
#15 0x00000000006c037b in execute (op_array=0x901c008) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_vm_execute.h:107
#16 0x000000000069b8e0 in zend_execute_scripts (type=8, retval=0x0, file_count=3) at /root/wikimedia/php5/php5-5.3.10/Zend/zend.c:1308
#17 0x0000000000647f53 in php_execute_script (primary_file=0x7fffffffe1d0) at /root/wikimedia/php5/php5-5.3.10/main/main.c:2323
#18 0x000000000042c797 in main (argc=10, argv=0x7fffffffe3d8) at /root/wikimedia/php5/php5-5.3.10/sapi/cli/php_cli.c:1188
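For anyone decoding the Ant message: "exec returned: 139" follows the shell convention of reporting a process killed by a signal as 128 plus the signal number, and 139 - 128 = 11 is SIGSEGV on Linux, which matches the gdb output. A minimal POSIX shell sketch of that decoding; `describe_exit_status` is a hypothetical helper name, not part of our CI scripts:

```shell
#!/bin/sh
# Hypothetical helper (not part of the Jenkins/Ant setup): turn a shell exit
# status into a readable description. By convention, POSIX shells report a
# child killed by signal N as exit status 128 + N.
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        # "kill -l N" prints the signal name without its SIG prefix, e.g. SEGV
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $status"
    fi
}

# Ant reported "exec returned: 139" for the PHPUnit run
describe_exit_status 139
```

On Linux this prints `killed by signal 11 (SIGSEGV)`.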
Created attachment 11627: backtrace by Tim of a unit test segfault

Above backtrace attached in a text file for convenience.
We probably want to further isolate the unit test that causes the issue, report it upstream (PHP and PHPUnit), and make it easier to reproduce. If it still crashes, we might want to try another PHP version or a nightly build.
I think recompiling PHP with the bundled PCRE source rather than the system library would be the first thing to try. Faidon may be able to help with that. If that doesn't fix it, then you could try PHP 5.3.x git head. It's probably best to try a different PHP version before you isolate and report the issue, since the folks at bugs.php.net are unlikely to be interested in a segfault in a package they don't maintain. If the bug isn't present in the latest 5.3.x, then it will probably be our responsibility to fix or work around it.
High priority since this is blocking merges in wmf branches and several people have complained about it since yesterday.
I have removed the hack in PHPUnit and upgraded it to 3.7.13. Running from a local copy works for me, as does using the workspace of change 44039, which did segfault :/
Ah, I can reproduce the segfault from time to time using Gerrit change #44221 patchset 1. Command used:

WORKSPACE=/home/hashar/core JOB_NAME=testing_segfault_job_name ant -file /var/lib/jenkins/jobs/_shared/build.xml phpunit-databaseless
SELF NOTE: On gallium I did:

# My private clone of mediawiki
cd ~/core
# Apply change 44221 patchset 1:
git fetch https://gerrit.wikimedia.org/r/mediawiki/core refs/changes/21/44221/1 && git checkout -b 44221/1 FETCH_HEAD
# Change to the subdir; apparently running from the root directory
# of the working copy does not trigger the segfault (or I haven't tried enough)
cd tests/phpunit
# Run gdb:
gdb --args php phpunit.php --conf /home/hashar/core/LocalSettings.php --exclude-group Database,Broken,ParserFuzz,Stub --log-junit /home/hashar/core/logs/junit.xml --; echo $?
(gdb) run
# Wait for segfault
Program received signal SIGSEGV, Segmentation fault.
zval_mark_grey (pz=0xa196d18) at /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c:368
368 /root/wikimedia/php5/php5-5.3.10/Zend/zend_gc.c: No such file or directory.
# Ask for a backtrace:
(gdb) bt
# Snip backtrace, which is above already.
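Because the crash only reproduces intermittently, a small retry loop can save some manual reruns before attaching gdb. A sketch assuming a POSIX shell; `retry_until_segfault` is a hypothetical helper, and it treats exit status 139 as SIGSEGV, as in the Jenkins logs:

```shell
#!/bin/sh
# Hypothetical helper: the segfault only shows up "from time to time", so
# rerun a command until it exits with status 139 (128 + SIGSEGV) or the
# attempt budget runs out. Returns 0 once a segfault has been seen.
retry_until_segfault() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@"
        status=$?
        if [ "$status" -eq 139 ]; then
            echo "segfault reproduced on attempt $attempt" >&2
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no segfault in $max_attempts attempts" >&2
    return 1
}

# Example, from tests/phpunit with the arguments used above:
# retry_until_segfault 20 php phpunit.php --conf /home/hashar/core/LocalSettings.php \
#     --exclude-group Database,Broken,ParserFuzz,Stub
```

Once it reproduces, the same run can be done non-interactively under gdb with `gdb -batch -ex run -ex bt --args php phpunit.php --conf /home/hashar/core/LocalSettings.php --exclude-group Database,Broken,ParserFuzz,Stub`, which prints the backtrace without typing `run` and `bt` by hand.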
Frame #2 of the backtrace references gc_mark_roots(); searching for "PHP segfault gc_mark_roots" turns up https://bugs.php.net/bug.php?id=63055, which has the same backtrace when running the test suites of Drupal and/or Symfony2. Laruence (at php.net) says: any usage of zval_dtor with a recursive array may trigger this segfault. We indeed see a call to _zval_dtor in our backtrace (frame #7).
Created attachment 11638: backtrace with Zend functions shown

Using the .gdbinit from PHP, I confirmed what Tim found ages ago, namely that the crash is caused by a preg_match_all() call:

(gdb) source /home/hashar/gdbinit
(gdb) zbacktrace
[0x7ffff7ee37c8] preg_match_all("/@requires\s+(?P<name>function|extension)\s(?P<value>([^\40]+))\r?$/m", "\12/**\12\11\40*\40@dataProvider\40provideWfMatchesDomainList\12\11\40*/", array(7)[0xa196c78]) /usr/share/php/PHPUnit/Util/Test.php:125
[0x7ffff7ee32d8] PHPUnit_Util_Test::getRequirements("GlobalTest", "testWfMatchesDomainList") /usr/share/php/PHPUnit/Framework/TestCase.php:557
[0x7ffff7ee2a00] PHPUnit_Framework_TestCase->setRequirementsFromAnnotation() /usr/share/php/PHPUnit/Framework/TestCase.php:585
[0x7ffff7ee12c0] PHPUnit_Framework_TestCase->checkRequirements() /usr/share/php/PHPUnit/Framework/TestCase.php:822
[0x7fffffffbab0] PHPUnit_Framework_TestCase->runBare()
[0x7ffff7ee0e88] call_user_func_array(array(2)[0x9ad9a28], array(0)[0xa195ff8]) /usr/share/php/PHP/Invoker.php:93
[0x7ffff7edf4c0] PHP_Invoker->invoke(array(2)[0x9ad9a28], array(0)[0xa195ae0], 2) /usr/share/php/PHPUnit/Framework/TestResult.php:646
[0x7ffff7ede140] PHPUnit_Framework_TestResult->run(object[0x2334d50]) /usr/share/php/PHPUnit/Framework/TestCase.php:769
[0x7ffff7edd438] PHPUnit_Framework_TestCase->run(object[0x9225990]) /home/hashar/core/tests/phpunit/MediaWikiTestCase.php:116
[0x7ffff7edd320] MediaWikiTestCase->run(object[0x9225990]) /usr/share/php/PHPUnit/Framework/TestSuite.php:775
[0x7ffff7edbb10] PHPUnit_Framework_TestSuite->runTest(object[0x2334d50], object[0x9225990]) /usr/share/php/PHPUnit/Framework/TestSuite.php:745
[0x7ffff7eda2e8] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x9225d70], array(4)[0x9225d20], false) /usr/share/php/PHPUnit/Framework/TestSuite.php:705
[0x7ffff7ed8ac0] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x9225d70], array(4)[0x9225d20], false) /usr/share/php/PHPUnit/Framework/TestSuite.php:705
[0x7ffff7ed7298] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x9533340], array(4)[0x95334f0], false) /usr/share/php/PHPUnit/Framework/TestSuite.php:705
[0x7ffff7ed45b0] PHPUnit_Framework_TestSuite->run(object[0x9225990], false, array(0)[0x95343a0], array(4)[0x9534550], false) /usr/share/php/PHPUnit/TextUI/TestRunner.php:346
[0x7ffff7ed39f8] PHPUnit_TextUI_TestRunner->doRun(object[0x1b4adf8], array(7)[0x9535338]) /usr/share/php/PHPUnit/TextUI/Command.php:176
[0x7ffff7ed3800] PHPUnit_TextUI_Command->run(array(10)[0x3638e90], false) /home/hashar/core/tests/phpunit/MediaWikiPHPUnitCommand.php:61
[0x7ffff7ed34b0] MediaWikiPHPUnitCommand->run(array(10)[0x3639e68], true) /home/hashar/core/tests/phpunit/MediaWikiPHPUnitCommand.php:47
[0x7ffff7ed3068] MediaWikiPHPUnitCommand::main() /home/hashar/core/tests/phpunit/phpunit.php:107
Upstream bug PHP #63055 https://bugs.php.net/bug.php?id=63055
Tim proposed using a different PHP version and/or PCRE version. According to upstream bug 63055, the bug is present in PHP 5.4.x as well, so I have reinstated Tim's live hack to PHPUnit:

vim /usr/share/php/PHPUnit/Util/Test.php

public static function getRequirements($className, $methodName)
{
    // HASHAR TIM hack bug https://bugzilla.wikimedia.org/43972
    return array();
    ...
}

That is a workaround for the bug.
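The live hack makes getRequirements() ignore every @requires annotation, so it is only safe while no test relies on one. A sketch of how one might check that assumption; `uses_requires_annotation` is a hypothetical helper, and it uses a coarse grep rather than PHPUnit's own REGEX_REQUIRES pattern (it also catches `@requires PHP ...` lines, which the hack would silently ignore too):

```shell
#!/bin/sh
# Hypothetical helper: succeed (status 0) if any file under the given
# directory contains a PHPUnit "@requires" annotation. grep exits 1 when
# nothing matches and 2 on errors (e.g. a missing directory); both count as
# "not found" for an if-statement, so check the directory path first.
uses_requires_annotation() {
    grep -rqE '@requires[[:space:]]' "$1"
}

# Example, from the root of a mediawiki/core checkout:
# if uses_requires_annotation tests/phpunit; then
#     echo "@requires is used: the getRequirements() hack would skip those checks"
# fi
```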
*** Bug 43390 has been marked as a duplicate of this bug. ***
Lowering priority since we have applied a workaround
*** Bug 44306 has been marked as a duplicate of this bug. ***
The upstream bug apparently got fixed: http://git.php.net/?p=php-src.git;a=commit;h=ccc519b7a92bfe4b191c0e2e3869516171247ac2

That commit is in:

$ git branch -r --contains ccc519b7a92bfe4b191c0e2e3869516171247ac2
  origin/HEAD -> origin/master
  origin/PHP-5.4
  origin/PHP-5.4.10
  origin/PHP-5.4.11
  origin/PHP-5.4.9
  origin/PHP-5.5
  origin/immutable-date
  origin/master

So I guess PHP >= 5.4.9 is fine :-)
and PHP >= 5.3.19
Moving the bug back into the pool. This will be fixed whenever we upgrade to PHP 5.3.19+.
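The version thresholds from the last few comments (5.3.19+ on the 5.3 branch, 5.4.9+ on 5.4, and the PHP-5.5/master branches) can be turned into a quick check. A minimal POSIX shell sketch; `has_gc_fix` is a hypothetical helper, and it only compares the upstream version number, so a distro package with the fix cherry-picked into an older version would still be reported as affected:

```shell
#!/bin/sh
# Hypothetical helper: succeed if an upstream PHP version string should include
# the fix for PHP bug #63055, using the thresholds discussed in this bug.
# Distro backports are NOT detected: only the upstream version is compared.
has_gc_fix() {
    # Drop a distro suffix such as "-1ubuntu3.7+wmf1"
    version=${1%%-*}
    old_ifs=$IFS
    IFS=.
    # Split e.g. "5.3.10" into the positional parameters 5, 3, 10
    set -- $version
    IFS=$old_ifs
    major=${1:-0} minor=${2:-0} patch=${3:-0}
    [ "$major" -gt 5 ] && return 0
    [ "$major" -lt 5 ] && return 1
    case $minor in
        3) [ "$patch" -ge 19 ] ;;
        4) [ "$patch" -ge 9 ] ;;
        *) [ "$minor" -ge 5 ] ;;
    esac
}

for v in 5.3.10-1ubuntu3.7+wmf1 5.3.19 5.4.8 5.4.9 5.5.0; do
    if has_gc_fix "$v"; then echo "$v: has fix"; else echo "$v: affected"; fi
done
```

On a host, the installed version can be fed in with `has_gc_fix "$(php -r 'echo PHP_VERSION;')"`.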
Got another occurrence when running the full test suite: https://integration.wikimedia.org/ci/job/mediawiki-core-master-phpunit-all/1454/consoleFull

/var/lib/jenkins/jobs/_shared/build.xml:437: The following error occurred while executing this line:
/var/lib/jenkins/jobs/_shared/build.xml:482: exec returned: 139
I can confirm the workaround described in Comment #11 is still present. So we must have yet another segfault issue :(
*** Bug 47069 has been marked as a duplicate of this bug. ***
Pinged the ops-l list about it. It seems to me we want to cherry-pick the upstream change into our PHP package.
RT https://rt.wikimedia.org/Ticket/Display.html?id=5209
No activity on RT; I have pinged it.
Alexandros provided some new packages. I have manually installed them on gallium:

dpkg -i \
  libapache2-mod-php5_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-cli_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-common_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-curl_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-dbg_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-dev_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-gd_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-intl_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-mysql_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-pgsql_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-sqlite_5.3.10-1ubuntu3.7+wmf1_amd64.deb \
  php5-tidy_5.3.10-1ubuntu3.7+wmf1_amd64.deb
I have retriggered the code coverage job which was segfaulting: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/
Created attachment 13224: backtrace with PHP 5.3.10-1ubuntu3.7+wmf1 provided by Alexandros
Created attachment 13225: 2nd backtrace with PHP 5.3.10-1ubuntu3.7+wmf1

Another backtrace. zbacktrace has no clue; phpbt yields: No symbol "execute_data" in current context.
PHPUnit 3.7.22 includes a workaround for https://bugs.php.net/bug.php?id=63055
PHPUnit 3.7.24 was deployed on gallium last week. I am upgrading the PHP packages to keep them in sync with production. That gets rid of Alexandros's PHP patches, but since PHPUnit has a workaround, that should be fine. Retriggering the coverage job at https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/
Change 83940 had a related patch set uploaded by Hashar: disable suoshin mem handler for code coverage https://gerrit.wikimedia.org/r/83940
Created attachment 13273: 3rd backtrace with suhosin canary mm disabled

After running the job with SUHOSIN_MM_USE_CANARY_PROTECTION=0, which disables the canary protection of Suhosin's memory manager, there was a different backtrace. Attaching it here.
PHP still segfaults, but it happens very late in PHP execution (during shutdown), so the HTML is actually generated and published at https://integration.wikimedia.org/cover/mediawiki-core/master/php/
Change 83940 abandoned by Hashar: disable suoshin mem handler for code coverage Reason: does not prevent PHP from segfaulting .. https://gerrit.wikimedia.org/r/83940
Just want to note that Wikibase also has trouble with PHPUnit on PHP 5.3.27 (on travis-ci). Backtrace: http://pastebin.com/Me7zsvmk
Created attachment 14213: backtrace of Wikibase tests on travis

Attaching the backtrace pasted at http://pastebin.com/Me7zsvmk to the bug.
Change 116093 had a related patch set uploaded by Hashar: Coverage now ignore phpunit ignores https://gerrit.wikimedia.org/r/116093
Change 116093 merged by jenkins-bot: Coverage now ignore phpunit ignores https://gerrit.wikimedia.org/r/116093
All patches merged; resetting ticket status
No more segfaults are happening.