Last modified: 2011-03-12 15:34:45 UTC
I've tested this on two different servers, both running Apache/2.0.58 and PHP 5.1.6. The same MediaWiki version is also running on another server with a blank database and works correctly. When clicking different wiki links very often apache dies (request times out) and the certain process consumes all free CPU ressources until it is killed manually with SIGKILL (SIGTERM doesn't work). It seems to happen especially if I click a link when the page is still loading. Non-wiki links don't lead to this problem. This happens on both servers with two different MW 1.8.2 installations which were upgraded to 1.9svn. I also tried different apache workers (mpm-prefork, mpm-worker), it happens both with threads enabled or without threads, no luck so far. As it only happens on updated systems I guess there is a database problem. To be sure I also wiped out one existing installation and copied the files from the clean, working installation without luck.
Maybe you can trace system calls ? Under linux: # strace -p <pid of the cpu eating apache> That might help in finding the trouble.
interesting - it shows nothing. I tried something new - I installed another MediaWiki instance with a newer revision - I also get the problems there even it is a blank installation, but not that often and - after waiting around one or two minutes apache still lives and finally delivers a page (while using up all idle CPU cycles in between). This is the first part of strace's output - as stated above there was no output as long as apache was "hnaging", but the "--- SIGPROF (Profiling timer expired) @ 0 (0) ---" sound interesting to me. -:[/var/www/localhost/htdocs/u00023/wiki]#> strace -p24333 Process 24333 attached - interrupt to quit --- SIGPROF (Profiling timer expired) @ 0 (0) --- setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0 rt_sigaction(SIGPROF, {0xb7655846, [PROF], SA_RESTART}, {0xb7655846, [PROF], SA_RESTART}, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [PROF], NULL, 8) = 0 stat64("/", {st_mode=S_IFDIR|0755, st_size=432, ...}) = 0 brk(0x8d4d000) = 0x8d4d000 mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb672e000 munmap(0xb672e000, 135168) = 0 brk(0x8d3d000) = 0x8d3d000 brk(0x8d2d000) = 0x8d2d000 pwrite64(242, "wsUserID|i:3;wsUserName|s:6:\"Man"..., 137, 0) = 137 close(242) = 0
I recompiled PHP without the concurrentmodphp patch. The problem still exists, in fact the process is no really dying and not proceeding after some time. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 19937 apache 22 0 274m 36m 6096 S 97.7 4.1 0:31.00 apache2 -:[/etc/apache2/modules.d]#> strace -p19937 Process 19937 attached - interrupt to quit read(7, That's all.
After sending SIGTERM: -:[/etc/apache2/modules.d]#> strace -p19937 Process 19937 attached - interrupt to quit read(7, 0xbff02437, 1) = ? ERESTARTSYS (To be restarted) --- SIGTERM (Terminated) @ 0 (0) --- sigreturn() = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR]) read(7,
I used the debug log to find out what MediaWiki is doing when apache dies: Start request GET /Netphone/TQM User-Agent: Opera/9.02 (X11; Linux x86_64; U; de) Host: wiki.fhbb-online.de Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1 Accept-Language: de,en;q=0.9 Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1 Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0 Referer: http://wiki.fhbb-online.de/Netphone/TQM If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT Cookie: db00023_mw_UserID=3; db00023_mw_UserName=Manuel; db00023_mw_Token=c7cfad10e52f13a96da298abcac55d8d; db00023_mw__session=20e5734361bb46732dd03d2bdf12bfef Cookie2: $Version=1 Connection: Keep-Alive, TE TE: deflate, gzip, chunked, identity, trailers Main cache: FakeMemCachedClient Message cache: MediaWikiBagOStuff Parser cache: MediaWikiBagOStuff Fully initialised Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead Cache miss for user 3 Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB Logged in from cookie Unstubbing $wgOut on call of $wgOut->setSquidMaxage from MediaWiki::performAction OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT OutputPage::checkLastModified: -- we might send Last-Modified : Sun, 10 Dec 2006 13:04:30 GMT OutputPage::sendCacheControl: private caching; Sun, 10 Dec 2006 13:04:30 GMT ** OutputPage::checkLastModified: CACHED client: 20061210130430 ; user: 20061210001119 ; page: 20061210090716 ; site 20061210130430
The next request killed apache again: Start request GET /Netphone User-Agent: Opera/9.02 (X11; Linux x86_64; U; de) Host: wiki.fhbb-online.de Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1 Accept-Language: de,en;q=0.9 Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1 Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0 Referer: http://wiki.fhbb-online.de/Netphone/TQM If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT Cookie: db00023_mw_UserID=3; db00023_mw_UserName=Manuel; db00023_mw_Token=c7cfad10e52f13a96da298abcac55d8d; db00023_mw__session=20e5734361bb46732dd03d2bdf12bfef Cookie2: $Version=1 Connection: Keep-Alive, TE TE: deflate, gzip, chunked, identity, trailers Main cache: FakeMemCachedClient Message cache: MediaWikiBagOStuff Parser cache: MediaWikiBagOStuff Fully initialised Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead Cache miss for user 3 Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB Logged in from cookie Unstubbing $wgOut on call of $wgOut->setSquidMaxage from MediaWiki::performAction OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT OutputPage::checkLastModified: -- we might send Last-Modified : Sun, 10 Dec 2006 13:04:30 GMT OutputPage::sendCacheControl: private caching; Sun, 10 Dec 2006 13:04:30 GMT ** OutputPage::checkLastModified: CACHED client: 20061210130430 ; user: 20061210001119 ; page: 20061210090716 ; site 20061210130430
Well, I found out that all requests with "OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT" let apache die. If I wait a minute before sending another request I get "OutputPage::checkLastModified: client did not send If-Modified-Since header" and it works.
I found the line of code where it happens by adding some additional wfDebug() statements: // Don't output compressed blob while( $status = ob_get_status() ) { ob_end_clean(); } This loop runs forever. Any comments?
ehr, forgot the location: includes/OutputPage.php line 139
geez, as soon as I add a var_dump( $status ); it works... // Don't output compressed blob while( $status = ob_get_status() ) { var_dump( $status ); ob_end_clean(); } That's not a good solution, but it works so far and maybe there is someone who can figure out where the problem is.
this seems to be a bug in php, or at least odd and unexpected behavior. Can you please try replacing the loop with the followign code: while (ob_get_level()) { ob_end_clean(); } $status = ob_get_status(); Comments on http://www.php.net/manual/de/function.ob-get-level.php seem to suggest that this sometimes may also fail, though. They suggest yet another way: while (ob_get_length () !== FALSE) { ob_end_clean(); } $status = ob_get_status(); It seems unclear if "internal" buffers of PHP are counted, and if they can be "popped" using ob_end_clean(). If you want to do some further experiments, try to disable all "internal" output buffereing done by php. The following ini options come to mind: zlib.output_compression, tidy.clean_output, session.use_trans_sid, and output_buffering (see http://www.php.net/manual/de/ini.php for reference).
Ok, so both code snippets don't work for me. Current PHP settings: output_buffering = Off zlib.output_compression = On session.use_trans_sid = 0 After disabling zlib.output_compression it works - all solutions, including the original file as in r18220. But it is stil unclear now if "disabling zlib" should be a requirement of MediaWiki or if there is a workaround.
Should be fixed on trunk with r18253, please confirm.
Yes, it works. Thanks!