Last modified: 2011-03-12 15:34:45 UTC

Wikimedia Bugzilla is closed!

Wikimedia has migrated from Bugzilla to Phabricator. Bug reports should be created and updated in Wikimedia Phabricator instead. Please create an account in Phabricator and add your Bugzilla email address to it.
Wikimedia Bugzilla is read-only. If you try to edit or create any bug report in Bugzilla you will be shown an intentional error message.
In order to access the Phabricator task corresponding to a Bugzilla report, just remove "static-" from its URL.
You could still run searches in Bugzilla or access your list of votes but bug reports will obviously not be up-to-date in Bugzilla.
Bug 8148 - Apache randomly dies using up all CPU cycles after upgrade from 1.8.2 to trunk
Apache randomly dies using up all CPU cycles after upgrade from 1.8.2 to trunk
Status: CLOSED FIXED
Product: MediaWiki
Classification: Unclassified
General/Unknown (Other open bugs)
1.9.x
PC Linux
: High critical (vote)
: ---
Assigned To: Nobody - You can work on this!
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-12-04 12:27 UTC by Manuel Schneider
Modified: 2011-03-12 15:34 UTC (History)
1 user (show)

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Manuel Schneider 2006-12-04 12:27:04 UTC
I've tested this on two different servers, both running Apache/2.0.58 and PHP 5.1.6.

The same MediaWiki version is also running on another server with a blank database and works correctly.

When clicking different wiki links very often apache dies (request times out) and the certain process consumes all 
free CPU ressources until it is killed manually with SIGKILL (SIGTERM doesn't work). It seems to happen especially 
if I click a link when the page is still loading. Non-wiki links don't lead to this problem.

This happens on both servers with two different MW 1.8.2 installations which were upgraded to 1.9svn.

I also tried different apache workers (mpm-prefork, mpm-worker), it happens both with threads enabled or without 
threads, no luck so far.

As it only happens on updated systems I guess there is a database problem. To be sure I also wiped out one existing 
installation and copied the files from the clean, working installation without luck.
Comment 1 Antoine "hashar" Musso (WMF) 2006-12-07 17:39:23 UTC
Maybe you can trace system calls ? Under linux:

# strace -p <pid of the cpu eating apache>

That might help in finding the trouble.
Comment 2 Manuel Schneider 2006-12-10 00:08:31 UTC
interesting - it shows nothing.

I tried something new - I installed another MediaWiki instance with a newer revision - I also get the problems there even it is a blank installation, but not that often and - 
after waiting around one or two minutes apache still lives and finally delivers a page (while using up all idle CPU cycles in between).

This is the first part of strace's output - as stated above there was no output as long as apache was "hnaging", but the "--- SIGPROF (Profiling timer expired) @ 0 (0) ---" 
sound interesting to me.


-:[/var/www/localhost/htdocs/u00023/wiki]#> strace -p24333
Process 24333 attached - interrupt to quit
--- SIGPROF (Profiling timer expired) @ 0 (0) ---
setitimer(ITIMER_PROF, {it_interval={0, 0}, it_value={30, 0}}, NULL) = 0
rt_sigaction(SIGPROF, {0xb7655846, [PROF], SA_RESTART}, {0xb7655846, [PROF], SA_RESTART}, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [PROF], NULL, 8) = 0
stat64("/", {st_mode=S_IFDIR|0755, st_size=432, ...}) = 0
brk(0x8d4d000)                          = 0x8d4d000
mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb672e000
munmap(0xb672e000, 135168)              = 0
brk(0x8d3d000)                          = 0x8d3d000
brk(0x8d2d000)                          = 0x8d2d000
pwrite64(242, "wsUserID|i:3;wsUserName|s:6:\"Man"..., 137, 0) = 137
close(242)                              = 
0                                                                                                                                                          
Comment 3 Manuel Schneider 2006-12-10 12:35:03 UTC
I recompiled PHP without the concurrentmodphp patch.

The problem still exists, in fact the process is no really dying and not proceeding after some time.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
19937 apache    22   0  274m  36m 6096 S 97.7  4.1   0:31.00 apache2

-:[/etc/apache2/modules.d]#> strace -p19937
Process 19937 attached - interrupt to quit
read(7,  

That's all.
Comment 4 Manuel Schneider 2006-12-10 12:36:25 UTC
After sending SIGTERM:

-:[/etc/apache2/modules.d]#> strace -p19937
Process 19937 attached - interrupt to quit
read(7, 0xbff02437, 1)                  = ? ERESTARTSYS (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn()                             = ? (mask now [HUP INT QUIT USR1 ALRM STKFLT CHLD CONT TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR])
read(7,     
Comment 5 Manuel Schneider 2006-12-10 13:06:31 UTC
I used the debug log to find out what MediaWiki is doing when apache dies:

Start request
GET /Netphone/TQM
User-Agent: Opera/9.02 (X11; Linux x86_64; U; de)
Host: wiki.fhbb-online.de
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: de,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Referer: http://wiki.fhbb-online.de/Netphone/TQM
If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
Cookie: db00023_mw_UserID=3; db00023_mw_UserName=Manuel; db00023_mw_Token=c7cfad10e52f13a96da298abcac55d8d; db00023_mw__session=20e5734361bb46732dd03d2bdf12bfef
Cookie2: $Version=1
Connection: Keep-Alive, TE
TE: deflate, gzip, chunked, identity, trailers

Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
Cache miss for user 3
Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
Logged in from cookie
Unstubbing $wgOut on call of $wgOut->setSquidMaxage from MediaWiki::performAction
OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::checkLastModified: --  we might send Last-Modified : Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::sendCacheControl: private caching; Sun, 10 Dec 2006 13:04:30 GMT **
OutputPage::checkLastModified: CACHED client: 20061210130430 ; user: 20061210001119 ; page: 20061210090716 ; site 20061210130430
Comment 6 Manuel Schneider 2006-12-10 13:08:18 UTC
The next request killed apache again:

Start request
GET /Netphone
User-Agent: Opera/9.02 (X11; Linux x86_64; U; de)
Host: wiki.fhbb-online.de
Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1
Accept-Language: de,en;q=0.9
Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Referer: http://wiki.fhbb-online.de/Netphone/TQM
If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
Cookie: db00023_mw_UserID=3; db00023_mw_UserName=Manuel; db00023_mw_Token=c7cfad10e52f13a96da298abcac55d8d; db00023_mw__session=20e5734361bb46732dd03d2bdf12bfef
Cookie2: $Version=1
Connection: Keep-Alive, TE
TE: deflate, gzip, chunked, identity, trailers

Main cache: FakeMemCachedClient
Message cache: MediaWikiBagOStuff
Parser cache: MediaWikiBagOStuff
Fully initialised
Unstubbing $wgContLang on call of $wgContLang->checkTitleEncoding from WebRequest::getGPCVal
Unstubbing $wgUser on call of $wgUser->isAllowed from Title::userCanRead
Cache miss for user 3
Unstubbing $wgLoadBalancer on call of $wgLoadBalancer->getConnection from wfGetDB
Logged in from cookie
Unstubbing $wgOut on call of $wgOut->setSquidMaxage from MediaWiki::performAction
OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::checkLastModified: --  we might send Last-Modified : Sun, 10 Dec 2006 13:04:30 GMT
OutputPage::sendCacheControl: private caching; Sun, 10 Dec 2006 13:04:30 GMT **
OutputPage::checkLastModified: CACHED client: 20061210130430 ; user: 20061210001119 ; page: 20061210090716 ; site 20061210130430
Comment 7 Manuel Schneider 2006-12-10 13:24:39 UTC
Well, I found out that all requests with "OutputPage::checkLastModified: -- client send If-Modified-Since: Sun, 10 Dec 2006 13:04:30 GMT" let apache die.

If I wait a minute before sending another request I get "OutputPage::checkLastModified: client did not send If-Modified-Since header" and it works.
Comment 8 Manuel Schneider 2006-12-10 13:32:15 UTC
I found the line of code where it happens by adding some additional wfDebug() statements:

                                // Don't output compressed blob
                                while( $status = ob_get_status() ) {
                                        ob_end_clean();
                                }

This loop runs forever.

Any comments?
Comment 9 Manuel Schneider 2006-12-10 13:32:49 UTC
ehr, forgot the location:

includes/OutputPage.php line 139
Comment 10 Manuel Schneider 2006-12-10 13:50:45 UTC
geez, as soon as I add a var_dump( $status ); it works...

// Don't output compressed blob
while( $status = ob_get_status() ) {
var_dump( $status );
ob_end_clean();
}

That's not a good solution, but it works so far and maybe there is someone who can figure out where the problem is.
Comment 11 Daniel Kinzler 2006-12-10 14:12:10 UTC
this seems to be a bug in php, or at least odd and unexpected behavior. Can you
please try replacing the loop with the followign code:

    while (ob_get_level()) {
        ob_end_clean();
    }
    $status = ob_get_status();

Comments on http://www.php.net/manual/de/function.ob-get-level.php seem to
suggest that this sometimes may also fail, though. They suggest yet another way:

    while (ob_get_length () !== FALSE) {
        ob_end_clean();
    }
    $status = ob_get_status();

It seems unclear if "internal" buffers of PHP are counted, and if they can be
"popped" using ob_end_clean(). If you want to do some further experiments, try
to disable all "internal" output buffereing done by php. The following ini
options come to mind: zlib.output_compression, tidy.clean_output,
session.use_trans_sid, and output_buffering (see
http://www.php.net/manual/de/ini.php for reference).
Comment 12 Manuel Schneider 2006-12-10 14:28:42 UTC
Ok, so both code snippets don't work for me.

Current PHP settings:
output_buffering = Off
zlib.output_compression = On
session.use_trans_sid = 0

After disabling zlib.output_compression it works - all solutions, including the original file as in r18220.

But it is stil unclear now if "disabling zlib" should be a requirement of MediaWiki or if there is a workaround.
Comment 13 Brion Vibber 2006-12-11 01:52:23 UTC
Should be fixed on trunk with r18253, please confirm.
Comment 14 Manuel Schneider 2006-12-11 08:06:21 UTC
Yes, it works. Thanks!

Note You need to log in before you can comment on or make changes to this bug.


Navigation
Links