Last modified: 2014-05-30 15:42:22 UTC
While attempting to confirm bug 65251, a user performed rapid file uploads in an attempt to get two versions uploaded within the same second. In dberror.log, I see numerous entries along the lines of the following that seem to be related: Tue May 13 15:25:30 UTC 2014 mw1190 zhwiki Connection lost and reconnected after 33.725s, query: BEGIN /* DatabaseBase::query (Revision::fetchFromConds) Liangent-bot */ Which seems to be behind these exceptions that Ori noticed: 2014-05-13 15:51:25 mw1147 zhwiki: [2328423f] /w/api.php?maxlag=0 Exception from line 927 of /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php: Transaction idle callbacks still pending. #0 /usr/local/apache/common-local/php-1.24wmf3/includes/db/DatabaseMysqlBase.php(66): DatabaseBase->close() #1 /usr/local/apache/common-local/php-1.24wmf3/includes/db/DatabaseMysqlBase.php(574): DatabaseMysqlBase->open('10.64.16.13', [REDACTED], [REDACTED], 'zhwiki') #2 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(1108): DatabaseMysqlBase->ping() #3 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(1595): DatabaseBase->query('SELECT 1 FROM...', 'LocalFile::lock') #4 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(1306): DatabaseBase->select('image', '1', Array, 'LocalFile::lock', Array) #5 /usr/local/apache/common-local/php-1.24wmf3/includes/filerepo/file/LocalFile.php(1851): DatabaseBase->selectField('image', '1', Array, 'LocalFile::lock', Array) #6 /usr/local/apache/common-local/php-1.24wmf3/includes/filerepo/file/LocalFile.php(1168): LocalFile->lock() #7 /usr/local/apache/common-local/php-1.24wmf3/includes/upload/UploadBase.php(698): LocalFile->upload('/tmp/php5wruKn', 'UTFComment: oZV...', 'UTFComment: oZV...', 1, Array, false, Object(User)) #8 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiUpload.php(637): UploadBase->performUpload('UTFComment: oZV...', 'UTFComment: oZV...', false, Object(User)) #9 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiUpload.php(144): ApiUpload->performUpload(Array) #10 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiUpload.php(111): ApiUpload->getContextResult() #11 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiMain.php(913): ApiUpload->execute() #12 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiMain.php(363): ApiMain->executeAction() #13 /usr/local/apache/common-local/php-1.24wmf3/includes/api/ApiMain.php(334): ApiMain->executeActionWithErrorHandling() #14 /usr/local/apache/common-local/php-1.24wmf3/api.php(85): ApiMain->execute() #15 /usr/local/apache/common-local/w/api.php(3): require('/usr/local/apac...') #16 {main} DatabaseBase->query() appears to have noticed the disconnection, and calls ->ping() to try to reconnect, which tries to reopen the connection, which tries to close the dead connection, which throws an exception due to the pending callbacks. What the correct behavior might be in this situation (if throwing an exception isn't itself right), I have no idea. I note this seems to be an instance of the problem bug 47375 is generically asking to be fixed.
On further review, I'm not sure whether the rapid file uploads are responsible for the database overload leading to the disconnection (although the reconnect-then-throw-exception behavior is still unpleasant). At the same time I see many exceptions like the following: 2014-05-13 16:16:49 mw1029 enwiki: [955fda5b] /w/index.php?title=Special:Export&action=submit Exception from line 1093 of /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php: DB connection was already closed. #0 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(3492): DatabaseBase->query('COMMIT', 'DatabaseBase::c...') #1 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(3477): DatabaseBase->doCommit('DatabaseBase::c...') #2 /usr/local/apache/common-local/php-1.24wmf3/includes/db/Database.php(937): DatabaseBase->commit('DatabaseBase::c...', 'flush') #3 /usr/local/apache/common-local/php-1.24wmf3/includes/db/LoadBalancer.php(870): DatabaseBase->close() #4 /usr/local/apache/common-local/php-1.24wmf3/includes/specials/SpecialExport.php(395): LoadBalancer->closeAll() #5 /usr/local/apache/common-local/php-1.24wmf3/includes/specials/SpecialExport.php(179): SpecialExport->doExport('New York City??...', Array, false, false) #6 /usr/local/apache/common-local/php-1.24wmf3/includes/specialpage/SpecialPage.php(379): SpecialExport->execute(NULL) #7 /usr/local/apache/common-local/php-1.24wmf3/includes/specialpage/SpecialPageFactory.php(503): SpecialPage->run(NULL) #8 /usr/local/apache/common-local/php-1.24wmf3/includes/Wiki.php(285): SpecialPageFactory::executePath(Object(Title), Object(RequestContext)) #9 /usr/local/apache/common-local/php-1.24wmf3/includes/Wiki.php(596): MediaWiki->performRequest() #10 /usr/local/apache/common-local/php-1.24wmf3/includes/Wiki.php(448): MediaWiki->main() #11 /usr/local/apache/common-local/php-1.24wmf3/index.php(46): MediaWiki->run() #12 /usr/local/apache/common-local/w/index.php(3): require('/usr/local/apac...') #13 {main} combined with many "too many connections" errors in dberror.log; apparently someone was spamming Special:Export at the same time.
Note that enwiki (where Special:Export took place) lives in s1 while zhwiki (where I uploaded files) lives in s2.
Change 133173 had a related patch set uploaded by Aaron Schulz: Cleaned up database reconnection logic https://gerrit.wikimedia.org/r/133173
Change 133173 merged by jenkins-bot: Cleaned up database reconnection logic https://gerrit.wikimedia.org/r/133173
Gerrit change #133173 merged. I assume it has been fixed.