Last modified: 2013-06-18 14:37:05 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and for historical purposes. It is not possible to log in, and apart from the display of bug reports and their history, links might be broken. See T15921, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 13921 - deadlocks mass-deleting media files in categories
Status: RESOLVED FIXED
Product: MediaWiki
Classification: Unclassified
Component: Database
Version: 1.22.0
Hardware: All All
Importance: High normal with 1 vote
Target Milestone: ---
Assigned To: Nobody - You can work on this!
: platformeng
Duplicates: 46086
Depends on:
Blocks: 28499 28599
Reported: 2008-05-01 21:45 UTC by Siebrand Mazeland
Modified: 2013-06-18 14:37 UTC
CC List: 13 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments
Exception rate time series (21.41 KB, image/png)
2013-04-15 01:32 UTC, Tim Starling

Description Siebrand Mazeland 2008-05-01 21:45:08 UTC
When mass-deleting media files that are in a category, quite often deadlocks are seen originating from "Article::updateCategoryCounts".

Mass deletion is a relative term here: with 7 delete tabs open in the browser for Wikimedia Commons and deleting them as quickly as possible (say, within 2-3 seconds), I may get 3 deadlocks out of 7. It would be a great help for maintenance if this were somehow resolved.
Comment 1 Siebrand Mazeland 2008-05-18 14:02:48 UTC
Issue still there. I also ran into another deadlock situation: when removing 4 older versions of a file in quick succession, “LocalFileDeleteBatch::doDBDeletes” gives: “1213: Deadlock found when trying to get lock; Try restarting transaction (10.0.0.231)”.
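The error text itself advises restarting the transaction. A minimal sketch of what that means for a caller, assuming a hypothetical `DeadlockError` exception raised for MySQL error 1213 (ER_LOCK_DEADLOCK) and a transaction body that can safely be re-run from the beginning; `run_with_deadlock_retry` is an illustrative name, not a MediaWiki function:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DeadlockError(Exception):
    """Hypothetical stand-in for MySQL error 1213 (ER_LOCK_DEADLOCK)."""


def run_with_deadlock_retry(txn: Callable[[], T], attempts: int = 3,
                            backoff: float = 0.05) -> T:
    """Run txn(); if it fails with a deadlock, re-run it from the start.

    InnoDB rolls back the whole transaction of the deadlock victim, so a
    retry must repeat the entire unit of work, not just the failed statement.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            time.sleep(backoff * attempt)  # linear backoff before retrying
    raise AssertionError("unreachable")
```

A retry only hides the symptom, and it is safe only when the transaction has no side effects outside the database (such as Squid purges) that would be repeated on each attempt.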
Comment 2 Aaron Schulz 2008-09-16 12:09:56 UTC
Perhaps this could use the jobqueue to keep a nice linear execution order.
Comment 3 Aaron Schulz 2008-09-16 12:21:08 UTC
(In reply to comment #2)
> Perhaps this could use the jobqueue to keep a nice linear execution order.
> 

Looking at what it does, that shouldn't be n..
Comment 4 Aryeh Gregor (not reading bugmail, please e-mail directly) 2009-02-16 18:03:47 UTC
Ugh, I hate databases.  :(
Comment 5 Chad H. 2011-03-22 01:58:16 UTC
Is this still reproducible?
Comment 6 Siebrand Mazeland 2011-03-22 06:58:32 UTC
(In reply to comment #5)
> Is this still reproducible?

I don't do any mass deletions anymore at Commons, but let me quickly check if there are enough speedy deletion tags at the moment to give it a try.
Comment 7 Siebrand Mazeland 2011-03-22 07:02:15 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > Is this still reproducible?
> 
> I don't do any mass deletions anymore at Commons, but let me quickly check if
> there are enough speedy deletion tags at the moment to give it a try.

Cannot reproduce.
Comment 8 Sam Reed (reedy) 2011-04-11 13:08:10 UTC
r85785
Comment 9 Sam Reed (reedy) 2011-04-11 18:46:45 UTC
Reverted in r85814
Comment 10 Siebrand Mazeland 2011-05-17 15:09:56 UTC
(In reply to comment #5)
> Is this still reproducible?

As I said, I haven't tried with images, but I just reproduced it while mass-deleting pages in the MediaWiki namespace on nl.wikimedia.org (1.17wmf1 (r88299)). The error came from RecentChange::save and was:

“1213: Deadlock found when trying to get lock; try restarting transaction (10.0.6.49)”.
Comment 11 Carl Fürstenberg 2011-05-24 22:19:37 UTC
Now that I am updating Twinkle to use the API, I'm getting database query errors frequently (up to 10 a minute, depending on how much is being processed), which seem to be related to this:

<?xml version="1.0"?><api servedby="srv255"><error code="internal_api_error_DBQueryError" info="Database query error" xml:space="preserve">

#0 /usr/local/apache/common-local/php-1.17/includes/db/Database.php(734): DatabaseBase->reportQueryError('Deadlock found ...', 1213, 'UPDATE  `page` ...', 'Title::invalida...', false)
#1 /usr/local/apache/common-local/php-1.17/includes/db/Database.php(1349): DatabaseBase->query('UPDATE  `page` ...', 'Title::invalida...')
#2 /usr/local/apache/common-local/php-1.17/includes/Title.php(2485): DatabaseBase->update('page', Array, Array, 'Title::invalida...')
#3 /usr/local/apache/common-local/php-1.17/includes/Article.php(4093): Title->invalidateCache()
#4 /usr/local/apache/common-local/php-1.17/includes/Article.php(3271): Article::onArticleDelete(Object(Title))
#5 /usr/local/apache/common-local/php-1.17/includes/api/ApiDelete.php(154): Article->doDeleteArticle('Deleted talk pa...')
#6 /usr/local/apache/common-local/php-1.17/includes/api/ApiDelete.php(79): ApiDelete::delete(Object(Article), 'a0b48ef8649b3b5...', 'Deleted talk pa...')
#7 /usr/local/apache/common-local/php-1.17/includes/api/ApiMain.php(657): ApiDelete->execute()
#8 /usr/local/apache/common-local/php-1.17/includes/api/ApiMain.php(339): ApiMain->executeAction()
#9 /usr/local/apache/common-local/php-1.17/includes/api/ApiMain.php(323): ApiMain->executeActionWithErrorHandling()
#10 /usr/local/apache/common-local/php-1.17/api.php(115): ApiMain->execute()
#11 /usr/local/apache/common-local/live-1.5/api.php(3): require('/usr/local/apac...')
#12 {main}

</error></api>
Comment 12 Siebrand Mazeland 2012-01-06 11:44:30 UTC
Just now I came across this again. Slightly different scenario -- mass deleting empty categories on Wikimedia Commons:

“RecentChange::save”: “1213: Deadlock found when trying to get lock; try restarting transaction (10.0.6.32)”.
Comment 13 Siebrand Mazeland 2012-01-06 11:52:45 UTC
Adding to comment 12: Two more methods the deadlock was reported in:

* WikiPage::updateCategoryCounts
* HTMLCacheUpdate::invalidateTitles
Comment 14 Aaron Schulz 2013-04-11 20:56:49 UTC
(In reply to comment #13)
> Adding to comment 12: Two more methods the deadlock was reported in:
> 
> * WikiPage::updateCategoryCounts
> * HTMLCacheUpdate::invalidateTitles

These are the only two I'm still seeing in the logs a lot.
Comment 15 MZMcBride 2013-04-14 17:40:27 UTC
There's a report of updateCategoryCounts erroring here:

https://en.wikipedia.org/w/index.php?title=User_talk:MZMcBride&oldid=550337699#Bug_46086
Comment 16 Tim Starling 2013-04-15 01:16:41 UTC
*** Bug 46086 has been marked as a duplicate of this bug. ***
Comment 17 Tim Starling 2013-04-15 01:29:08 UTC
Log analysis and frequent reports on bug 46086 indicate that this has been happening more often since 1.21wmf11 was deployed, so I am increasing the priority.
Comment 18 Tim Starling 2013-04-15 01:32:30 UTC
Created attachment 12105
Exception rate time series

Rate of exceptions with WikiPage->doDeleteUpdates in the backtrace. Clearly something new is going on, possibly correlated with deployment of MW 1.21wmf11 to commonswiki.
Comment 19 Tim Starling 2013-04-15 03:05:19 UTC
The transaction length is certainly epic. Here's an indicative debug log trace from my local test wiki, just showing the transaction in question, annotated with hook calls:

http://paste.tstarling.com/p/spMggU.html

35 DB queries, 59 hook calls, 4 squid purges and some FileBackend operations. It's not hard to imagine something in there being slow.
Comment 20 Tim Starling 2013-04-15 04:05:13 UTC
What makes this tricky is the use of the $commit=false parameter to doDeleteArticleReal() by FileDeleteForm to attempt to roll back everything when a FileBackend operation fails. Of course, you can't unpurge Squid or unsend an IRC log line, but at least it does make an effort. 

Moving things like Squid purges and links updates to a DeferredUpdates job would be easy if it weren't for this. You can't unqueue a DeferredUpdates job at present. Maybe you could clear the queue or somehow identify invalid jobs, but it would be pretty messy and heuristic. The use of onTransactionIdle() would suffer similar problems.

Ideally, WikiPage::doDeleteArticleReal() would be split up into a pre-commit function and a post-commit function, and FileDeleteForm would call the post-commit function after the file operations are successful.

WikiPage::doDeleteArticleReal() itself would continue to do both halves, for backwards compatibility, which would be fairly convenient in the short term since ApiDelete and maintenance/deleteBatch.php could continue to call it. But in the long term, those two callers would be best served by a new entry point that treated file and non-file deletions equivalently.

Assigning to Rob Lanphier for delegation.
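A rough sketch of the pre-commit/post-commit split proposed in comment 20, written in Python rather than MediaWiki's PHP. Every name here (`PendingDeletion`, `delete_pre_commit`, `delete_post_commit`, `delete_file`, `file_ops_ok`) is hypothetical, and the database writes and Squid purges are stand-ins that append to a log. The point is only that irreversible work is recorded during the pre-commit half and simply dropped if the file operations fail, instead of being queued somewhere it cannot be unqueued:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingDeletion:
    """Result of the hypothetical pre-commit half of a page deletion."""
    title: str
    post_commit: List[Callable[[], None]] = field(default_factory=list)


def delete_pre_commit(title: str, log: List[str]) -> PendingDeletion:
    # Database writes that belong inside the transaction (stand-ins only).
    log.append(f"db: delete page row {title}")
    pending = PendingDeletion(title)
    # Work that cannot be undone is recorded here, not performed yet.
    pending.post_commit.append(lambda: log.append(f"purge squid {title}"))
    pending.post_commit.append(lambda: log.append(f"links update {title}"))
    return pending


def delete_post_commit(pending: PendingDeletion) -> None:
    for action in pending.post_commit:
        action()


def delete_article(title: str, log: List[str]) -> None:
    """Backwards-compatible entry point that runs both halves."""
    pending = delete_pre_commit(title, log)
    log.append("commit")
    delete_post_commit(pending)


def delete_file(title: str, log: List[str], file_ops_ok: bool) -> bool:
    """File deletion: post-commit work runs only if file operations succeed."""
    pending = delete_pre_commit(title, log)
    if not file_ops_ok:
        log.append("rollback")
        return False  # pending.post_commit is discarded, never executed
    log.append("commit")
    delete_post_commit(pending)
    return True
```

With `file_ops_ok=False` the log ends at the rollback and no purge is issued; callers that do not involve files keep using the combined `delete_article` entry point, mirroring the backwards-compatibility argument in the comment.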
Comment 21 Rob Lanphier 2013-04-15 16:46:10 UTC
Giving this to AaronSchulz to look at.
Comment 22 MZMcBride 2013-04-16 02:45:51 UTC
Related: <https://gerrit.wikimedia.org/r/58820>. Tim thinks this should help with about 80% of the problem.
Comment 23 Andre Klapper 2013-05-24 10:45:25 UTC
Ariel: You silently reopened this - could you add a comment, e.g. how often / where this still can be seen? Thanks.
Comment 24 Ariel T. Glenn 2013-05-24 11:38:25 UTC
Oops, I didn't mean to reopen it, just to add myself. Sorry! Do what you need to do...
Comment 25 Andre Klapper 2013-05-24 14:10:41 UTC
Alright :)
