Last modified: 2012-09-19 18:19:33 UTC
On Commons, nowadays we sometimes notice that after a file deletion via API, the file-revision-table remains visible and links to the files are not rendered as redlinks. Purging helps but if you delete, you do not expect having to go to the page and purge it. I will try to get some screenshots.
Not just the API. For the past hour I've been working through a backlog of copyright violations on Commons based on this category: https://commons.wikimedia.org/wiki/Category:Copyright_violations All 3 files I deleted in the last 5 minutes however are different, something weird happened (or rather, didn't happen). The category member links were kept and after refreshing they're still showing up in the category (although the thumbnail is broken, and opening the link shows a "This page was recently deleted" message). [[commons:File:Re7.png]], [[commons:File:Re5.png]], [[commons:File:Re3.png]]
Looks like after purging the category page and file pages, they were removed from the category. It shouldn't be possible for a non-existent page to be categorized :D
The POST request submitting action=delete sometimes takes about to 2-3 seconds to respond.
For me it sometimes takes up to 10s but I am on the "wrong server" (secure.wikimedia.org)
Until around the middle of February, I could use DelReqHandler to delete all of the files in a Mass DR in about ten seconds -- hit the {del} link on each file, then hit Enter to accept the default comment, and so on. They would all go red very quickly. I am currently working on http://commons.wikimedia.org/wiki/Commons:Deletion_requests/Files_in_Category:Commemorative_euro_coins_%28France%29#Files_in_Category:Commemorative_euro_coins_.28France.29 After two minutes, the first five files are still highlighted in gray with "working". The last six are all red. If recent experience is true here, they will remain "working" indefinitely -- I once left them overnight without change. If I then reload the DR page, several things can happen. 1) In this case, they all show up red. This is true less than half the time 2) In most cases some or all will will remain blue. 2a) Then if I go to the file page, in some cases the file will be intact. 2b) In other cases, the file will appear intact, but the pulldown will start with "Undelete", showing that the file is actually gone. In that case, using "Purge" on the pulldown will turn it red on the DR page. 2c) in still other cases, the file page will turn up with "That file has been deleted..." Since there are more than 100 DRs on an average day (more than 200 on today's targets), this is seriously slowing down closures. Jim Jameslwoodward
Are any 500 type errors or timeouts occurring?
(In reply to comment #6) > Are any 500 type errors or timeouts occurring? Not that I know of. See also 35054
Reply to Comment #6: Occasionally, perhaps one in twenty deletes: "API request returned code 504 errorError code is Gateway Time-out" Sometimes just one, sometimes multiple times while doing a multiple delete. Sometimes the process completes correctly after a 504 error; sometimes some of them hang. Five of 13 deletes are hung now at Commons:Deletion requests/The Boy in the Plastic Bubble (film). The other eight are now red-links. It happens more often when more than three or four deletes are running at once. Jim
Is this still occurring?
Yes. I just deleted File:Alter Way 50 x 109.jpg File:Reciprok logo 50 x 109.jpg File:AWF 50 x 109.jpg File:AWS LOGO 50 x 109.jpg [keep] [del] File:AWH 50x109.jpg [keep] [del] After a page reload, the first three are red, the last two are blue, although they have both been deleted. Clicking on File:AWS LOGO 50 x 109.jpg [keep] [del] yields: "No file by this name exists, but you can upload it." while File:AWH 50 x 109.jpg [keep] [del] yields a page that shows the thumbnail and asks "View or restore 4 deleted edits?" ---- On File:Badingchad.jpg [keep] [del] File:Jamesmadara.jpg [keep] [del] File:Lovetriangle.jpg [keep] [del] File:Gakuranboys.jpg [keep] [del] the first one is still blue, but actually deleted. The other three were not deleted. Jim
How does the DelReqHandler script check the return status of the deletions. I assume it uses the API. Did it get a bad response for the files that where not deleted? Would it properly detect if it just got an error message back or treat it as success? In any case there is definitely a cache invalidation problem going on. It could be a missing purge call, an HTMLCacheUpdate job bug, or purge request packet loss. I wish I had a link or screenshot to "File:AWH 50 x 109.jpg" since that one sounds the strangest. Does it also show the file history?
I'm a user, not the programmer, so I can't answer technical questions, but, yes, I'm pretty sure it uses the API. It's a transitory phenomena, so a link won't help, but the next time I see a page like File:AWH 50 x 109.jpg I'll shoot a screen shot and e-mail it to you. --- It occurs to me -- and I'm guessing above my pay grade here -- that the servers appear to have generally slowed down over the last six weeks or so. Is it possible that this is a stack or queue getting overloaded, that would not have been overloaded when the servers were moving much faster in December, so that hitting a bunch of quick (del) would have been cleared before the overload occurred? Certainly even a single (del) takes much longer now than it did then. Jim
(In reply to comment #11) > Would it properly detect if it just got an error message back or treat it as success? The the former one. https://secure.wikimedia.org/wikipedia/commons/wiki/MediaWiki:Gadget-DelReqHandler.js Deletion deleteFile -> doAPICall -> $.ajax -> error/success (would throw an alert-box on error) (both HTTP-status errors and result.error (like bad token) are handled)
I just noticed a similar problem, using the regular interface and while deleting just one file [1]. The deletion took considerable time and delivered following error message: ---- Database error Jump to: navigation, search A database query syntax error has occurred. This may indicate a bug in the software. The last attempted database query was: (SQL query hidden) from within function "LocalFile::delete". Database returned error "1205: Lock wait timeout exceeded; try restarting transaction (10.0.6.41)". ---- Like in the case above, the links to that file remained blue and the at the file description page, the thumbnail along with the file history and the metadata were still to be seen. Purging helped but took also considerable time. Andreas. [1] http://commons.wikimedia.org/wiki/File:Nude_man_standing_with_an_erection..jpg
Assigning to Tim.
The browser version used to view the category page after deletion may be relevant. Anyone want to say what browser version they were using?
Firefox 10.0.1 running on XP Pro Jim
There are a few different issues here. (In reply to comment #0) > On Commons, nowadays we sometimes notice that after a file deletion via API, > the file-revision-table remains visible and links to the files are not rendered > as redlinks. Purging helps but if you delete, you do not expect having to go to > the page and purge it. The slave DB is used to check page_touched in response to an IMS request, so if the file description page looks identical to how it looked before delete, then slave lag coupled with session corruption is a possibility. If the description page had no description text, but the "file history" section was still present, then that would point to an incomplete delete operation. In my test install for a file deletion, I count four separate transactions with write queries in them. The deletion of the image row happens after the deletion of the page row, so it's possible for the page to be deleted before the image, and it's possible for this situation to be user-visible. If everything seemed fine after a few minutes then it was probably just slow rather than broken. (In reply to comment #1) > All 3 files I deleted in the last 5 minutes however are different, something > weird happened (or rather, didn't happen). The category member links were kept > and after refreshing they're still showing up in the category (although the > thumbnail is broken, and opening the link shows a "This page was recently > deleted" message). When an image is deleted, the page_touched for the categories it is in are not updated. It's probably been that way since categories were invented, I confirmed it back to 1.17. The code to do this purge is just not there. This is not related to the other reports. (In reply to comment #5) > After two minutes, the first five files are still highlighted in gray with > "working". The last six are all red. If recent experience is true here, they > will remain "working" indefinitely -- I once left them overnight without > change. Maybe an unexpected result from api.php caused the gadget to throw an exception. In any case, failure to show an informative error message is a gadget problem, not a server problem. > > If I then reload the DR page, several things can happen. > 1) In this case, they all show up red. This is true less than half the time > 2) In most cases some or all will will remain blue. There's a few possible causes of that. > 2a) Then if I go to the file page, in some cases the file will be intact. > 2b) In other cases, the file will appear intact, but the pulldown will start > with "Undelete", showing that the file is actually gone. In that case, using > "Purge" on the pulldown will turn it red on the DR page. The deleted status used by the skin comes from the slave DB, so this could be either slave lag or an incomplete delete operation. (In reply to comment #14) > from within function "LocalFile::delete". Database returned error "1205: Lock > wait timeout exceeded; try restarting transaction (10.0.6.41)". This is the site_stats update, an exception thrown here will cause the memcached file cache update and LocalFile::purgeThumbList() to be skipped. Those two things are done in the same transaction, and LocalFile::purgeThumbList() is quite slow. But I'm not sure if it will explain all of the problems on this bug report. I'll look at dberror.log, it will tell us whether this lock wait timeout is common.
LocalFile::delete DB error counts per day: 23 in Feb 1 7 in Feb 3 1 in Feb 5 3 in Feb 6 6 in Feb 9 26 in Feb 10 1 in Feb 11 2 in Feb 12 91 in Feb 13 7 in Feb 14 5 in Feb 15 7 in Feb 16 1 in Feb 17 4 in Feb 19 96 in Feb 20 27 in Feb 21 9 in Feb 22 48 in Feb 23 90 in Feb 24 72 in Feb 25 180 in Feb 26 13 in Feb 27 59 in Feb 28 28 in Feb 29 29 in Mar 1 33 in Mar 2 30 in Mar 3 2 in Mar 4 13 in Mar 6 3 in Mar 7 5 in Mar 8 10 in Mar 9 3 in Mar 10 4 in Mar 11 9 in Mar 12 8 in Mar 13 4 in Mar 14 56 in Mar 15 23 in Mar 16 38 in Mar 17 50 in Mar 18 216 in Mar 19 49 in Mar 20 2 in Mar 21 5 in Mar 22 Note the low counts around March 8, that implies it's not responsible for the original report.
The logs show lots of lock wait timeouts for other things that write to site_stats, but LocalFile::delete() seems like the most likely root cause. I patched it in r114442, we'll see if that fixes that part of this bug report at least.
It is far too soon to declare victory, but the problem may have gone away. In case that it is not correct, two notes: I would be careful of diagnosing from the counts. There are only half a dozen Admins who close most of the DRs. I know that I have cleared many fewer -- hundreds, perhaps as many as several thousand -- DRs than I would have if this bug were not slowing me down. I would not be surprised if my colleagues have also slowed down or stopped doing DRs. That is certainly evident from the backlog. Note, again, that this a new problem in the last six weeks or eight weeks. I would look for some change around February 1 as the cause. Jim
I have deleted about five hundred files using DelReqHandler this morning with no sign of the trouble detailed above. Thank you very much for fixing this significant problem. Jim
The logs show that total lock wait timeouts have dropped to a small fraction of what they were before r114442.