Last modified: 2014-10-10 16:32:53 UTC
error code: backend-fail-internal
error info: An unknown error occurred in storage backend "local-swift-eqiad"
Reported at https://commons.wikimedia.org/wiki/Commons:Administrators%27_noticeboard#Serious_deletion_error_issue
This bug is causing about 1/3 of my attempts to delete files to fail. I then have to refresh my browser before I can finally get the files to delete, especially in mass DRs or nukes. INeverCry
The same error occurs during file deletions on the German Wikipedia, see [0].
Error message (translated from German): Error deleting file: An unknown error occurred in storage backend "local-swift-eqiad".
[0] <https://de.wikipedia.org/wiki/Wikipedia:Administratoren/Anfragen#Probleme_beim_L.C3.B6schen_von_Dateien>
See: https://commons.wikimedia.org/w/index.php?title=Commons:Administrators%27_noticeboard&oldid=132110902#Serious_deletion_error_issue
This is actually an urgent issue; it also affects uploads, where images or file description pages get corrupted. Is nobody on the tech team being alerted by (hopefully existing) automatic error messages?
There are unresolved prio bugs in the "Media storage" component. Swift is a vital component of the projects' ability to show images and other media, and it having so many open bugs causes serious ongoing issues, not only on Commons, but everywhere.
See Screenshot in German Wikipedia: https://de.wikipedia.org/wiki/Datei:Screenshot_Fehler_im_Speicher-Backend.png This is an urgent issue.
*** Bug 69717 has been marked as a duplicate of this bug. ***
Also brought up in https://de.wikipedia.org/w/index.php?title=Wikipedia:Technik/Werkstatt&oldid=133262498#Probleme_bei_Datei-L.C3.B6schungen
<godog> it is running a bit hot on bandwidth from/to the upload caches but shouldn't be too bad, not sure exactly what mw does when talking to swift
<godog> all that load comes artificially from ms-be1003 having xfs in a funny state
!log reboot ms-be1003, xfs errors/panics
that (rebooting ms-be1003) did it, the proxy mentioned ERRORS and timeouts towards ms-be1003 while attempting to DELETE, which would explain the symptoms. can you try again and see if it works? thanks!
Still getting a bunch of these same errors as I try deletions here on Commons:
API request failed (backend-fail-internal): An unknown error occurred in storage backend "local-swift-eqiad". at Wed, 20 Aug 2014 17:42:39 GMT, served by mw1119
Observed the same at Commons, no improvement seen.
API request failed (backend-fail-internal): An unknown error occurred in storage backend "local-swift-eqiad". at Wed, 20 Aug 2014 21:26:31 GMT, served by mw1132
I just deleted 200+ files from Commons with no errors.
Not sure if this is related, but uploads have been failing with a similar message: {"error":{"0":["backend-fail-internal","local-swift-eqiad"],"code":"internal-error","info":"An internal error occurred"},"servedby":"mw1202"}
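For anyone scripting uploads or deletions, a failure like the one quoted above can be told apart from other API errors by inspecting the JSON body. The following is only a minimal sketch based on the single response shown in this report; the function name `storage_backend_failure` is hypothetical, and the assumption that the details always arrive as a `[message-key, backend-name]` pair under `error` is drawn from that one sample, not from API documentation.

```python
import json


def storage_backend_failure(response_text):
    """Return the backend name if the response reports backend-fail-internal, else None."""
    payload = json.loads(response_text)
    error = payload.get("error")
    if not isinstance(error, dict):
        return None
    # Assumption: details appear as a [message-key, backend-name] list,
    # as in the sample response (where they sit under the key "0").
    for value in error.values():
        if isinstance(value, list) and len(value) >= 2 and value[0] == "backend-fail-internal":
            return value[1]
    return None


# The response body quoted in this report.
sample = (
    '{"error":{"0":["backend-fail-internal","local-swift-eqiad"],'
    '"code":"internal-error","info":"An internal error occurred"},'
    '"servedby":"mw1202"}'
)
print(storage_backend_failure(sample))  # prints: local-swift-eqiad
```

A client could log the `servedby` value and the current UTC time whenever this returns a backend name, which would give the timestamps needed to match the failure against server-side logs.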
The problem in comment 9 is clearly visible in ganglia. Don't see any obvious more recent issues on the same ganglia graphs. https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=cpu_report&s=by+name&c=Swift+eqiad&h=&host_regex=&max_graphs=0&tab=m&vn=&hide-hf=false&sh=1&z=small&hc=4 (may need to adjust time period at the top depending on when you click the link)
(In reply to Fastily from comment #15)
> Not sure if this is related, but uploads have been failing with a similar
> message:

btw, please provide timestamps for when the errors happened if you have them! (e.g. comments 13/15)
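The error text pasted earlier in this thread already carries a timestamp and a server name, so both can be pulled out mechanically when collecting reports. This is a rough sketch, assuming the message keeps the "at <RFC 1123 date> ... served by <host>" shape seen in those pastes (with or without the surrounding HTML tags); `error_details` is a hypothetical helper name.

```python
import re
from email.utils import parsedate_to_datetime

# Assumption: the error text contains "(error-code)", then "at <date> GMT",
# then "served by <host>", as in the messages pasted in this thread.
ERROR_RE = re.compile(
    r"\((?P<code>[\w-]+)\)"
    r".*?at (?P<when>[A-Z][a-z]{2}, \d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} GMT)"
    r".*?served by (?P<host>\w+)",
    re.S,
)


def error_details(text):
    """Extract error code, timestamp and app server from a pasted error message."""
    match = ERROR_RE.search(text)
    if match is None:
        return None
    return {
        "code": match.group("code"),
        "when": parsedate_to_datetime(match.group("when")),
        "host": match.group("host"),
    }


pasted = (
    'API request failed (backend-fail-internal): An unknown error occurred in '
    'storage backend "local-swift-eqiad". <i>at Wed, 20 Aug 2014 21:26:31 GMT</i> '
    '<u>served by mw1132</u>'
)
details = error_details(pasted)
print(details["code"], details["when"].isoformat(), details["host"])
```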
same here, I can't see any obvious issues with swift after rebooting the machine that was causing the high load yesterday. we are doing some tuning to the nagios alerts we get for swift to detect recurrence (and a root cause/fix too!)
2014-08-21T13:57Z the bug strikes back!

API request failed (backend-fail-delete): Could not delete file "mwstore://local-swift-eqiad/local-public/c/ce/Крушение_поезда_в_московском_метро_15.07.2014.jpg"
(In reply to Pierre-Selim from comment #19)
> 2014-08-21T13:57Z the bug strikes back!
>
> API request failed (backend-fail-delete): Could not delete file
> "mwstore://local-swift-eqiad/local-public/c/ce/
> Крушение_поезда_в_московском_метро_15.07.2014.jpg"

Same file, slightly different error message.

Error deleting file: Could not delete file "mwstore://local-swift-eqiad/local-public/c/ce/Крушение_поезда_в_московском_метро_15.07.2014.jpg".
there were further errors found with swift talking to memcached; I've pushed https://gerrit.wikimedia.org/r/#/c/155629/ to bump that limit. the timeouts are now greatly reduced, though not completely eliminated yet, so the impact should be a lot smaller
*** Bug 69875 has been marked as a duplicate of this bug. ***
(In reply to jeremyb from comment #17)
> (In reply to Fastily from comment #15)
> > Not sure if this is related, but uploads have been failing with a similar
> > message:
>
> btw, please provide timestamps for when the errors happened if you have
> them! (e.g. comments 13/15)

Unfortunately I don't have an exact timestamp, but I do know this was happening during the same time deletions were failing. I haven't tried uploading anything since. Will definitely try again sometime this weekend.
So I've done quite a number of uploads and deletions since I last posted here, and have not experienced a 'backend-fail-internal' error since. I'm going to go ahead and close this as resolved for now. If anyone else is still experiencing errors, please don't hesitate to reopen! :)
Issue reappeared on [[commons:File:Pheliperodrigues.jpg]]

Error deleting file: Could not delete file "mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg".
misc data points:

I'm seeing some attempts in filebackend-ops.log:

2014-09-01 13:42:52 mw1210 commonswiki: MoveFileOp failed (batch #750loigffakv97vzttctb06d3xb1nf6): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":false,"failedAction":"attempt"}
2014-09-01 13:43:20 mw1198 commonswiki: MoveFileOp failed (batch #750loighcfplahx48bnr125t45twh4z): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:45:49 mw1104 commonswiki: MoveFileOp failed (batch #750loignpnz38ysz6rjotgwq7h5i1os): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:45:50 mw1119 commonswiki: MoveFileOp failed (batch #750loignq7eycehvi3b13ykg1ji76su): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:46:10 mw1187 commonswiki: MoveFileOp failed (batch #750loigpdo6255p9lg4z3w3826strm2): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:47:00 mw1150 commonswiki: MoveFileOp failed (batch #750loigrwlc8cwz29zgmbns0fe5df9l): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:48:07 mw1175 commonswiki: MoveFileOp failed (batch #750loiguvhuma8fk8395oakjvgzq66o): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:52:58 mw1183 commonswiki: MoveFileOp failed (batch #750loih7eo5eju3z0i7mm7gdued1lxp): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}
2014-09-01 13:53:04 mw1073 commonswiki: MoveFileOp failed (batch #750loih8ndhfheqxmjb5qpvokp2p452): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}

and the hashed file seems to be already there:

# swift list wikipedia-commons-local-deleted.q5 | grep q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg
q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg
though no match for that file in swift-backend.log:

$ zgrep -i Pheliperodrigues.jpg swift-backend.log archive/swift-backend.log-20140901.gz archive/swift-backend.log-201408*
$

seemingly a different (but related?) issue
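To see at a glance how often each file failed and whether the destination already existed at the time, the filebackend-ops.log entries can be grouped by source path and `dstExists`. The sketch below assumes every line follows the "timestamp host wiki: Op failed (batch #id): {json}" layout of the entries quoted in this comment; the regex and the names `parse_failed_op` and `summarize` are illustrative, not part of any MediaWiki tooling.

```python
import json
import re
from collections import Counter

# Assumption: one log entry per line, in the layout seen in filebackend-ops.log above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<wiki>\S+): "
    r"(?P<op>\w+) failed \(batch #(?P<batch>\w+)\): (?P<params>\{.*\})$"
)


def parse_failed_op(line):
    """Parse one 'XxxFileOp failed' line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["params"] = json.loads(entry["params"])
    return entry


def summarize(lines):
    """Count failures per (source path, dstExists) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_failed_op(line)
        if entry is not None:
            params = entry["params"]
            counts[(params["src"], params.get("dstExists"))] += 1
    return counts


# Two of the entries quoted in this comment.
log_lines = [
    '2014-09-01 13:42:52 mw1210 commonswiki: MoveFileOp failed (batch #750loigffakv97vzttctb06d3xb1nf6): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":false,"failedAction":"attempt"}',
    '2014-09-01 13:43:20 mw1198 commonswiki: MoveFileOp failed (batch #750loighcfplahx48bnr125t45twh4z): {"src":"mwstore://local-swift-eqiad/local-public/9/97/Pheliperodrigues.jpg","dst":"mwstore://local-swift-eqiad/local-deleted/q/5/q/q5qea4gleglvwbotppyd5fq5jnye5zz.jpg","overwriteSame":true,"dstExists":true,"failedAction":"attempt"}',
]
for (src, dst_exists), count in summarize(log_lines).items():
    print(count, dst_exists, src)
```

In the entries above only the first attempt has `dstExists` false; every later one reports true, which fits the observation that the hashed file is already present in the deleted container.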
Looks like INeverCry finally succeeded in deleting that file.
Is the problem described in comment 25 to comment 27 still seen?