Last modified: 2014-11-20 09:24:18 UTC

Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links might be broken. See T38587, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 36587 - Chunked upload fails with internal_api_error_UploadStashFileNotFoundException
Status: REOPENED
Product: MediaWiki
Classification: Unclassified
Component: File management (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: High critical with 7 votes
Target Milestone: ---
Assigned To: Nobody - You can work on this!
URL: http://lists.wikimedia.org/pipermail/...
Depends on:
Blocks: chunked-upload

Reported: 2012-05-07 10:20 UTC by Nemo
Modified: 2014-11-20 09:24 UTC
CC: 31 users

See Also:
Web browser: ---
Mobile Platform: ---
Assignee Huggle Beta Tester: ---


Attachments

Description Nemo 2012-05-07 10:20:49 UTC
See the URL for context and more problems. After failing on Firefox, it happened again on Chromium with an upload about 10 times faster (~10 min for 190 MB).
Same as bug 34785 and bug 35354?
Comment 1 Nemo 2012-05-07 10:29:22 UTC
Ah, and if I retry it says "completed!" after a few seconds, but it actually fails again, giving Unknown error: "internal_api_error_UploadStashFileNotFoundException".

Confirmed also by russavia with a 268 MB file on Firefox on Windows (because it didn't work at all on Chrome).
Comment 2 Nemo 2012-05-07 10:36:39 UTC
Sorry, russavia had Unknown error: "internal_api_error_UploadChunkFileException" at first.
Comment 3 Erik Moeller 2012-05-09 02:30:23 UTC
At which step of the upload process is this occurring?
Comment 4 Erik Moeller 2012-05-09 06:30:14 UTC
We've reproduced this as a first step error. For very large uploads (~100MB), the final chunk API POST request sometimes (not always) fails with a 504 error from Squid. It seems likely that we have a timing issue with the chunk re-assembly.

Note that this is distinct from bug 34785, which has similar symptoms but occurs in the last step of the upload and is independent of upload size.
Comment 5 Jan Gerber 2012-05-10 09:14:09 UTC
Could it also be that re-assembly + hashing of large files just takes too long and hits the PHP execution time limit?
Comment 6 Erik Moeller 2012-05-25 02:48:05 UTC
I can confirm that this still happens for some very large uploads (just tried a 400MB file), even after we disabled client-side API timeouts. So this looks like a server-side timeout issue in the chunk re-assembly step as Jan suggests.
Comment 7 Fastily 2012-07-29 20:49:33 UTC
Can something be done about this?  I'm getting similar timeouts when I try using the API with a Java application.
Comment 8 Erik Moeller 2012-08-01 01:25:05 UTC
Hi Fastily,

we're currently in the process of moving to a new media storage backend (Swift), which involves lots of changes on all levels (dev and ops), and is the reason we've not prioritized a fix for this yet (we're changing some of the relevant infrastructure, and the people with the right skills to fix this bug are working on the migration). 

We may not have cycles to fully debug the issues with chunking and chunk assembly before September, but Rob should be able to give a better estimate soon (unless someone on CC beats us to it and actually does find time to get to the root of the issue).
Comment 9 Erik Moeller 2012-08-29 22:09:05 UTC
OK, now that we're through most of the Swift migration, we should pick this one up again.

This still occurs and is easily reproducible by uploading a 300-400MB file to Commons via Upload Wizard with chunked uploading enabled.

Looking at the API responses in detail, what happens is that there's a final chunk API request which results in a "Wikimedia Error" webpage response like this:

Request: POST http://commons.wikimedia.org/w/api.php, from 208.80.154.134 via cp1002.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.125 (10.64.0.125)
Error: ERR_READ_TIMEOUT, errno [No Error] at Wed, 29 Aug 2012 21:46:36 GMT

Upload Wizard then seems to attempt to re-upload the same chunk again, with the following response:

{"servedby":"mw65","error":{"code":"internal_api_error_UploadChunkFileException","info":"Exception Caught: error storing file in '\/tmp\/phpRWwfF6': backend-fail-alreadyexists; mwstore:\/\/local-swift\/local-temp\/5\/57\/10tlhjb3zs7o.4nm7i9.28.ogx.490","*":""}

This error is then surfaced through the UI.

So it looks like the chunk re-assembly for large files is still timing out somewhere.
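
For reference, the request sequence that UploadWizard drives against api.php looks roughly like the sketch below (my simplified reading of the chunked upload API; edit token handling, retries and the final publish step are omitted). The point is that the request carrying the last chunk is the one that makes the server concatenate everything, so that is the request that runs into the read timeout:

    // Upload a File object in fixed-size chunks (token handling omitted).
    function uploadChunked( apiUrl, file, chunkSize ) {
        var offset = 0, filekey;

        function sendNextChunk() {
            var fd = new FormData();
            fd.append( 'action', 'upload' );
            fd.append( 'format', 'json' );
            fd.append( 'stash', '1' );
            fd.append( 'filename', file.name );
            fd.append( 'filesize', file.size );
            fd.append( 'offset', offset );
            fd.append( 'chunk', file.slice( offset, offset + chunkSize ) );
            if ( filekey ) {
                fd.append( 'filekey', filekey );
            }
            return $.ajax( {
                url: apiUrl, type: 'POST', data: fd,
                contentType: false, processData: false
            } ).then( function ( res ) {
                filekey = res.upload.filekey;
                offset += chunkSize;
                // The request that carried the final chunk triggers
                // server-side concatenation before it can respond.
                return offset < file.size ? sendNextChunk() : filekey;
            } );
        }
        return sendNextChunk();
    }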
Comment 10 Aaron Schulz 2012-08-30 22:31:58 UTC
Since concatenation is rare, it doesn't show up usefully in profiling.

I've made a few optimizations:
https://gerrit.wikimedia.org/r/#/c/22063/
https://gerrit.wikimedia.org/r/#/c/22118/

...but I'm not sure how much faster the file operations can be without parallel downloading of local file copies. The slowness may not even be coming from here; I'd need more data to say.

In any case, rather than having the JS expect the whole assembly/upload to happen synchronously with the last chunk, it might help if the JS could fall back to polling the server for completion status. Unfortunately this would require a job queue, since you can't really send a reply, close the connection, and keep doing work in PHP.
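
A rough sketch of what that polling fallback could look like on the client side (the async/checkstatus parameter names and the Poll/Success results are assumptions about how such an API could look, not existing behaviour; token handling omitted again):

    // After sending the final chunk with an (assumed) async flag, poll the
    // server for the state of the queued assembly job instead of keeping
    // one long request open.
    function pollAssembly( apiUrl, filekey, done, fail ) {
        $.post( apiUrl, {
            action: 'upload',
            format: 'json',
            checkstatus: '1',   // assumed parameter: only report job status
            filekey: filekey
        } ).done( function ( res ) {
            if ( res.upload && res.upload.result === 'Poll' ) {
                // Assembly job still queued or running; ask again shortly.
                setTimeout( function () {
                    pollAssembly( apiUrl, filekey, done, fail );
                }, 3000 );
            } else if ( res.upload && res.upload.result === 'Success' ) {
                done( res.upload );
            } else {
                fail( res );
            }
        } ).fail( fail );
    }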
Comment 11 Aaron Schulz 2012-08-30 23:09:37 UTC
More optimizations:
https://gerrit.wikimedia.org/r/#/c/22155/
Comment 12 Fastily 2012-08-31 01:15:53 UTC
This isn't exclusively an UploadWizard issue; I still get the same timeout errors when performing a chunked upload of a 400 MB file via the API.
Comment 13 Rob Lanphier 2012-08-31 18:40:01 UTC
The changes marked above should roll out Wednesday, September 5 with the 1.20wmf11 deployment.
Comment 14 Erik Moeller 2012-09-08 23:43:44 UTC
It looks like chunked uploading is completely broken now, perhaps due to these changes; see bug 40048.
Comment 15 Erik Moeller 2012-09-10 08:05:38 UTC
Basic chunked uploading is fixed now, thanks Aaron.

Large chunked uploads still fail. The last chunk still leads to a Squid timeout error:

Request: POST http://commons.wikimedia.org/w/api.php, from 208.80.154.134 via cp1015.eqiad.wmnet (squid/2.7.STABLE9) to 10.64.0.125 (10.64.0.125)
Error: ERR_READ_TIMEOUT, errno [No Error] at Mon, 10 Sep 2012 07:38:26 GMT

Upload Wizard then tries again, and it now fails with a different API error:

{"servedby":"mw67","error":{"code":"stashfailed","info":"Could not read file \"mwstore&#58;\/\/local-swift\/local-temp\/8\/86\/10ukfdjtth84.rb8zv4.28.ogx.0\"."}}
Comment 16 Erik Moeller 2012-09-24 23:25:34 UTC
Aaron/Rob - This issue is still occurring; as soon as there are no higher priority Swift / storage issues remaining, it would be nice if we could dig into it some more.
Comment 17 Fastily 2012-10-02 00:10:17 UTC
Any updates?  I'm still getting the same error :(
Comment 18 Aaron Schulz 2012-10-16 22:32:57 UTC
I tried to upload a 423 MB file (clone of AW_PT_2010_-_Sérgio_Nunes_-_Uso_da_Wikipédia_para_investigação_em_informática.ogv) but UW always fails with "Internal error: Something went wrong with processing your upload on the wiki." on one of the first chunks... so I can't even hit the concatenate step there.

Uploading ~150 MB files seems to work fine though.
Comment 19 Jan Gerber 2012-10-18 11:09:55 UTC
How do you test large uploads on Commons? I get a 100 MB upload size limit.
Comment 20 Tomasz W. Kozlowski 2012-10-18 11:13:59 UTC
You need to enable chunked uploads in your Preferences, Jan, and then try to upload the file using UploadWizard.
Comment 21 Jan Gerber 2012-10-18 15:33:18 UTC
Aaron, is it possible that your failed upload is related to a non-ASCII filename? Have you tried with an ASCII-only filename and a large file size?

Uploading a 430 MB file here fails with:

{"servedby":"srv297",
 "error":{
   "code":"stashfailed",
   "info":"Could not read file
\"mwstore&#58;\/\/local-swift\/local-temp\/8\/8c\/10xtcqa7qx6c.c4odgl.1731370.ogx.0\"."
 }
}

That's 'backend-fail-read', so it fails in doConcatenate in includes/filebackend/FileBackendStore.php.

mwstore:// looks like a virtualSource, so it's the second loop. Can it be that tmp files are cleaned up between being checked in the first loop and read in the second? Why is this done in two loops?

The file name ends in 0, so it's the first chunk that is missing. $wgUploadStashMaxAge is 6 hours, so it's unlikely that they get collected at this stage. Is any other cleanup happening that could be the issue?
Comment 22 Jan Gerber 2012-10-18 15:36:56 UTC
Follow-up on the name issue: renaming a file to Wikipédia_para_investigação_em_informática.ogv and trying to upload it, I get this error as the first response:

{
 "servedby":"mw73",
 "error":{"code":"internal-error","info":"Invalid file title supplied"}
}
Comment 23 Aaron Schulz 2012-10-18 16:20:45 UTC
(In reply to comment #21)
> Aaron, is it possible that your failed upload is related to a non ascii
> filename? Have you tried with an ascii only filename and large filesize?
> 
> Uploading 430mb file here fails with:
> 

So this is what happens *with* an ASCII name, I assume?
Comment 24 Andre Klapper 2012-10-18 20:38:32 UTC
Are bug 35354 and bug 40048 duplicates?
Comment 25 Fastily 2012-10-18 20:47:06 UTC
(In reply to comment #24)
> Are bug 35354 and bug 40048 duplicates?

Not quite. Bug 40048 fixed chunked uploading for files slightly exceeding 100 MB. Chunked uploads still fail outright for files exceeding 250 MB.
Comment 26 Aaron Schulz 2012-10-18 23:10:32 UTC
I tried renaming the 430 MB file and reuploading. I got "Internal error: Server failed to store temporary file.". I'm also not sure why the name makes a difference, since it's not really used until I try to publish (if I managed to get that far).
Comment 27 Jan Gerber 2012-10-19 12:04:36 UTC
1) Filenames: the upload request requires a filename to be passed; the filename needs to be unique and valid. https://gerrit.wikimedia.org/r/#/c/28673/ makes sure it's not sending any special characters.

2) Chunked uploads: I changed the chunk size locally (in JavaScript) to 50 MB but I still get the same error, so the problem is not related to the number of chunks. I still get:

{"servedby":"srv254","error":{"code":"stashfailed","info":"Could not read file \"mwstore&#58;\/\/local-swift\/local-temp\/d\/d6\/10xw8uke3gt4.kgux90.1731370.ogx.0\"."}}

3) Using Swift on a local VM, this problem does not exist. Large uploads do not cause any errors. I don't have a local multiwrite setup; could that be the problem here?
Comment 28 Aaron Schulz 2012-10-19 17:26:56 UTC
(In reply to comment #27)
> 3) Using swift on a local vm this problem does not exist. Large uploads do not
> cause any errors. Dont have a local multiwrite setup, could that be the problem
> here?

I was using Ceph rgw. I've tried a 90 MB file again; it uploads ~10 chunks and then dies with "Exception Caught: path doesn't exist" (using Firebug to inspect errors). Some of the problems may have to do with http://tracker.newdream.net/issues/3365, but that doesn't quite explain why it uploads several chunks fine before failing.

I can upload files of 40-150 MB or so with Swift. Sometimes it says the upload failed when it actually succeeded (and I can still publish it), and other times it just works normally.

I'm also running into a bug where most of the messages appear like "[mwe-upwiz-subhead-message][mwe-upwiz-subhead-translate]" on the wikis using SQLite, which is probably an unrelated but annoying problem. This affects my wikis that use Ceph and Swift.

I also have a wiki using the local FS and MySQL, which seems to give me much less trouble. I'll try MySQL + Swift and see what that does next.
Comment 29 Aaron Schulz 2012-10-19 19:59:47 UTC
Even when it succeeds with Swift, I still see dangling HTTP requests in Firebug that seemingly last forever. https://gerrit.wikimedia.org/r/#/c/28286/ might help, but I'm not sure where the hanging is internally.
Comment 30 Aaron Schulz 2012-10-23 17:19:42 UTC
A fix for Ceph was made in https://gerrit.wikimedia.org/r/#/c/29421/. It now works at the same level as Swift.
Comment 31 prolineserver 2012-11-10 13:25:49 UTC
Any updates here? I still get the "internal_api_error_UploadStashFileNotFoundException" :(
Comment 32 Andre Klapper 2012-11-19 13:21:12 UTC
Can users somehow debug this or provide more info, if it's reproducible for them?

Yet another report in the feedback forum: https://commons.wikimedia.org/w/index.php?title=Commons:Upload_Wizard_feedback&oldid=83274547#Upload_error
Comment 33 Nemo 2012-11-19 13:25:43 UTC
(In reply to comment #32)
> Can users somehow debug this or provide more info, if it's reproducible for
> them?

I doubt it: it's hard to debug even for Aaron! ;-)
What users can see is that, whatever the upload method is, big chunked uploads are likely to fail.
Comment 34 Aaron Schulz 2012-11-19 19:47:08 UTC
More improvements in https://gerrit.wikimedia.org/r/#/c/33978/ (which will still be slow).

This also got worse last week with multiwriting of temp files to nas1 in addition to swift...

More radical changes proposed in https://gerrit.wikimedia.org/r/#/c/34062/ to eliminate slow HTTP requests entirely.

Some debug timing from test2wiki:
2012-11-18 09:30:20 srv293 test2wiki: Finished concat of 242 chunks in 33.490297794342 sec.
2012-11-18 09:31:07 srv293 test2wiki: Finished stash of 242 chunked file in 47.260545969009 sec.

2012-11-18 10:39:27 srv295 test2wiki: Finished concat of 403 chunks in 50.59726691246 sec.
2012-11-18 10:40:42 srv295 test2wiki: Finished stash of 403 chunked file in 75.486596107483 sec.

...this is not counting the massive and slow extra GET request eliminated in https://gerrit.wikimedia.org/r/#/c/33978/.
Comment 35 Andre Klapper 2012-12-17 16:05:49 UTC
(In reply to comment #34)
> More radical changes proposed in https://gerrit.wikimedia.org/r/#/c/34062/ to
> eliminate slow HTTP requests entirely.

That code change was merged 10 days ago and deployed on December 12th.
I'm curious if the situation has improved. Comments on this report from people who were previously affected are very welcome!
Plus we probably have to watch https://commons.wikimedia.org/wiki/Commons:Upload_Wizard_feedback and see...
Comment 36 Jan Gerber 2012-12-17 17:02:18 UTC
Note that the change in UploadWizard was only merged December 15 and is not deployed so far, so just testing UW right now will give the same result.

 https://gerrit.wikimedia.org/r/#/c/34537/


In addition, there is another change in Gerrit, not yet merged, that prevents timeouts in the upload-from-stash step:

core: https://gerrit.wikimedia.org/r/#/c/36697/
UW:   https://gerrit.wikimedia.org/r/#/c/36768/
Comment 37 Tomasz W. Kozlowski 2013-01-05 23:34:38 UTC
Note that this bug is still in place; I am getting the very same error message as Nemo_bis for files above 150 MiB. I just tried uploading a 170 MiB video file without any success.

Can we please get this fixed? It's so annoying that even though there is the raised 500 MiB limit for files uploaded with the UploadWizard, one cannot actually make any use of it due to this bug.
Comment 38 Nemo 2013-01-08 17:47:03 UTC
After I8cfcb09d, some of us have been able to upload a big file for the first time: thanks!
In particular we uploaded two videos of 170 and 200 MB via UploadWizard:
https://commons.wikimedia.org/wiki/File:2013-01-05_President_Obama's_Weekly_Address.ogv
https://commons.wikimedia.org/wiki/File:Communication_issues_musings_of_a_dinosaur.ogv

Apart from bug 36599, the only problems I had were that it didn't manage to produce the thumbnail (neither in the first step nor later) and that the final publishing phase took way longer than usual.
I still have to try with bigger files.
Comment 39 Nemo 2013-01-08 22:01:39 UTC
I also managed to upload a 500 MB video: https://commons.wikimedia.org/wiki/File:Meet_John_Doe.ogv

I encountered bug 43746, then got "An unknown error occurred" as described there, then upon "retry failed uploads" an api-error-internal_api_error_UploadStashFileNotFoundException error message, but the file was actually uploaded.
(I discovered only now that I tried to reupload it and got "A file with this name exists already" in "Describe" step.)
Comment 40 Aaron Schulz 2013-01-10 01:26:26 UTC
(In reply to comment #36)
> note that the change in UploadWizard was only merged December 15 and is not
> deployed so far. so just testing UW right now will give the same result.
> 
>  https://gerrit.wikimedia.org/r/#/c/34537/
> 
> 
> In addition there is another change in gerrit that is not merged yet
> preventing
> timeouts in the upload-from-stash step:
> 
> core: https://gerrit.wikimedia.org/r/#/c/36697/
> UW:   https://gerrit.wikimedia.org/r/#/c/36768/

Merged and deployed now.
Comment 41 Erik Moeller 2013-01-10 02:18:29 UTC
I just tried a 470 MB test file and it failed with "Unknown error:unknown" on the first step. I suspect it ended up hitting line 217 in mw.FormDataTransport.js:

    // If concatenation takes longer than 3 minutes give up
    if ( ( ( new Date() ).getTime() - _this.firstPoll ) > 3 * 60 * 1000 ) {
        _this.transportedCb({
            code: 'server-error',
            info: 'unknown server error'
        });
    }

Is three minutes sufficient time? Are there other things we should do to speed up the concatenation step?
Comment 42 Aaron Schulz 2013-01-10 03:20:43 UTC
(In reply to comment #41)
> Is three minutes sufficient time?

Not really; it should be increased, say to 5 minutes (and more as needed, though at some point it will get kind of unreasonable without more UI feedback).

> Are there other things we should do to speed up the concatenation step?

Increasing the chunk size would help somewhat. Perhaps pipelining the chunks would help (though the DB layout does not support that; they must come in order). Disabling multiwrite would speed up chunk storage and final file stashing by 1.5x or so.
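
In terms of the snippet quoted in comment 41, the immediate tweak would be something along these lines (the constant name is mine, and 5 minutes is only a suggested starting point, not a tested value):

    // Give up on polling for concatenation only after ~5 minutes instead of 3.
    var CONCAT_TIMEOUT_MS = 5 * 60 * 1000;
    if ( ( ( new Date() ).getTime() - _this.firstPoll ) > CONCAT_TIMEOUT_MS ) {
        _this.transportedCb( {
            code: 'server-error',
            info: 'unknown server error'
        } );
    }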
Comment 43 Nemo 2013-01-15 23:20:08 UTC
To both me (Chromium) and odder (?), the upload of <https://archive.org/download/Plan_9_from_Outer_Space_1959/Plan_9_from_Outer_Space_1959.ogv> (372 MiB) is failing in the "Upload" step with «Unknown error: "unknown".»
Comment 44 Erik Moeller 2013-01-16 05:18:39 UTC
Gah, same here. :-( It's now aborting immediately at the first API request that returns the "queued" result. (Chrome 22.)
Comment 45 Jan Gerber 2013-01-16 05:24:40 UTC
If you run it with the console open, what's the last response from the server in the network tab?
Comment 46 Erik Moeller 2013-01-16 05:33:18 UTC
On the second try, it fails as before, with the last API requests all returning result: Poll, stage: queued (including the final response), until Upload Wizard reports the "unknown" error, presumably due to triggering the aforementioned timeout. This is with a 125 MB file.

I'm not sure it really behaved differently before; I'll do some more testing.
Comment 47 Erik Moeller 2013-01-16 06:17:00 UTC
Whatever is going wrong in the assembly stage, it doesn't look like the slowdown is linear. With a 22MB file the assembly succeeds almost instantaneously after the first API poll. With a 30MB file, it's two poll requests. With a 125MB file, I have more than 50 polls before it finally times out.
Comment 48 Fastily 2013-02-03 23:16:15 UTC
Any updates?
Comment 49 Erik Moeller 2013-02-06 01:43:31 UTC
The last uploads I tried all succeeded. Could others following this please try again and see if you can successfully upload >100MB files through Upload Wizard with the chunked upload preference enabled?
Comment 50 Nemo 2013-02-06 17:12:27 UTC
Your file was 120 MB; I've tried a 370 MiB video and it failed again (I'm now retrying). Too bad, because it also seemed fast enough, averaging around 300-400 KiB/s and oscillating in the 100-800 KiB/s range.
Comment 51 Erik Moeller 2013-02-06 18:19:27 UTC
Trying with a 344M file I get the good old Unknown error: "internal_api_error_UploadStashFileNotFoundException" again. Note that it doesn't appear to be doing the asynchronous polling any more -- the final chunk is uploaded and fails with an error 504 - gateway timeout response.

It looks like increasing chunk size to 5MB may have helped somewhat but not sufficiently for very large files.
Comment 52 Erik Moeller 2013-02-06 20:36:06 UTC
We (Jan/RobLa/Aaron/myself) connected about this earlier today. It looks like part of the problem is preserving the request context (user/IP) in a sane manner when shelling out for asynchronous assembly of the chunks / uploading the file from stash. Jan wants to take a first crack at resolving this w/ Aaron's help. In addition the server-side thumbnail generation for Ogg files currently doesn't scale for large files and needs to be re-implemented using range requests. (Jump in if I got any of that wrong.)

Hopefully we can make some further progress on this in the next couple of weeks.
Comment 53 Smallman 2013-02-23 22:15:56 UTC
I'm getting this error roughly once every several thousand files (~10-20 MB each) that are being chunk-uploaded via the Commons API in 2-3 MB chunks:

{"servedby":"mw1138","error":{"code":"internal_api_error_UploadChunkFileException","info":"Exception Caught: error storing file in '\/tmp\/php2BDowP': backend-fail-internal; local-swift","*":""}}

Looks like something isn't being allocated/locked properly, possibly a rare race condition. It's annoying.
Comment 54 Erik Moeller 2013-02-26 17:41:38 UTC
Am I right that this is mainly waiting for this changeset to be merged, or are there other dependencies at this point?

https://gerrit.wikimedia.org/r/#/c/48940/
Comment 55 Nischay Nahata 2013-03-02 20:14:26 UTC
Changeset merged, is this fixed now?
Comment 56 Nemo 2013-03-03 22:53:50 UTC
(In reply to comment #39)
> then upon "retry failed uploads" a
> api-error-internal_api_error_UploadStashFileNotFoundException error message,
> but the file was actually uploaded.

This happened again with http://commons.wikimedia.org/wiki/File:Scrooge_1935.ogv uploaded by Beria (300 MB in 12 min).
Comment 57 Erik Moeller 2013-03-12 01:41:38 UTC
I'm having mixed success with the latest code. A 459M file seemed to work fine (I didn't go past stage 1). A 491M file I just tried resulted in the following API request sequence:

5MB chunk->ok
5MB chunk->ok
5MB chunk->ok
...
lots of chunks later
...
~500K (final) chunk->Error 504
Retry of ~500K final chunk->API error.

The final API error was:

{"servedby":"mw1194","error":{"code":"stashfailed","info":"Invalid chunk offset"}}

Surfaced to the user as "Internal error: Server failed to store temporary file".
Comment 58 Aaron Schulz 2013-03-15 22:01:34 UTC
(In reply to comment #57)
> I'm having mixed success with the latest code. A 459M file seemed to work
> fine
> (I didn't go past stage 1). A 491M file I just tried resulted in the
> following
> API request sequence:
> 
> 5MB chunk->ok
> 5MB chunk->ok
> 5MB chunk->ok
> ...
> lots of chunks later
> ...
> ~500K (final) chunk->Error 504
> Retry of ~500K final chunk->API error.
> 
> The final API error was:
> 
> {"servedby":"mw1194","error":{"code":"stashfailed","info":"Invalid chunk
> offset"}}
> 
> Surfaced to the user as "Internal error: Server failed to store temporary
> file".

No async upload was enabled at that time (it is behind a feature flag). Since all wikis were on wmf11, I deployed the new Redis queue aggregator on Thursday, which worked fine. Async uploads were enabled again then. The existing high-priority loop made via puppet config changes was already done and appears to work as desired. The new code to fix the IP logging issue was broken by CentralAuth, which caused the upload job to fail. This was fixed in https://gerrit.wikimedia.org/r/#/c/54084/. It can be tested at test2wiki (jobs on testwiki are broken due to srv193 being in pmtpa, so don't use that).
Comment 59 Aaron Schulz 2013-03-25 18:27:24 UTC
(In reply to comment #58)
> No async upload was enabled at that time (it is behind a feature flag).

Obviously meant "disabled".
Comment 60 Tilman Bayer 2013-06-07 15:50:03 UTC
I don't know whether it was caused by the exact same API error as in this bug, but I just got the above UploadWizard error message when trying to upload a 231MB file (twice, on Chromium and Firefox):

"Internal error: Server failed to store temporary file."

Clicking "Retry failed uploads" in Chromium resulted in "Unknown error: 'unknown'", but on Firefox it succeeded in completing the upload.
Comment 61 Fastily 2013-06-08 22:00:44 UTC
(In reply to comment #60)
> I don't know whether it was caused by the exact same API error as in this
> bug,
> but I just got the above UploadWizard error message when trying to upload a
> 231MB file (twice, on Chromium and Firefox):
> 
> "Internal error: Server failed to store temporary file."
> 
> Clicking "Retry failed uploads" in Chromium resulted in "Unknown error:
> 'unknown'", but on Firefox it succeeded in completing the upload.


Confirmed, this is totally broken again.  Why is this broken again?
Comment 62 Andre Klapper 2013-06-09 00:25:01 UTC
Fastily: If you can confirm it, providing some basic info would be very welcome (file size, browser, etc.). Thanks!
Comment 63 Fastily 2013-06-17 22:46:35 UTC
Certainly. Every few big uploads, I get a generic HTTP 500 error. Also, I'm not sure if it's related, but I also get the occasional error in which the server claims it can't reassemble the chunks. Neither of these errors really occurs when I'm editing on a corporate network with 60+ Mbps upload speeds, but when I'm at home, I get an average of 5 Mbps upload. That said, I suspect something is timing out server-side.

I used a variety of test files, ranging from 152 to 450 MB, using my Java library to upload the files via the MediaWiki API.
Comment 64 Bawolff (Brian Wolff) 2013-07-08 05:18:37 UTC
When I tested on test2.wikipedia.org, I was able to upload a small file fine. However, a large file (in the 200 MB range; I don't remember the exact size) split into about 400 chunks ended up with me just getting result: poll; stage: queued forever and ever (well, actually I gave up after about two and a half hours of waiting).

>I get a generic HTTP 500 error

Just for reference, Wikimedia's 500 errors usually contain debugging information near the bottom (unless they've changed).
Comment 65 Fastily 2013-07-12 22:33:48 UTC
Is anything being done to resolve this issue at the moment?

I'd suggest debugging with a 350 MB file but throttling the upload speed to ~0.5 Mbps. Each time I did this, it failed without exception.
Comment 66 Bawolff (Brian Wolff) 2013-07-14 04:07:06 UTC
> 
> I used a variety of test files, ranging from 152-450 Mb, using my Java
> library
> to upload the files via the MediaWiki API.

Is your java library using the async option when uploading these files?

What stage does the 500 error usually occur at? (While uploading a chunk, Some point during the "assembling" stage or some point during the "publish" stage? Or does it vary).
Comment 67 Fastily 2013-07-22 20:37:44 UTC
(In reply to comment #66)
> > 
> > I used a variety of test files, ranging from 152-450 Mb, using my Java
> > library
> > to upload the files via the MediaWiki API.
> 
> Is your java library using the async option when uploading these files?
> 
> What stage does the 500 error usually occur at? (While uploading a chunk,
> Some
> point during the "assembling" stage or some point during the "publish" stage?
> Or does it vary).

I believe we are using the async option when uploading.

The 500 error typically occurs at the publishing stage.  I've had similar, but infrequent 500 errors at the assembling stage as well, but I'm not sure how related this is.
Comment 68 Kelson [Emmanuel Engelhart] 2013-11-11 15:58:32 UTC
Now that we have increased the UploadWizard limit to 1 GB, the frequency of this error will probably increase. The last report I have read is about two consecutive uploads of an 800 MB video (with FF and Chrome), which both failed with a "stasherror": https://bugzilla.wikimedia.org/show_bug.cgi?id=52593#c9
Comment 69 Fastily 2013-11-12 02:22:09 UTC
I do hope this is fixed soon.  I commented about it here: https://bugzilla.wikimedia.org/show_bug.cgi?id=52593#c13
Comment 70 Fastily 2013-11-19 02:05:39 UTC
New update -- it looks like big files which 'failed to upload' are visible at [[Special:UploadStash]]. I'm unable to download & verify the contents of those files, however, because the system says "Cannot serve a file larger than 1048576 bytes." Given this, it's hard to say what kind of issue this is (e.g. maybe the uploaded file is corrupt, i.e. the file was not assembled properly server-side?).
Comment 71 Bawolff (Brian Wolff) 2013-11-19 02:50:47 UTC
(In reply to comment #70)
> New update -- It looks like big files which 'failed to upload' are visible at
> [[Special:UploadStash]].  I'm unable to download & verify the contents of
> those files however, because the system "Cannot serve a file larger than
> 1048576 bytes."
>  Given this, it's hard to say what kind of issue this is (e.g. maybe the
> uploaded file is corrupt, i.e. file was not assembled properly server-side?)

Yes, we currently don't let people download things that are in the upload "stash" if they are bigger than 1 MB. If it is of interest, the reason given in the code for this is:

        // Since we are directly writing the file to STDOUT,
        // we should not be reading in really big files and serving them out.
        //
        // We also don't want people using this as a file drop, even if they
        // share credentials.
        //
        // This service is really for thumbnails and other such previews while
        // uploading.

You should be able to verify whether the upload worked by requesting a thumbnail that would be smaller than 1 MB. If it was a JPEG file with a stash name of 11oedl0sn7e4.aggjsr.1.jpg, then a URL of Special:UploadStash/thumb/11oedl0sn7e4.aggjsr.1.jpg/120px-11oedl0sn7e4.aggjsr.1.jpg should work. If it's a video file named 11oedl0sn7e4.aggjsr.1.webm, then Special:UploadStash/thumb/11oedl0sn7e4.aggjsr.1.webm/100px--11oedl0sn7e4.aggjsr.1.webm.jpg would get you a thumbnail if the file is not corrupt (I think; I haven't tested that for a video).
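
For example, a quick way to build such a thumbnail URL from a stash key in the browser console (purely illustrative; the 120px/100px widths are just the ones from the examples above):

    // Build a Special:UploadStash thumbnail URL for a given stash key.
    // Video thumbs use the "100px--<key>.jpg" form, images "120px-<key>".
    function stashThumbUrl( key, isVideo ) {
        var base = '/wiki/Special:UploadStash/thumb/' + key + '/';
        return isVideo
            ? base + '100px--' + key + '.jpg'
            : base + '120px-' + key;
    }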
----

>Given this, it's hard to say what kind of issue this is (e.g. maybe the
>uploaded file is corrupt, i.e. file was not assembled properly server-side?)

I wonder if some sort of timeout/race condition happened with the screwy way we store data in the session, and maybe the file is uploaded fine, but the publish step (i.e. The step moving file from stash to actually on-wiki) never really happened due to timeout. If that was the case, it may be possible to do a further API request after the fact to finish the upload.
Comment 72 Bawolff (Brian Wolff) 2013-11-19 05:21:57 UTC
> 
> >Given this, it's hard to say what kind of issue this is (e.g. maybe the
> >uploaded file is corrupt, i.e. file was not assembled properly server-side?)
> 
> I wonder if some sort of timeout/race condition happened with the screwy way
> we
> store data in the session, and maybe the file is uploaded fine, but the
> publish
> step (i.e. The step moving file from stash to actually on-wiki) never really
> happened due to timeout. If that was the case, it may be possible to do a
> further API request after the fact to finish the upload.

Meh, looks like the individual chunks get listed too, so it's hard to tell what that means.

Also, it looks like the thumbnailing infrastructure around stashed uploads is totally broken on WMF wikis. Presumably it was forgotten about in the Swift migration(?). Not that surprising, since I'm not sure if anyone

----

Because Special:Upload is kind of useless... I made some (very hacky) JS that adds some additional links: a (broken) link to a thumbnail, a link to metadata, and a publish link to take a file out of the stash and onto the wiki.

In particular, the metadata link includes the file size in bytes, which you can use to verify that all the parts of the file made it. If you want to be more paranoid, it also returns an SHA1 sum of the file, so you can be sure it's really the right file on the server.

If that matches up, try the publish link and see what happens...
----

Anyhow, to sum up, add
 importScript( 'User:Bawolff/stash.js' );
to [[commons:Special:MyPage/common.js]], and you should have the extra link on [[commons:Special:UploadStash]] which you can use to verify what file is in the stash.
Comment 73 Rainer Rillke @commons.wikimedia 2013-11-20 01:13:08 UTC
> You should be able to verify if the upload worked by requesting a thumbnail
> that would be smaller than 1 mb. If it was a jpeg file, with a stash name of
> [...]

or you simply try

https://commons.wikimedia.org/wiki/Special:UploadStash?withJS=MediaWiki:EnhancedStash.js
Comment 74 Rainer Rillke @commons.wikimedia 2013-11-20 01:28:40 UTC
(In reply to comment #72)
> importScript( 'User:Bawolff/stash.js' );

Ha! Didn't notice that. I had always wanted to write something like that, and now we have two of them.

(In reply to comment #71)
> I think, haven't tested that for a video

Video works *but* the generated "thumbnail" (for me https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/11vmdxqgjy9o.2239xy.1173692.webm/120px--11vmdxqgjy9o.2239xy.1173692.webm.jpg) is at full video size (here 1920x1080 px).
Comment 75 Bawolff (Brian Wolff) 2013-11-20 02:31:08 UTC
(In reply to comment #74)
> (In reply to comment #72)
> > importScript( 'User:Bawolff/stash.js' );
> 
> Ha! Didn't notice that. Ever wanted to write something like that and now we
> have 2 of them.

Cool. Yours is about a billion times better than my hack.


> 
> (In reply to comment #71)
> > I think, haven't tested that for a video
> 
> Video works *but* the generated "thumbnail" (for me
> https://commons.wikimedia.org/wiki/Special:UploadStash/thumb/11vmdxqgjy9o.
> 2239xy.1173692.webm/120px--11vmdxqgjy9o.2239xy.1173692.webm.jpg)
> is in full video size (here 1920x1080px).

Interesting. When I tried, I was getting Squid 503 errors all over the place (both for videos and normal images).
Comment 76 Gilles Dubuc 2014-01-10 14:16:00 UTC
(In reply to comment #67)
> (In reply to comment #66)
> > > 
> > > I used a variety of test files, ranging from 152-450 Mb, using my Java
> > > library
> > > to upload the files via the MediaWiki API.
> > 
> > Is your java library using the async option when uploading these files?
> > 
> > What stage does the 500 error usually occur at? (While uploading a chunk,
> > Some
> > point during the "assembling" stage or some point during the "publish" stage?
> > Or does it vary).
> 
> I believe we are using the async option when uploading.
> 
> The 500 error typically occurs at the publishing stage.  I've had similar,
> but
> infrequent 500 errors at the assembling stage as well, but I'm not sure how
> related this is.

I'd like to clarify this a bit. Your main issue is a 500 that happens at the publishing stage, is that correct? I think that this ticket has actually talked about several different bugs over time, which makes things more confusing than they need to be. I'd like to treat the assembly stage errors separately; I'm more interested in the one that's causing you issues most frequently.

Are there any more specific errors in the header or body of the 500 response?
Comment 77 Bawolff (Brian Wolff) 2014-01-10 19:03:32 UTC
As an aside, splitting comment 64 to bug 59917
Comment 78 Greg Grossmeier 2014-06-24 17:23:26 UTC
Gilles: I'm resetting the assignee for now. Should the priority be lowered as well? (There hasn't been any movement/communication in either direction since January.)

Fastily, do you have a reply for Gilles's question below?

Gilles: you asked (In reply to Gilles Dubuc from comment #76)
> (In reply to comment #67)
> > (In reply to comment #66)
> > > > 
> > > > I used a variety of test files, ranging from 152-450 Mb, using my Java
> > > > library
> > > > to upload the files via the MediaWiki API.
> > > 
> > > Is your java library using the async option when uploading these files?
> > > 
> > > What stage does the 500 error usually occur at? (While uploading a chunk,
> > > Some
> > > point during the "assembling" stage or some point during the "publish" stage?
> > > Or does it vary).
> > 
> > I believe we are using the async option when uploading.
> > 
> > The 500 error typically occurs at the publishing stage.  I've had similar,
> > but
> > infrequent 500 errors at the assembling stage as well, but I'm not sure how
> > related this is.
> 
> I'd like to clarify this a bit. Your main issue is a 500 that happens at the
> publishing stage, is that correct? I think that this ticket has actually
> talked about several different bugs over time, which makes things more
> confusing than they need to be. I'd like to treat the assembly stage errors
> separately, I'm more interested in the one that's causing you issues the
> most frequently.
> 
> Are there any more specific errors in the header or body of the 500 response?
Comment 79 Sisa 2014-07-10 07:50:35 UTC
Using bigChunkedUpload.js to upload a new version (345.142 KB) of https://commons.wikimedia.org/wiki/File:Clusiodes_-_2014-07-06kl.AVI.webm I got the message "FAILED: {"servedby":"mw1190","error":{"code":"stasherror","info":"UploadStashFileNotFoundException: key '12fc8few9krk.lhij8x.957461.webm' not found in stash"}}". This error occurred after uploading 82 of 85 chunks. I have to use a 384 kbit/s connection, so bigger uploads take several hours.
Comment 80 Andre Klapper 2014-07-10 12:17:41 UTC
Whatever "bigChunkedUpload.js" is, this bug report is about UploadWizard instead...
Comment 81 Rainer Rillke @commons.wikimedia 2014-07-10 12:47:35 UTC
(In reply to Andre Klapper from comment #80)
> Whatever "bigChunkedUpload.js" is, this bug report is about UploadWizard 
> instead...

This bug is about an issue with chunked uploading and thus belongs to either Wikimedia or MediaWiki file management.

bigChunkedUpload.js is a standard-compliant script written by me, and the error message is what it got back from the API.
Comment 82 Rainer Rillke @commons.wikimedia 2014-07-10 13:28:36 UTC
(In reply to Sisa from comment #79)
Sisa, do you remember
1) how long it took uploading the 82 chunks
2) when you were attempting to upload (date+time+timezone or just in UTC)

Did you retry?
Comment 83 Sisa 2014-07-10 14:03:54 UTC
(In reply to Rainer Rillke @commons.wikimedia from comment #82)
> (In reply to Sisa from comment #79)
> Sisa, do you remember
> 1) how long it took uploading the 82 chunks
> 2) when you were attempting to upload (date+time+timezone or just in UTC)
> 
> Did your re-try?

Sorry, I cannot answer your questions exactly. I started the upload yesterday at about 17:00 here in Germany (UTC+2) and went to bed at about 1:30 this morning. As far as I remember, about 70 chunks had been uploaded (without any error) by that time. I will try it again next night...
Comment 84 Sisa 2014-07-11 11:08:53 UTC
(In reply to Rainer Rillke @commons.wikimedia from comment #82)

The retry also ended unsuccessfully. I started it at 0:43 (UTC+2) and all chunks were uploaded (87 of 87, chunk size: 4096 KiB, duration: 36558 s). However, the server-side rebuilding of the new file is hanging ("44552: finalize/87> Still waiting for server to rebuild uploaded file" and so on...).
Comment 85 Rainer Rillke @commons.wikimedia 2014-07-11 11:25:30 UTC
(In reply to Sisa from comment #84)
If this particular file matters to you, you can try publishing it from your upload stash, if it's still there: https://commons.wikimedia.org/w/index.php?title=Special:UploadStash&withJS=MediaWiki:EnhancedStash.js

----
I notice that files are removed from the stash quite frequently now... could this cause any harm?
Comment 86 Sisa 2014-07-11 12:50:56 UTC
How can I do this? Opening the file in a new tab of my browser (SeaMonkey 2.26.1) brings up the message "Internal Server Error Cannot serve a file larger than 1048576 bytes."
Comment 87 Rainer Rillke @commons.wikimedia 2014-07-11 12:53:43 UTC
(In reply to Sisa from comment #86)
Is there a "publish" button? Try that. If it doesn't let you use the desired destination file name, we can move it later to where it should go.
