Wikimedia Bugzilla is closed!

Wikimedia migrated from Bugzilla to Phabricator. Bug reports are handled in Wikimedia Phabricator.
This static website is read-only and kept for historical purposes. It is not possible to log in, and apart from displaying bug reports and their history, links may be broken. See T27676, the corresponding Phabricator task, for complete and up-to-date bug report information.
Bug 25676 - Deploy FirefoggChunkedUpload extension to Wikimedia wikis
Status: RESOLVED WONTFIX
Product: Wikimedia
Classification: Unclassified
Component: Extension setup (Other open bugs)
Version: unspecified
Hardware: All
OS: All
Importance: Normal enhancement
Target Milestone: ---
Assigned To: Nobody - You can work on this!
Depends on:
Blocks: 27315
Reported: 2010-10-27 19:31 UTC by Michael Dale
Modified: 2011-03-29 08:41 UTC (History)
CC: 11 users

See Also:
Web browser: ---
Mobile Platform: ---


Attachments

Description Michael Dale 2010-10-27 19:31:40 UTC
Bug 17255 originally tracked chunked uploading; it was closed when the feature was deployed, but the feature was later disabled. It has since been refactored into the FirefoggChunkedUpload extension.

Upload by chunks is still documented in the MediaWiki manual on the upload API: http://www.mediawiki.org/wiki/API:Upload#Chunked_upload

It is supported in the Firefogg extension. A native JavaScript implementation could potentially be added to the Upload Wizard, but that is a separate bug, and having it supported on the server is a prerequisite for client support.

If we want to make any changes to the chunk uploading protocol, we should make those change requests known now.
Comment 1 Sam Reed (reedy) 2010-12-30 17:21:09 UTC
Deleted the chunked upload section from the MW.org manual for the core API; it's not there anymore.

http://www.mediawiki.org/w/index.php?title=API%3AUpload&action=historysubmit&diff=372806&oldid=352817
Comment 2 Sam Reed (reedy) 2011-01-04 18:36:44 UTC
This bug is categorised wrongly, though I'm struggling to see where it should actually be put.

Suggestions?
Comment 3 Roan Kattouw 2011-03-05 19:36:42 UTC
IIRC chunked uploads are in core but disabled, or were in core at some point but were taken out.

It had a security issue, which nobody dealt with for like a year even though it was fairly easy to fix AFAICT (security checks run on individual chunks but not on the reassembled file).

Another thing I'd like to note is that some browsers support the Blob API (part of the HTML5 File API) nowadays, which means chunked upload can be implemented in JavaScript instead of having to rely on a browser plugin like Firefogg :)
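As a rough illustration of what the Blob API enables (a minimal sketch only; the endpoint URL and form parameter names below are invented, not an actual MediaWiki API):

```typescript
// Minimal sketch: slice a File into chunks with the HTML5 Blob API and
// POST them one at a time. The endpoint and parameter names are
// hypothetical stand-ins, not the real MediaWiki upload API.
async function uploadInChunks(file: File, apiUrl: string): Promise<void> {
  const CHUNK_SIZE = 1024 * 1024; // 1 MiB per chunk
  for (let offset = 0; offset < file.size; offset += CHUNK_SIZE) {
    const chunk: Blob = file.slice(offset, offset + CHUNK_SIZE);
    const form = new FormData();
    form.append("filename", file.name);
    form.append("offset", String(offset));
    form.append("filesize", String(file.size));
    form.append("chunk", chunk);
    const res = await fetch(apiUrl, { method: "POST", body: form });
    if (!res.ok) {
      throw new Error(`Chunk at offset ${offset} failed: HTTP ${res.status}`);
    }
  }
}
```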
Comment 4 Michael Dale 2011-03-13 06:43:58 UTC
I think the best approach would be to just make this part of the Upload Wizard itself. This way the chunked support is not a separate extension and can easily be enabled or disabled in the Upload Wizard config.
Comment 5 Tim Starling 2011-03-24 03:29:46 UTC
I've said before that I'd prefer a pure-JavaScript solution. I don't want to encourage users to install buggy and potentially insecure Firefox extensions in the course of their regular web browsing. 

I've also said that if you were to support chunked uploads independent of Firefogg, you wouldn't want to do it with the Firefogg protocol, since it does not provide a way to detect duplicate, missing or reordered chunks, nor does it provide a way for the server to report an error message to the user on chunk upload. In fact, the Firefogg client will keep retrying the chunk upload if you send it anything other than a "success" message. So enabling this extension is not a prerequisite for pure-JS chunked upload support.

A quick check of the relevant source suggests that I don't have to reconsider my position on any of these things. 

Suggest WONTFIX.
Comment 6 Chad H. 2011-03-24 03:36:39 UTC
(In reply to comment #5)
> I've said before that I'd prefer a pure-JavaScript solution. I don't want to
> encourage users to install buggy and potentially insecure Firefox extensions in
> the course of their regular web browsing. 
> 

I'd second the call for WONTFIX on this point alone, even if there weren't other technical objections (which there are).
Comment 7 MZMcBride 2011-03-24 04:17:27 UTC
(In reply to comment #0)
> It is supported in the Firefogg extension. A native JavaScript implementation
> could potentially be added to the Upload Wizard, but that is a separate bug,
> and having it supported on the server is a prerequisite for client support.

I'm not sure if the "separate bug" you mentioned is already filed, but if not, please create one.

(In reply to comment #5)
> I've said before that I'd prefer a pure-JavaScript solution. I don't want to
> encourage users to install buggy and potentially insecure Firefox extensions in
> the course of their regular web browsing. 

Agreed. Advancements in HTML5 and JavaScript support in modern browsers make a browser-specific solution here seem inappropriate.

(In reply to suggestions for "wontfix" in comment #5 and comment #6)

Resolved as such.
Comment 8 Michael Dale 2011-03-24 09:31:43 UTC
As previously mentioned [1], the reported CSRF was fixed within hours of being reported. If there is an actual outstanding issue, please file a bug, or highlight something other than what has already been fixed ages ago.

Proposals to address protocol concerns have been made, i.e. numbered chunks in request and response, and a 5-minute timeout. These proposals were made directly to Tim Starling one year ago, on 03/18/2010, by the Firefogg author, and received no reply.

They were revisited in Mark Hershberger's work to refactor the chunk support on 04/08/2010, again with a request for private comment, and again with no response.

Then, publicly on the mailing list, Tim suggested a third-party chunk protocol that was practically identical to the Firefogg protocol, leaving a hanging inquiry from Neil: "What in your view would a better version look like? The PLupload protocol seems quite similar. I might be missing some subtle difference." [2]

If there is an actual proposal (i.e. not an impossible-to-disprove negative), please do make those comments, bugs, or proposals known. I don't see how repeating the phrase "buggy and potentially insecure" is constructive.

It is true that chunk uploading can be implemented in native browser XHR, which is fine and good. But this still requires server-side chunk support similar to the extension associated with this bug. A distinction should be made between the browser add-on and the server-side chunk support; this bug refers to the server-side chunk support, which has little to do with the browser add-on.

We could rename the extension to "chunked upload", since clearly some people have an aversion to the word "firefogg"?

And of course native browser XHR will not support transcoding proprietary video files into free codecs; for cases where the user wants to upload a video clip in a non-free format, recommending Firefogg is perfectly reasonable, and not dependent on chunk uploading being enabled at all, since Firefogg can just as well send the file via a single POST request.

Independently of whether the Firefogg software is "total crap" or not... Firefogg is a widely used Firefox add-on for converting media files in the browser. Many Commons users are already using it via the firefogg.org/make page for their video conversions; it's recommended on Commons, used in gadgets (in POST mode), mentioned and recommended by many third-party sites, has thousands of installs, etc. So if there are actual issues, we would do well to report them.

Tim is correct that there has not been much follow-up on this chunk effort, as people have been busy. But I don't think that's a valid reason to classify this as WONTFIX; the issues that have been raised have been addressed, and some integration work remains to be done. If there are other issues to address, they should be made known.

We should not have the feature set dismissed with broad, impossible-to-disprove negative generalizations that leave no course of action for those who see value in it.

If some practical alternative for large file uploads is to be proposed, then it should be proposed rather than dismissing efforts to address the issue without recourse for years at a time. 

[1] http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg08219.html
[2] http://www.mail-archive.com/wikitech-l@lists.wikimedia.org/msg08199.html
Comment 9 Roan Kattouw 2011-03-27 13:11:33 UTC
Reopening since Tim's arguments for WONTFIX pertained mostly to the Firefogg add-on (client-side) rather than the FirefoggChunkedUpload extension (server-side support).

I would prefer chunked upload support to be in core. IIRC the reason it was disabled before was that security checks were only done on the individual chunks and not on the reassembled file, but that should be easy to fix. Client-side implementations of chunked uploading (I'd like to see something based on the HTML5 Blob API in UploadWizard) are a separate issue.
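The fix described here amounts to an ordering change: keep per-chunk checks cheap, and run the authoritative verification once on the reassembled file. A minimal sketch (every name below is a hypothetical stand-in, not one of MediaWiki's actual upload classes):

```typescript
// Sketch: per-chunk checks are cheap sanity checks; the authoritative
// verification runs once on the reassembled file before publishing.
interface UploadSession {
  bytesReceived: number;
  expectedSize: number;
  append(chunk: Uint8Array): Promise<void>;
  reassembledPath(): Promise<string>;
}

async function onChunkReceived(
  session: UploadSession,
  chunk: Uint8Array,
  verifyChunk: (c: Uint8Array) => void,             // e.g. per-chunk size limit
  verifyWholeFile: (path: string) => Promise<void>, // MIME/extension checks on the full file
  publish: (path: string) => Promise<void>,
): Promise<void> {
  verifyChunk(chunk);
  await session.append(chunk);
  if (session.bytesReceived >= session.expectedSize) {
    const path = await session.reassembledPath();
    await verifyWholeFile(path); // the step reportedly skipped in the vulnerable version
    await publish(path);
  }
}
```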
Comment 10 Michael Dale 2011-03-27 21:24:29 UTC
(In reply to comment #9)
> I would prefer chunked upload support to be in core. IIRC the reason it was
> disabled before was that security checks were only done on the individual
> chunks and not on the reassembled file, but that should be easy to fix.

That vulnerability was only in the extension, not in the 'core' version we had deployed; i.e. line 82 called the parent verifyUpload on the entire upload once we received the last chunk [1]. It was used as the rationale for disabling it, who knows why... but that is really water under a bridge with a dead horse on top.

...

Happy to add native HTML5 Blob client-side chunked uploading support to Upload Wizard if we can get any movement on the server-side support.

[1] http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/upload/UploadFromChunks.php?revision=57732&view=markup&pathrev=59400
Comment 11 Tim Starling 2011-03-27 23:35:58 UTC
(In reply to comment #9)
> Reopening since Tim's arguments for WONTFIX pertained mostly to the Firefogg
> add-on (client-side) rather than the FirefoggChunkedUpload extension
> (server-side support).

Actually I think the second paragraph in comment 5, where I explained why I don't think the server-side extension should be enabled, was longer than the first paragraph, which dealt with my objections to the client-side.
Comment 12 Neil Kandalgaonkar 2011-03-28 03:13:19 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > Reopening since Tim's arguments for WONTFIX pertained mostly to the Firefogg
> > add-on (client-side) rather than the FirefoggChunkedUpload extension
> > (server-side support).
> 
> Actually I think the second paragraph in comment 5, where I explained why I
> don't think the server-side extension should be enabled, was longer than the
> first paragraph, which dealt with my objections to the client-side.

I've had a look at Google's Resumable Upload Protocol. They do things in a reasonable manner, also very RESTy. We have never used HTTP headers or status fields for application state signalling, but we can emulate most of this in ordinary POST parameters and returned data.

http://code.google.com/apis/documents/docs/3.0/developers_guide_protocol.html#ResumableUpload

Okay, so if the following things were added to a chunked upload protocol, would this be satisfactory?

* Before starting an upload, the client tells the server the length of the file to be uploaded, in bytes.

* With each upload chunk, the client also tells the server which range of bytes this corresponds to.

* With each response to an upload chunk, the server indicates the largest range of contiguous bytes starting from zero that it thinks it has. (The client should use this information to set its file pointer for subsequent chunks.) N.b. this means it's possible for the client to send overlapping ranges; the server needs to be smart about this.

* The server is the party that decides when the upload is done. (By signaling that the full range of packets are received, saying "ok, all done", and then returning the other usual information about how to refer to the reassembled file).

We could also add optional checksums here, at least for each individual chunk. (A complete-file checksum would be nice, but it's not clear to me whether it is practical for JavaScript File API clients to obtain them.)

And each chunk upload could return some error status, particularly if the checksums or expected length don't match.
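To make this concrete, the request/response pair could be sketched as types; all field names below are invented for illustration, not a settled API:

```typescript
// Hypothetical wire format for the proposal above.
interface ChunkRequest {
  uploadId: string;   // issued when the upload session is opened
  totalBytes: number; // declared by the client before the first chunk
  rangeStart: number; // inclusive byte offset of this chunk
  rangeEnd: number;   // exclusive byte offset of this chunk
  checksum?: string;  // optional per-chunk checksum
  // the chunk bytes themselves travel alongside as multipart form data
}

interface ChunkResponse {
  // Largest contiguous range from byte 0 the server believes it holds;
  // the client sets its file pointer here, so overlapping resends are safe.
  contiguousBytes: number;
  done: boolean;      // the server, not the client, declares completion
  fileKey?: string;   // set when done: how to refer to the reassembled file
  error?: string;     // e.g. checksum or declared-length mismatch
}
```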
Comment 13 Neil Kandalgaonkar 2011-03-28 03:20:52 UTC
Okay, from some comments on IRC, it is apparently unclear why I just posted some suggestions for changing the Firefogg protocol. I am trying to answer Tim's objections that the way of uploading chunks is not robust enough.

In Firefogg, the client is basically POSTing chunks to append until it says that it's done. The server has no idea when this process is going to end, and has no idea if it missed any chunks. I believe this is what bothered Tim about this. 

There was some confusion about whether PLupload was any better. As far as I can tell, it isn't, so I looked to the Google Resumable Upload Protocol for something more explicit.
Comment 14 Tim Starling 2011-03-28 03:35:46 UTC
(In reply to comment #12)
> Okay, so if the following things were added to a chunked upload protocol, would
> this be satisfactory?

Yes, that is the sort of thing we need.

(In reply to comment #13)
> In Firefogg, the client is basically POSTing chunks to append until it says
> that it's done. The server has no idea when this process is going to end, and
> has no idea if it missed any chunks. I believe this is what bothered Tim about
> this. 

Yes. For example, if the PHP process for a chunk upload takes a long time, the Squid server may time out and return an error message, but the PHP process will continue and the chunk may still be appended to the file eventually. In this situation, Firefogg would retry the upload of the same chunk, resulting in it being appended twice. Because the original request and the retry will operate concurrently, it's possible to hit NFS concurrency issues, with the duplicate chunks partially overwriting each other.

A robust protocol, which assumes that chunks may be uploaded concurrently, duplicated or omitted, will be immune to these kinds of operational details.

Dealing with concurrency might be as simple as returning an error message if another process is operating on the same file. I'm not saying there is a need for something complex there.
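That simple option might look like the following sketch, assuming some hypothetical advisory-lock helper keyed by upload session:

```typescript
// Sketch: refuse a chunk if another request already holds the session lock,
// so a premature-timeout retry can never race the original append.
// The lock helper and appendChunk are hypothetical stand-ins.
async function handleChunk(
  sessionId: string,
  body: Uint8Array,
  locks: { tryAcquire(key: string): boolean; release(key: string): void },
  appendChunk: (id: string, body: Uint8Array) => Promise<void>,
): Promise<{ ok: boolean; error?: string }> {
  if (!locks.tryAcquire(sessionId)) {
    // A concurrent request is still writing; tell the client to back off
    // and re-query server state rather than blindly retrying.
    return { ok: false, error: "upload-session-busy" };
  }
  try {
    await appendChunk(sessionId, body);
    return { ok: true };
  } finally {
    locks.release(sessionId);
  }
}
```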
Comment 15 MZMcBride 2011-03-28 03:51:39 UTC
This bug is about the deployment of a specific extension to Wikimedia wikis. The last few comments don't really relate to extension deployment, they relate to the underlying issue, which is covered in bug 17255 (now re-opened). I'm going to copy over the last few comments from this bug to bug 17255. This bug should probably be re-resolved, but I'll leave that for someone else.
Comment 16 Neil Kandalgaonkar 2011-03-28 04:12:40 UTC
(In reply to comment #15)
> This bug is about the deployment of a specific extension to Wikimedia wikis.
> The last few comments don't really relate to extension deployment, they relate
> to the underlying issue, which is covered in bug 17255 (now re-opened). I'm
> going to copy over the last few comments from this bug to bug 17255. This bug
> should probably be re-resolved, but I'll leave that for someone else.

I think MZ is right. Firefogg's protocol (as written) does not actually give us upload resumability, since the server can't say what its state is. So if the real goal was resumability, bug 17255 is not fixed yet.

In practice, I think enabling Firefogg is at least useful (if not robust to the kinds of errors Tim has mentioned) and has the advantage that it can be implemented relatively simply on the server side. The worst case scenario would be that an uploaded file would be corrupt, but that would be easily detected in most cases.

Is it possible to pursue both goals? We are only talking about an extension, after all. 

We can enable Firefogg today, and think about implementing a better protocol in core very soon after, and then try to convince Firefogg to support that? As a Firefox extension, we can expect Firefogg to auto-update, so we won't have to support their protocol forever.
Comment 17 Chad H. 2011-03-28 06:10:25 UTC
(In reply to comment #16)
> We can enable Firefogg today, and think about implementing a better protocol in
> core very soon after, and then try to convince Firefogg to support that? As a
> Firefox extension, we can expect Firefogg to auto-update, so we won't have to
> support their protocol forever.

I was pretty sure from comments 5, 6, and 7 that asking users to install an extension is a non-starter.
Comment 18 Michael Dale 2011-03-28 16:43:52 UTC
(In reply to comment #16)

I think it would be ideal to have a single chunk uploading protocol. Firefogg can use whatever protocol is supported by MediaWiki (i.e. it's being used with just a big POST request right now).

Because modern browsers have the HTML5 Blob API, chunking would (of course) be better supported natively. Firefogg is very useful to a niche set of uploaders for video conversions (not for a general user who is uploading large images).

In terms of the upload protocol (comment #12): the sort of ranged POST request model that Google is promoting seems perfectly reasonable. It was proposed a while back, but was discouraged because it seemed to go too low-level. But now that it's been out for some time, and if there is relative agreement, I suggest we just ~do~ that!

Errors can be handled normally via the XHR response object, i.e. a 500 response with error details sent in our normal JSON API response. Successful chunks respond with the partial data received, i.e. HTTP/1.1 308 Resume Incomplete headers, which also set the byte range for the next chunk.
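As a rough sketch of that flow from the client side (modeled loosely on Google's resumable protocol; the Content-Range/Range header usage and the session URL are assumptions, not a settled API):

```typescript
// Sketch of a client loop for the 308-based flow described above.
async function uploadResumable(file: File, sessionUrl: string): Promise<void> {
  const CHUNK = 1024 * 1024;
  let offset = 0;
  while (offset < file.size) {
    const end = Math.min(offset + CHUNK, file.size);
    const res = await fetch(sessionUrl, {
      method: "PUT",
      headers: { "Content-Range": `bytes ${offset}-${end - 1}/${file.size}` },
      body: file.slice(offset, end),
    });
    if (res.status === 308) {
      // "Resume Incomplete": the Range header reports how far the server
      // actually got, which is where the next chunk must start.
      const range = res.headers.get("Range"); // e.g. "bytes=0-1048575"
      offset = range ? parseInt(range.split("-")[1], 10) + 1 : end;
    } else if (res.ok) {
      return; // the server has the whole file
    } else {
      // Error details arrive in the normal JSON API response body.
      throw new Error(`Upload failed: HTTP ${res.status} ${await res.text()}`);
    }
  }
}
```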

Checksums for individual chunks may be overkill: there are lower-level protocol guards against corruption, they make it more challenging to implement a client, and they would break compatibility with some future native Google #ResumableUpload client or library.

Guards against concurrency could be as simple as never writing to file ranges that have already been received and only accepting subsequent chunks in a strict sequence of byte ranges.
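In code, such a guard is only a few lines (a sketch; the names are illustrative):

```typescript
// Sketch of the append-only guard: a chunk is accepted only if it starts
// exactly where the committed data ends, so duplicate or out-of-order
// retries can never overwrite bytes that were already written.
function acceptChunk(
  bytesCommitted: number,
  rangeStart: number,
  rangeEnd: number,
): { accept: boolean; nextOffset: number } {
  if (rangeStart !== bytesCommitted || rangeEnd <= rangeStart) {
    // Refuse to write; report where the client should resume from.
    return { accept: false, nextOffset: bytesCommitted };
  }
  return { accept: true, nextOffset: rangeEnd };
}
```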
Comment 19 Neil Kandalgaonkar 2011-03-28 17:48:19 UTC
(In reply to comment #18)

I agree with most everything Michael wrote, except:

> Successful chunks respond with the partial data received, i.e. HTTP/1.1 308
> Resume Incomplete headers, which also set the byte range for the next chunk.

While I personally believe that using HTTP Headers and statuses like this is The Right Thing, this tends not to work in APIs that are designed for the masses. And we would like this API to be reimplemented by others.

It is difficult to overestimate the level of incompetence out there when it comes to the web. Your average developer has no idea how to write a library that traps a 308 status. Generally people prefer an RPC-like model, with both success and error statuses encoded into 200 OK responses. :( 

And that's already what the rest of the MediaWiki API does, much like Flickr's similar RPC-oriented API, which is widely implemented despite many flaws. Because it is that simple, people can dash off a client in an afternoon.
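In that RPC style, every chunk response comes back as 200 OK with the real outcome in the JSON body, something like the shapes below (field names invented for illustration, not the actual MediaWiki API):

```typescript
// Sketch of an RPC-style envelope: the HTTP status is always 200 and the
// real outcome lives in the body, as elsewhere in the MediaWiki API.
type ChunkResult =
  | { result: "Continue"; offset: number }             // send the next chunk from here
  | { result: "Success"; filekey: string }             // server declares completion
  | { result: "Failure"; code: string; info: string }; // e.g. declared-length mismatch

// Example bodies a client might see:
const midUpload: ChunkResult = { result: "Continue", offset: 2097152 };
const done: ChunkResult = { result: "Success", filekey: "abc123.ogv" };
const failed: ChunkResult = {
  result: "Failure",
  code: "chunk-out-of-order",
  info: "Chunk does not start at the end of received data",
};
```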
 
> Checksums for individual chunks may be overkill

Okay.
Comment 20 Michael Dale 2011-03-28 18:18:56 UTC
(In reply to comment #19)
> It is difficult to overestimate the level of incompetence out there when it
> comes to the web. Your average developer has no idea how to write a library
> that traps a 308 status. Generally people prefer a RPC-like model, with both
> success and error statuses encoded into 200 OK responses. :( 
> 
> And, that's already what the rest of the MediaWiki API does

Yeah... when we mentioned it a while back (https://bugzilla.wikimedia.org/show_bug.cgi?id=17255#c3) we reached a similar conclusion: HTTP protocol adaptations would be difficult.

But... the idea with protocol modifications is that it's part of a more general effort to do something in a general-purpose way that is more or less consistent across APIs, i.e. it separates the 'transport handling' from the 'API query and response'. So the same transport code supports chunked uploads in the same way regardless of whether you're uploading to YouTube or Wikimedia Commons, with JavaScript XHR, a native browser upload handler, or a Python script with a shared chunk upload handler library.

There are tradeoffs either way, someone simply needs to make a decision.

As long as there are no foreseen issues with supporting this protocol in Wikimedia's cluster setup (i.e. we can pass along non-standard HTTP headers, the system does not blow up with 308 response codes, etc.), I would lean toward separating the transport layer into a more common shared protocol rather than something custom to MediaWiki.
Comment 21 Roan Kattouw 2011-03-29 08:41:38 UTC
(In reply to comment #15)
> This bug is about the deployment of a specific extension to Wikimedia wikis.
> The last few comments don't really relate to extension deployment, they relate
> to the underlying issue, which is covered in bug 17255 (now re-opened). I'm
> going to copy over the last few comments from this bug to bug 17255. This bug
> should probably be re-resolved, but I'll leave that for someone else.
I'm re-closing it now. There seems to be consensus that FirefoggChunkedUpload won't be enabled in its current state (Neil suggested this, but Michael said he'd rather have One True Protocol for MediaWiki and have Firefogg follow suit) and that we want to implement a protocol inspired by Google's. Any further discussion about that is out of place here IMO, and should happen on bug 17255.
