Multipart uploads
Hi, I did a search but was not able to find a proper answer. What is the proper programmatic way to upload large (multi-gig) files? I need recordings of large video calls uploaded to R2 for AI purposes. I am guessing that if I use Workers, they will time out (right?)
Generally you’d use a multipart upload for large files. What this means is that you’d upload your file in small parts, and then ask R2 to stitch them all into one object once all parts have been uploaded. There are tools/libraries to help you do this.
I’d recommend reading up a little on the terminology of multipart uploads just to get familiar with them (feel free to ask questions here if you get stuck)
If you decide to use a Worker, the bindings API expose methods that will let you work with multipart uploads. This example shows you how you can build an HTTP API that uses multipart uploads: https://developers.cloudflare.com/r2/api/workers/workers-multipart-usage/
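To make that concrete, here's a rough sketch of what those bindings look like in a Worker (the MY_BUCKET binding name and the URL/query-string scheme are just assumptions for the example, not taken from that guide):

```ts
// Minimal sketch of the R2 multipart bindings in a Worker.
interface Env {
  MY_BUCKET: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    const key = url.pathname.slice(1);

    if (request.method === "POST" && url.searchParams.get("action") === "create") {
      // Start a multipart upload and hand the uploadId back to the client.
      const upload = await env.MY_BUCKET.createMultipartUpload(key);
      return Response.json({ key: upload.key, uploadId: upload.uploadId });
    }

    if (request.method === "PUT") {
      // Attach one part to an existing upload, identified by its uploadId.
      const uploadId = url.searchParams.get("uploadId")!;
      const partNumber = Number(url.searchParams.get("partNumber"));
      const upload = env.MY_BUCKET.resumeMultipartUpload(key, uploadId);
      const part = await upload.uploadPart(partNumber, request.body!);
      return Response.json(part); // { partNumber, etag }
    }

    if (request.method === "POST" && url.searchParams.get("action") === "complete") {
      // Stitch the uploaded parts into a single object.
      const uploadId = url.searchParams.get("uploadId")!;
      const parts = await request.json<R2UploadedPart[]>();
      const upload = env.MY_BUCKET.resumeMultipartUpload(key, uploadId);
      const object = await upload.complete(parts);
      return new Response(null, { headers: { etag: object.httpEtag } });
    }

    return new Response("Not found", { status: 404 });
  },
};
```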
If you’d rather work with the S3 API, you’ll find documentation about S3’s implementation on their site. Look for the actions CreateMultipartUpload, UploadPart, CompleteMultipartUpload & AbortMultipartUpload
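Roughly the same flow over the S3 API with @aws-sdk/client-s3 would look something like this (the endpoint, credentials and bucket name are placeholders, not real values):

```ts
// Sketch of a multipart upload against R2's S3-compatible endpoint.
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
  AbortMultipartUploadCommand,
} from "@aws-sdk/client-s3";

const s3 = new S3Client({
  region: "auto",
  endpoint: "https://<ACCOUNT_ID>.r2.cloudflarestorage.com",
  credentials: { accessKeyId: "<ACCESS_KEY_ID>", secretAccessKey: "<SECRET_ACCESS_KEY>" },
});

async function uploadLargeFile(bucket: string, key: string, parts: Uint8Array[]) {
  // 1. Create the upload and remember its UploadId.
  const { UploadId } = await s3.send(
    new CreateMultipartUploadCommand({ Bucket: bucket, Key: key })
  );
  try {
    const uploaded = [];
    for (let i = 0; i < parts.length; i++) {
      // 2. Part numbers start at 1; keep each ETag for the completion call.
      const { ETag } = await s3.send(
        new UploadPartCommand({
          Bucket: bucket,
          Key: key,
          UploadId,
          PartNumber: i + 1,
          Body: parts[i],
        })
      );
      uploaded.push({ PartNumber: i + 1, ETag });
    }
    // 3. Ask R2 to assemble the parts into one object.
    await s3.send(
      new CompleteMultipartUploadCommand({
        Bucket: bucket,
        Key: key,
        UploadId,
        MultipartUpload: { Parts: uploaded },
      })
    );
  } catch (err) {
    // Abort so incomplete parts don't linger.
    await s3.send(new AbortMultipartUploadCommand({ Bucket: bucket, Key: key, UploadId }));
    throw err;
  }
}
```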
Workers are time-bound on CPU time, not wall-clock time, so if you can stream your upload, the transfer won't count against that limit regardless of the file size.
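For example, a streaming single-request upload in a Worker can be as small as this (the BUCKET binding name is an assumption):

```ts
// Sketch: pipe the request body straight to R2 without buffering it,
// so the time spent on the transfer is mostly waiting on I/O, not CPU time.
export default {
  async fetch(request: Request, env: { BUCKET: R2Bucket }): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1);
    await env.BUCKET.put(key, request.body);
    return new Response(`Put ${key} successfully`);
  },
};
```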
A multipart upload is generally a better idea though because it protects you from network failures. Each part can be individually retried, so you can do “resumable” uploads that way. If a network failure occurs, a single part fails instead of the entire large file going kaput.
Sorry if I assumed that you didn’t know about multipart uploads, it wasn’t clear to me from your question.
Thank you for the detailed answer. I do know what multipart uploads are, but from what I read about multipart uploads as they relate to R2, it's not just a traditional multipart upload. I would have to implement something special to keep track of the parts on the client side, right? I want a generic endpoint that any file upload can call to just upload a file.
Can I still use a multipart upload for that?
I am curious about this statement: Workers are time-bound on CPU time, not wall-clock time, so if you can stream your upload, the transfer won't count against that limit regardless of the file size.
Can you point me to some literature about this?
I’m not sure what you mean by a “traditional” multipart upload, but in R2, your client would need to hold on to an upload id and send that over with each part so we know which upload you’re adding parts to.
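As a rough illustration of that bookkeeping, the browser side could look something like this (the /multipart endpoint paths are hypothetical, they'd map onto whatever Worker or S3 API you put in front of R2):

```ts
// Browser-side sketch: slice the file, send each slice with the uploadId
// and a 1-based part number, then ask for the parts to be completed.
async function uploadInParts(file: File, key: string, partSize = 10 * 1024 * 1024) {
  // 1. Create the upload and remember its uploadId.
  const { uploadId } = await (
    await fetch(`/multipart/${key}`, { method: "POST" })
  ).json();

  // 2. Upload each slice, keeping the returned etags for the completion call.
  const parts = [];
  for (let offset = 0, n = 1; offset < file.size; offset += partSize, n++) {
    const res = await fetch(
      `/multipart/${key}?uploadId=${uploadId}&partNumber=${n}`,
      { method: "PUT", body: file.slice(offset, offset + partSize) }
    );
    parts.push(await res.json()); // { partNumber, etag }
  }

  // 3. Ask the backend to stitch the parts into one object.
  await fetch(`/multipart/${key}?uploadId=${uploadId}&action=complete`, {
    method: "POST",
    body: JSON.stringify(parts),
  });
}
```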
For your generic API, are you thinking of uploading the large file in one singular HTTP call?
Does this make things clearer? https://developers.cloudflare.com/workers/platform/limits/#worker-limits
Basically, there’s CPU Time, and there’s Duration. There’s a limit of 30s on the CPU time (on the Unbound plan), but the duration is unlimited (though as you go higher and higher above 30s, the chance of getting evicted increases. This exists to weed out malicious users).
So let's assume that I don't do multipart (I understand the implications of bad connections and all that, but I'm just trying to understand something) and I use Workers. I am trying to understand the "Duration". If I upload a 10 GB file and it takes 20 mins to upload, will the whole upload go through (assuming the connection remains stable)?
AFAIK yes, it should. I would ask again on the #workers-help channel, because I'm not a runtime expert. I can ask internally for someone to respond to you just so it gets clarified
thanks
Yeah just link me to it when you do please!
@sdnts I linked you in the post. I think the 500MB body limit might be an issue although I am not sure. I guess back to the original (real) ask: How do I get a large file into R2 (programmatically)?
is the S3 API the only real option?
Well that 500MB limit really only means that each part is going to be limited to that size. I'll ask around about this, it almost feels counter-productive, especially with Workers that use R2.
But anyway, for now it sounds like the S3 API is your only option.
Are you going to be uploading from a browser?
yes from a browser
Right, so if you don't want to complicate the client with multipart uploads, then a presigned URL seems like the way to go
You could generate these URLs from a Worker (or from any other backend really), and send them over to the browser to PUT to
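Something like this, using aws4fetch in a Worker (the bucket name, account ID and the 1-hour expiry below are placeholders/assumptions):

```ts
// Sketch: presign a PUT URL for the browser from a Worker.
import { AwsClient } from "aws4fetch";

export default {
  async fetch(
    request: Request,
    env: { R2_ACCESS_KEY_ID: string; R2_SECRET_ACCESS_KEY: string }
  ): Promise<Response> {
    const r2 = new AwsClient({
      accessKeyId: env.R2_ACCESS_KEY_ID,
      secretAccessKey: env.R2_SECRET_ACCESS_KEY,
      service: "s3",
      region: "auto",
    });

    const key = new URL(request.url).pathname.slice(1);
    const objectUrl = new URL(
      `https://<BUCKET>.<ACCOUNT_ID>.r2.cloudflarestorage.com/${key}`
    );
    objectUrl.searchParams.set("X-Amz-Expires", "3600"); // URL valid for 1 hour

    // signQuery puts the signature in the query string instead of headers,
    // which is what lets the browser use the URL directly.
    const signed = await r2.sign(new Request(objectUrl, { method: "PUT" }), {
      aws: { signQuery: true },
    });

    // Hand the presigned URL back; the browser then PUTs the file body to it.
    return new Response(signed.url);
  },
};
```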
Ok. That sounds promising.
I'll look into that. Thanks.
Would it be possible to use tus to do this?
R2 on its own doesn’t support TUS, but if you want to write a proxy that sits between your client and R2, that could work (I’m not too familiar with TUS, so I can’t comment on the feasibility of this)