Question Regarding Storing Videos
Hi everyone,
Let’s say I have a webapp that takes a video as input, converts it to audio, and sends just that audio to an API to get a transcript. I was wondering whether I should be using S3 storage or a similar approach for this use case. The thing is, I don’t need the video file beyond getting the transcription, and there are no regulatory requirements that would force me to keep the video files. So it does feel like storing it in S3 might be overkill, since we simply delete the video afterwards. I was wondering what the best/most efficient way to tackle this problem would be.
I apologize if this is a stupid question; I’ve just started creating webapps and want to learn about general design patterns and architecture/approaches.
Thanks
4 Replies
This heavily depends on your requirements and subsequent design.
Say you go with something like a worker pool. The user uploads a video, a job is created, and at some point in the future a worker picks it up, processes the video, and spits out the results.
You need to keep the video somewhere while it is in the queue - S3 might be good for that.
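A rough sketch of the worker-pool idea, in Python for brevity. Everything here is an assumption for illustration: the `TranscriptionPool` class, the `process` callback (standing in for "convert to audio and call the transcript API"), and the use of a local temp file as the holding area where S3 or similar would sit in a multi-machine setup. The point is only that the video lives exactly as long as its job does.

```python
import os
import queue
import tempfile
import threading
from typing import Callable, Dict


class TranscriptionPool:
    """Hypothetical in-process worker pool; not a production job queue."""

    def __init__(self, process: Callable[[str], str], workers: int = 2):
        self._process = process          # path to video -> transcript text
        self._jobs: queue.Queue = queue.Queue()
        self.results: Dict[int, str] = {}
        self._next_id = 0
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, video_bytes: bytes) -> int:
        # Park the upload somewhere while it waits in the queue.
        # A local temp file stands in for S3 here (assumption).
        fd, path = tempfile.mkstemp(suffix=".upload")
        with os.fdopen(fd, "wb") as f:
            f.write(video_bytes)
        with self._lock:
            job_id = self._next_id
            self._next_id += 1
        self._jobs.put((job_id, path))
        return job_id

    def _work(self) -> None:
        while True:
            job_id, path = self._jobs.get()
            try:
                self.results[job_id] = self._process(path)
            except Exception as exc:
                self.results[job_id] = f"error: {exc}"
            finally:
                os.remove(path)          # the video is gone once the job ends
                self._jobs.task_done()

    def wait(self) -> None:
        # Block until every submitted job has been handled.
        self._jobs.join()
```

With a real queue service the shape stays the same: `submit` would write to the bucket and enqueue the object key, and the worker would delete the object in its `finally` step.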
Another way that comes to my mind is forcing the user to wait until a worker is allocated for them. When it is, they get an address to upload the video to, directly to the worker. It may be simpler to implement and you don't need to store the video at all (you could convert it as it comes in), but your workers will be underutilized (they will be waiting a lot for users to upload data) and any errors mean that the user has to re-upload the file.
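The "convert it as it comes in" variant could look roughly like this: upload chunks are fed to ffmpeg on stdin and audio is read from stdout, so the video never touches disk. This is a sketch under assumptions: `build_ffmpeg_pipe_cmd` and `convert_stream` are made-up helper names, the 16 kHz mono WAV output is just a guess at what a transcript API would accept, and it only works for containers ffmpeg can read sequentially (some MP4 files keep their index at the end and may fail over a pipe).

```python
import subprocess
import threading
from typing import Iterable, List


def build_ffmpeg_pipe_cmd(sample_rate: int = 16000) -> List[str]:
    # Read video from stdin, drop the video stream (-vn),
    # write mono WAV audio to stdout.
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        "-f", "wav", "pipe:1",
    ]


def convert_stream(chunks: Iterable[bytes], cmd: List[str]) -> bytes:
    """Pipe upload chunks through `cmd` and return whatever it emits."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed() -> None:
        # Write in a separate thread so a full stdout pipe cannot deadlock us.
        try:
            for chunk in chunks:
                proc.stdin.write(chunk)
            proc.stdin.close()
        except (BrokenPipeError, OSError):
            pass  # the converter exited early; the return code reports it

    writer = threading.Thread(target=feed, daemon=True)
    writer.start()
    audio = proc.stdout.read()
    writer.join()
    if proc.wait() != 0:
        raise RuntimeError(f"converter exited with code {proc.returncode}")
    return audio
```

In a real handler `chunks` would be the request body iterator, and the call would be `convert_stream(chunks, build_ffmpeg_pipe_cmd())`. As noted above, a failure here means the user has to upload again, since nothing was kept.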
There are probably a lot of other options, I only gave this like 1 minute of thought.
I would say use something like ffmpeg; a WebAssembly build such as ffmpeg.wasm can extract the audio from video files in the browser. Then you can send the audio file directly to storage like S3 or R2.
I was definitely thinking about these sorts of approaches. S3 would be great, but I am worried about the budgeting aspect, in the sense that some providers charge for 90 days of storage regardless, even though we delete the videos right after the ffmpeg conversion to audio. I also want to keep the costs as low as possible, because transcription can be expensive at 9 cents an hour.
This is not a terrible idea either. If I am using an Express backend, would it still be the same approach?
Yes, that would be perfectly fine.
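To put the budgeting worry into numbers, here is a small back-of-the-envelope calculation. The 9 cents per hour comes from the figure above; the storage price and the minimum billed duration are parameters to fill in from whichever provider's pricing page applies, and both function names are made up for this sketch.

```python
def transcription_cost(audio_hours: float, rate_per_hour: float = 0.09) -> float:
    # 0.09 USD per hour of audio is the rate quoted in this thread.
    return audio_hours * rate_per_hour


def storage_cost(
    size_gb: float,
    days_stored: float,
    price_per_gb_month: float,
    min_billed_days: float = 0,
) -> float:
    # Some storage tiers bill a minimum duration even if the object is
    # deleted earlier; model that by billing the larger of the two.
    billed_days = max(days_stored, min_billed_days)
    return size_gb * price_per_gb_month * billed_days / 30


# Example with a placeholder price (0.02 USD per GB-month is hypothetical):
one_hour_video_gb = 1.0
print(transcription_cost(1.0))
print(storage_cost(one_hour_video_gb, 1, 0.02))                      # deleted after a day
print(storage_cost(one_hour_video_gb, 1, 0.02, min_billed_days=90))  # 90-day minimum
```

Comparing the last two lines shows how much a minimum-duration charge adds per video relative to the transcription bill for the same hour.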
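On a backend, the "no long-term storage" version comes down to a temp directory that disappears as soon as the audio has been extracted. The sketch below is in Python, but the ffmpeg arguments would be the same when spawned from an Express handler. The helper names, the `upload.bin` and `audio.wav` file names, and the 16 kHz mono output are assumptions for illustration; the `run` parameter exists only so the ffmpeg call can be swapped out in tests.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List


def ffmpeg_extract_args(src: Path, dst: Path) -> List[str]:
    # -vn drops the video stream; the output format follows the .wav extension.
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error", "-y",
        "-i", str(src),
        "-vn", "-ac", "1", "-ar", "16000",
        str(dst),
    ]


def _run_ffmpeg(args: List[str]) -> None:
    subprocess.run(args, check=True)  # raises CalledProcessError on failure


def video_to_audio(
    video_bytes: bytes,
    run: Callable[[List[str]], None] = _run_ffmpeg,
) -> bytes:
    """Return extracted audio bytes; leaves no video or audio file behind."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "upload.bin"
        dst = Path(tmp) / "audio.wav"
        src.write_bytes(video_bytes)
        run(ffmpeg_extract_args(src, dst))
        # The directory and both files are removed when this block exits.
        return dst.read_bytes()
```

The returned bytes can go straight to the transcript API, so the only thing that might need storing afterwards is the transcript itself.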