General advice on pricing and use of serverless

Hello, I am not sure exactly how this works, so I have a few questions. I want to use RunPod's serverless service. If I understood correctly, a worker waits for an API call and I pay for the time it takes to respond. For the first request (when the worker wakes up), I pay more because of the cold-start delay while the Docker image is set up? Then the setup persists until the worker goes idle? So the strategy would be to keep one active worker?

Also, how should I handle the fact that I am using multiple large models? Is there a difference between putting the model in the Docker image versus pulling it in the script with a helper function? Is it better to use a network volume? I've seen that there is lag when reading data from a network volume. And since a network volume's region can run out of available GPUs, is there a quick way to transfer models from a network volume in one region to another? Thanks for your help.
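For context, "putting the model in the Docker image" usually means copying (or downloading) the weights at build time so they ship inside the image layers. A minimal sketch, assuming a hypothetical model file and base image:

```dockerfile
# Hypothetical example: bake model weights into the image at build time,
# so the worker never has to fetch them at cold start.
FROM runpod/base:0.4.0-cuda11.8.0

# Copy weights from the build context into the image
# (paths and filenames here are placeholders).
COPY models/my-model.bin /models/my-model.bin

COPY handler.py /handler.py
CMD ["python", "-u", "/handler.py"]
```

The trade-off: the image gets large and rebuilds are slower, but at runtime the weights are read from local NVMe instead of being pulled over the network.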
flash-singh · 6mo ago
It's best to have the model in the container image; it loads faster from NVMe than from network storage. You can use an active worker if you want to eliminate the cold start for the first few requests, or scale to zero.
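To tie this to the cold-start question above: a serverless worker typically loads the model once at import time (that one-time cost is the cold start), and every later request the warm worker serves reuses the in-memory model. A minimal sketch of that pattern, with a stand-in loader and a hypothetical model path (the real RunPod entry point would hand the handler to the `runpod` SDK, shown in the comment):

```python
# Sketch of the lazy-once loading pattern used by serverless workers.
# The model path and loader are placeholders, not a real RunPod API.

MODEL_PATH = "/models/my-model.bin"  # hypothetical path baked into the image

def load_model(path):
    # Stand-in for a real loader (e.g. torch.load); simulates the
    # expensive one-time setup that happens during cold start.
    return {"path": path, "ready": True}

# Runs once when the worker container starts -- this is the cold start
# you pay extra for on the first request.
MODEL = load_model(MODEL_PATH)

def handler(job):
    # Called for every API request; the model is already in memory,
    # so warm requests pay no load time.
    prompt = job["input"].get("prompt", "")
    return {"model": MODEL["path"], "echo": prompt}

if __name__ == "__main__":
    # On RunPod you would register the handler with the SDK instead:
    #   import runpod
    #   runpod.serverless.start({"handler": handler})
    print(handler({"input": {"prompt": "hello"}}))
```

Keeping one active (always-on) worker means this module-level load has already happened before any request arrives, which is exactly how it eliminates the cold start.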