Hello,
I am not sure exactly how it works, so I have a few questions.
I want to use the serverless service of RunPod. If I understood correctly, a worker waits for an API call and I pay for the time it takes to respond.
For the first request (at the moment the worker wakes up), am I going to pay more because of the cold-start delay while the Docker image is set up?
Then the setup persists until the worker goes idle?
So is the strategy to keep one active worker?
Moreover, how should I handle the fact that I am using multiple large models?
For example, is there a difference between baking the models into the Docker image and pulling them in the script with a side function?
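To make that question concrete, here is roughly what I mean by "pulling it in the script with a side function": a load-once helper that caches each model after the first call, so only the first request in a warm worker pays the load cost. The names and the loading step are placeholders, not real RunPod or model-library code:

```python
import functools

@functools.lru_cache(maxsize=None)
def get_model(name):
    # Placeholder for the expensive part: in the real worker this would
    # download the weights (e.g. from a registry or a network volume)
    # and load them onto the GPU. It runs only on the first call per name;
    # subsequent calls return the cached object.
    print(f"loading {name} ...")
    return {"name": name, "loaded": True}

def handler(event):
    # Each request picks the model it needs; warm workers reuse the cache.
    model = get_model(event["model_name"])
    return {"model": model["name"]}
```

My question is whether this runtime-pull pattern is worse than shipping the weights inside the image, given image size limits and cold-start times.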
Is it better to use a network volume? I've seen that there is lag when reading data from a network volume. Also, since the region a network volume lives in can lose GPU availability, is there a quick way to transfer models from a network volume in one region to another?
Thanks for your help!