Runpod · 7mo ago
Foopop

serverless does not cache at all

So because the serverless vLLM worker did not have a feature I needed, I changed it a bit and uploaded my own Docker image of it.

But now it has to load the model completely again after each request, and that takes 90 seconds each time. I send a request, the worker spends 90 seconds loading, handles the request, then goes offline again after the 5-second idle timeout I set, and when I send another request it does the 90-second load all over again.
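For context, the worker is wired up roughly like the standard serverless handler pattern below (simplified, not my exact code; the model path is a placeholder), where the engine is created once when the worker starts rather than inside the handler:

```python
import runpod
from vllm import LLM, SamplingParams

# The engine is built at module import, so it is only loaded when a worker
# starts; a warm worker reuses it for every request it serves.
llm = LLM(model="/runpod-volume/models/my-model")  # placeholder path on the network volume

def handler(job):
    prompt = job["input"]["prompt"]
    outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
    return {"text": outputs[0].outputs[0].text}

runpod.serverless.start({"handler": handler})
```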


It does all of that on every single request. I use a vLLM endpoint on serverless with a different model and it does not load this long; in fact it responds in under 1 second even after the workers went offline after the timeout. Why is that? I'm already using network storage for the model.
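As far as I understand, pointing the weights at the network volume looks roughly like this (Runpod mounts the volume at /runpod-volume on serverless; the model id and cache paths here are placeholders):

```python
import os

# Keep the Hugging Face cache and vLLM's download directory on the network
# volume so the weights are downloaded once and survive workers going offline.
os.environ["HF_HOME"] = "/runpod-volume/huggingface"

from vllm import LLM

llm = LLM(
    model="org/my-model",                  # placeholder model id
    download_dir="/runpod-volume/models",  # vLLM keeps downloaded weights here
)
```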


Here are the logs of what has to happen on each request, which takes 90 seconds.