Question about the storage layout of a serverless endpoint

I have a question about the storage layout of a serverless endpoint. I need to build a container with more than one model on it, and it will have a network volume holding all the data. The question is: where should I store the virtual environments, package dependencies (pyenv and pipenv), and caches? In the Docker image or on the network volume? Which gives better results in terms of performance and execution time?
justin · 5mo ago
Docker image. Network storage is essentially an external hard drive; pulling from a separate storage container will always be slower than pulling directly from local resources.
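If you want to put a number on that for your own endpoint, a quick sequential-read benchmark is enough. A minimal sketch, assuming a model file baked into the image under /app/models and a network volume mounted at /runpod-volume (both paths are placeholders to adjust):

```python
import time

def read_throughput(path, chunk_mb=64):
    """Read a file sequentially and return throughput in MB/s."""
    chunk = chunk_mb * 1024 * 1024
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while data := f.read(chunk):
            total += len(data)
    elapsed = time.perf_counter() - start
    return total / 1024 / 1024 / elapsed

# Local file baked into the image vs. the same file on the network volume.
# Both paths are illustrative; point them at a real model file on your worker.
print("image:  %.0f MB/s" % read_throughput("/app/models/sdxl.safetensors"))
print("volume: %.0f MB/s" % read_throughput("/runpod-volume/models/sdxl.safetensors"))
```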
moonlight · 5mo ago
Okay, and how much is the impact (approximately) of starting a 10 GB image compared to a 1 GB image? (Both loading the same packages during startup.)
flash-singh · 5mo ago
No difference; it's not the image size but what your startup is doing. If your startup loads a 10 GB model into VRAM vs. 100 GB into VRAM, that has a bigger impact. Image size has more to do with initialization, i.e. downloading the image.
justin · 5mo ago
Startup pulling a model into VRAM, as flash said, has the biggest impact, and doing it from network storage will be significantly slower than from local files on the image. Initialization time is a one-time cost when workers are first created, and the image persists for future requests. Do you have an idea of what you're trying to build?
moonlight · 5mo ago
This happens just with the first request, right? After that, a cold start will not download the image again?
justin · 5mo ago
Cold start and initialization are different. Initialization is when RunPod downloads your image to a worker, which then keeps it for future startups. Startups have cold start times, where the worker goes from nothing to something; those can vary based on different factors. And then, before execution time, there is also a bit of setup time in your handler, such as model = load(model). If load(model) is huge, it will take a while, but if you set your worker to idle for, say, 2 minutes after it becomes active, it can sit with an already-loaded model in memory and just keep pulling requests.
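In code, that distinction comes down to where load(model) happens. A minimal sketch of the pattern with RunPod's Python serverless SDK; the loader is a placeholder and the input schema ({"input": {"prompt": ...}}) is just an example:

```python
import runpod  # RunPod serverless SDK

def load_model():
    # Placeholder loader: in a real worker this would be something like
    # StableDiffusionXLPipeline.from_pretrained("/app/models/sdxl").to("cuda")
    return lambda prompt: f"<output for {prompt!r}>"

# Runs once per worker start (part of cold start), not once per request.
# While the worker idles warm, the model stays loaded in memory/VRAM.
model = load_model()

def handler(job):
    # Per-request execution time now covers inference only, not loading.
    prompt = job["input"]["prompt"]
    return {"output": model(prompt)}

runpod.serverless.start({"handler": handler})
```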
moonlight · 5mo ago
Thank you for the information. The plan is to run SDXL + a vision-language model, and eventually some other smaller model, chained together. I estimate about 36 GB of VRAM. I will need to start, do the work, and immediately stop, so I'll need to load into VRAM on each request. The first model to run will be SDXL, so maybe it's possible to fit it in the Docker image and the rest on the network volume. What do you think?
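For reference, that split would look roughly like the sketch below: a chained pipeline that loads each stage into VRAM, runs it, and frees the memory before the next stage. The loaders, paths, and the /runpod-volume mount point are illustrative assumptions, not a definitive implementation:

```python
import gc
import torch

def load_sdxl(path):
    # Placeholder: e.g. StableDiffusionXLPipeline.from_pretrained(path).to("cuda")
    return lambda prompt: f"<image for {prompt!r}>"

def load_vlm(path):
    # Placeholder: e.g. AutoModelForVision2Seq.from_pretrained(path).to("cuda")
    return lambda image: f"<caption for {image!r}>"

def run_chain(prompt):
    # Stage 1: SDXL weights baked into the image (fast local read).
    sdxl = load_sdxl("/app/models/sdxl")
    image = sdxl(prompt)
    del sdxl
    gc.collect()
    torch.cuda.empty_cache()  # release stage-1 VRAM before loading stage 2

    # Stage 2: VLM weights on the network volume (slower to read, per this thread).
    vlm = load_vlm("/runpod-volume/models/vlm")
    return vlm(image)
```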
flash-singh · 5mo ago
Are you using a 48 GB GPU? If all models are static, I would put them all in the Docker image, unless it gets too big and goes over ~50 GB.
moonlight · 5mo ago
By static you mean they will not be replaced frequently, right? Yes, they will be static, and yes, I'm thinking of a 48 GB GPU. At the moment I'm building the project, and your help is very useful for setting everything up the proper way.
flash-singh · 5mo ago
I would put everything in the Docker image; that will also help with scaling.
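One common way to do that is to download the weights at image build time, e.g. with a script the Dockerfile runs (RUN python download_models.py), so they ship inside the image. A minimal sketch assuming Hugging Face-hosted weights; the repo IDs and target paths are placeholders:

```python
# download_models.py -- run at `docker build` time so weights ship inside the image.
from huggingface_hub import snapshot_download

MODELS = {
    # repo_id (placeholder)                   -> path baked into the image
    "stabilityai/stable-diffusion-xl-base-1.0": "/app/models/sdxl",
    "llava-hf/llava-1.5-7b-hf":                 "/app/models/vlm",
}

for repo_id, local_dir in MODELS.items():
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```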
moonlight · 5mo ago
Thank you very much!