6x speed reduction with network storage in serverless
To reduce my Docker image size I wanted to use network storage to store my models, but the main issue I'm running into now is that I went from 20 seconds per request to 120 seconds.
Looking at the logs, it takes almost 100 seconds (vs a few seconds) to load the model into GPU memory.
Why is the network storage so slow??? It's a major drawback and means you and I have to deal with tens of GB of Docker image for nothing.
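For reference, this is roughly the kind of load that takes ~100 seconds here (a minimal sketch, not my actual handler; the model class and path are placeholders, assuming a transformers model copied to the network volume, which serverless workers typically see under /runpod-volume):

```python
import time
from transformers import AutoModelForCausalLM  # placeholder model class for illustration

# Hypothetical path to weights stored on the attached network volume.
MODEL_DIR = "/runpod-volume/models/my-model"

start = time.time()
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)  # reads weights from network storage
model = model.to("cuda")                                  # then copies them into GPU memory
print(f"model load took {time.time() - start:.1f}s")
```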
8 Replies
This is a known issue with network volumes. @flash-singh has recently reported that a new service is coming to RunPod to address this: a model cache that lets you pull models from Hugging Face instead of embedding them in your container image, and RunPod will automatically inject the model into your worker using the local NVMe disk. In the meantime you will likely be better off embedding your models directly into your image.
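As a sketch of the bake-it-in approach (assuming a Hugging Face model; the repo id and target directory below are placeholders), you run a small download script once at image build time so the weights land in an image layer, then load from that local path at runtime:

```python
# build_time_download.py - run during `docker build`, not at request time,
# so the weights are baked into the image and read from fast local disk.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="org/my-model",        # placeholder repo id
    local_dir="/models/my-model",  # placeholder path inside the image
)
```

The handler then does its `from_pretrained("/models/my-model")` against the container's local disk instead of the network volume, which is what keeps the load in the few-seconds range.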
Cool! Would love to know when this is available!
@Encyrption @flash-singh has there been any update on this feature?
Development of this feature is currently on hold. Will let you know when we resume it.
Model store / cache is in development and planned for early Q4. It will let you avoid putting models in the container image; instead, RunPod will support model injection using local NVMe storage by exposing a read-only volume, which will give better performance than network volumes.
If you're asking about how to use a model: the feature will use environment variables to let you define the model and your Hugging Face token.
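Something along these lines, presumably (a rough sketch only; the actual variable names and mount path for the feature are not confirmed, MODEL_NAME, HF_TOKEN and /runpod-models are placeholders):

```python
import os

# Hypothetical env vars - the real names the feature will use are not announced yet.
MODEL_NAME = os.environ.get("MODEL_NAME", "org/my-model")  # which model RunPod should inject
HF_TOKEN = os.environ.get("HF_TOKEN")                      # Hugging Face token for gated/private repos

# Hypothetical mount point: RunPod would expose the injected model as a read-only
# volume backed by local NVMe, and the handler would just load from that local path.
MODEL_PATH = f"/runpod-models/{MODEL_NAME}"
print(f"expecting model {MODEL_NAME!r} at {MODEL_PATH}")
```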
A feature is also planned where you can upload a model through the console; not 100% sure whether it will go live with the above or come later in Q4.
RunPod itself will still use network storage for caching, and the network for downloading models when they're not in the cache, but from your perspective the actual model will be mounted into your container from the local NVMe disk.