Runpod · 14mo ago
Xqua

6x speed reduction with network storage in serverless

To reduce my Docker image size I wanted to use network storage to store the models, but the main issue I'm running into now is that I went from 20 sec per request to 120 sec. Looking at the logs, it takes almost 100 sec (vs. a few seconds) to load the model into GPU memory. Why is the network storage so slow??? It's a major drawback and means you and I have to handle tens of GB of Docker image for nothing.
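For context, this is roughly the load pattern being timed here: a minimal sketch assuming a diffusers pipeline and RunPod's standard /runpod-volume serverless mount (the model directory name is a placeholder):

```python
import time

import torch
from diffusers import StableDiffusionPipeline  # assuming a diffusers model; swap in your own loader

# RunPod serverless workers mount network volumes at /runpod-volume;
# the model directory below is a placeholder.
MODEL_DIR = "/runpod-volume/models/my-model"

start = time.time()
# Reading the weights over the network volume is where the ~100 sec reportedly goes.
pipe = StableDiffusionPipeline.from_pretrained(MODEL_DIR, torch_dtype=torch.float16)
pipe.to("cuda")
print(f"model load took {time.time() - start:.1f}s")
```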
8 Replies
Encyrption · 14mo ago
This is a known issue with network volumes. @flash-singh recently reported that a new service is coming to RunPod soon to address it: a model cache where you can pull models from Hugging Face without embedding them in your container image, and RunPod will automatically inject the model into your worker using the local NVMe disk. In the meantime you will likely be better off embedding your models directly into your image.
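A minimal sketch of the embedding approach suggested here, assuming a Hugging Face model pulled at image build time (the repo id, target directory, and script name are placeholders):

```python
# download_model.py -- run at image build time, e.g. `RUN python download_model.py`
# in the Dockerfile, so the weights end up baked into an image layer.
import os

from huggingface_hub import snapshot_download

# Repo id and target directory are placeholders; pass HF_TOKEN as a
# build secret/arg if the model is gated.
snapshot_download(
    repo_id="your-org/your-model",
    local_dir="/models/your-model",
    token=os.environ.get("HF_TOKEN"),
)
```

The worker then loads from the local /models path inside the container, avoiding the network volume entirely, at the cost of a larger image.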
Xqua (OP) · 14mo ago
Cool! Would love to know when that happens!
neural-soupe · 7mo ago
@Encyrption @flash-singh has there been any update on this feature?
yhlong00000 · 7mo ago
Development of this feature is currently on hold. We'll let you know when we resume it.
flash-singh · 2mo ago
The model store / cache is in development and planned for early Q4. It will let you avoid putting models in the container image; instead, RunPod will support model injection using local NVMe storage by exposing a read-only volume. This will provide better performance than network volumes.
flash-singh · 2mo ago
If you're asking about how to use a model: the feature will use env variables to let you define the model and a Hugging Face token. A feature is also planned where you can upload a model through the console; not 100% sure if it will go live with the above or come later in Q4. RunPod itself will still use network storage for caching and the network for downloading models when they're not in the cache, but from your perspective the actual model will be mounted from local NVMe disk into your container.
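Since the feature isn't released, here is a purely speculative sketch of what a worker might see under that description; the env variable names and mount path are assumptions based on this thread, not a documented interface:

```python
import os

# Speculative: the model cache feature is unreleased, so these env var
# names and the mount path are guesses, not a documented interface.
model_id = os.environ.get("MODEL_ID", "your-org/your-model")
hf_token = os.environ.get("HF_TOKEN")

# Per the description above, the injected model would appear as a
# read-only mount backed by local NVMe rather than a network volume.
model_path = os.path.join("/model-cache", model_id)
print(f"expecting {model_id} to be mounted read-only at {model_path}")
```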
