Runpod · 14mo ago
Xqua

6x speed reduction with network storage in serverless

To reduce my Docker image size I wanted to use network storage to store the models, but the main issue I'm running into now is that I went from 20 sec per request to 120 sec. Looking at the logs, it takes almost 100 sec (vs. a few seconds) to load the model into GPU memory. Why is the network storage so slow??? It's a major drawback and means you and I have to handle tens of GB of Docker image for nothing.
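For context, this is roughly the load pattern being timed here: a minimal sketch assuming a diffusers pipeline and RunPod's standard /runpod-volume serverless mount (the model directory name is a placeholder):

```python
import time

import torch
from diffusers import StableDiffusionPipeline  # assuming a diffusers model; swap in your own loader

# RunPod serverless workers mount network volumes at /runpod-volume;
# the model directory below is a placeholder.
MODEL_DIR = "/runpod-volume/models/my-model"

start = time.time()
# Reading the weights over the network volume is where the ~100 sec reportedly goes.
pipe = StableDiffusionPipeline.from_pretrained(MODEL_DIR, torch_dtype=torch.float16)
pipe.to("cuda")
print(f"model load took {time.time() - start:.1f}s")
```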
8 Replies
Encyrption · 14mo ago
This is a known issue with network volumes. @flash-singh recently reported that a new service is coming to RunPod soon to address it: a model cache where you can pull models from Hugging Face without embedding them in your container image, and RunPod will automatically inject the model into your worker using the local NVMe disk. In the meantime you will likely be better off embedding your models directly into your image.
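A minimal sketch of the embedding approach suggested here, assuming a Hugging Face model pulled at image build time (the repo id, target directory, and script name are placeholders):

```python
# download_model.py -- run at image build time, e.g. `RUN python download_model.py`
# in the Dockerfile, so the weights end up baked into an image layer.
import os

from huggingface_hub import snapshot_download

# Repo id and target directory are placeholders; pass HF_TOKEN as a
# build secret/arg if the model is gated.
snapshot_download(
    repo_id="your-org/your-model",
    local_dir="/models/your-model",
    token=os.environ.get("HF_TOKEN"),
)
```

The worker then loads from the local /models path inside the container, avoiding the network volume entirely, at the cost of a larger image.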
Xqua (OP) · 14mo ago
Cool! Would love to know when that happens!
neural-soupe · 7mo ago
@Encyrption @flash-singh has there been any update on this feature?
yhlong00000 · 7mo ago
Development of this feature is currently on hold. We'll let you know when we resume it.
flash-singh · 2mo ago
The model store / cache is in development and planned for early Q4. It will let you avoid putting models in the container image; instead, RunPod will support model injection using local NVMe storage by exposing a read-only volume. This will provide better performance than network volumes.
flash-singh · 2mo ago
If you're asking about how to use a model: the feature will use env variables to let you define the model and a Hugging Face token. A feature is also planned where you can upload a model through the console; not 100% sure if it will go live with the above or come later in Q4. RunPod itself will still use network storage for caching and the network for downloading models when they're not in the cache, but from your perspective the actual model will be mounted from local NVMe disk into your container.
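Since the feature isn't released, here is a purely speculative sketch of what a worker might see under that description; the env variable names and mount path are assumptions based on this thread, not a documented interface:

```python
import os

# Speculative: the model cache feature is unreleased, so these env var
# names and the mount path are guesses, not a documented interface.
model_id = os.environ.get("MODEL_ID", "your-org/your-model")
hf_token = os.environ.get("HF_TOKEN")

# Per the description above, the injected model would appear as a
# read-only mount backed by local NVMe rather than a network volume.
model_path = os.path.join("/model-cache", model_id)
print(f"expecting {model_id} to be mounted read-only at {model_path}")
```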
