Pre-cached model selection doesn't appear to exist when creating a new serverless endpoint
The docs (https://docs.runpod.io/serverless/endpoints/manage-endpoints) say:
"""
Model (optional): Select a model from Hugging Face to optimize worker startup times. When you specify a model, Runpod attempts to place your workers on host machines that already have the model cached locally, resulting in faster cold starts and cost savings (since you won’t be charged while the model is downloading). You can either select from the dropdown list of pre-cached models or enter a custom Hugging Face model URL.
"""
However, I don't see a dropdown of "pre-cached" models for that input when, for example, selecting vLLM at https://www.console.runpod.io/serverless/new-endpoint
Am I missing something here? 🤔
Thank you for bringing this feedback to our attention. As this feature is currently in beta, we are making changes to it. At the moment, we haven't enabled pre-cached models for Model Store. We'll update the docs to avoid the confusion. We will soon be pre-caching smaller, most-used models, but I don't have an ETA for that yet. I will keep you updated.
Understood, thanks for the quick reply 👍