HF Cache
Popular Hugging Face models have super fast cold-start times now
We know lots of our developers love working with Hugging Face models. So we decided to cache them on our GPU servers and network volumes.
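For example, on a host where a popular model is already cached, a standard transformers load should find the weights locally instead of downloading them on cold start. A minimal sketch; the model ID is illustrative, and the assumption that the cache is exposed through the standard Hugging Face cache directories is ours:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative choice

# If the host already has this model in its Hugging Face cache,
# from_pretrained() resolves to the local snapshot and skips the
# network download entirely, which is where the cold-start win comes from.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)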
GPU Availability Issue on RunPod – Need Assistance

job timed out after 1 retries
Unable to fetch Docker images
error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
2024-11-18T18:10:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...Failed to get job. - 404 Not Found
vLLM: override the OpenAI served model name
I set OPENAI_SERVED_MODEL_NAME_OVERRIDE, but the model name exposed by the OpenAI endpoint is still the hf_repo/model name.
The logs show: engine.py: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name=None...
and the endpoint returns: object='error' message='The model 'model_name' does not exist.' type='NotFoundError' param=None code=404
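For comparison, here is what a working override looks like from the client side. This is a minimal sketch, assuming RunPod's OpenAI-compatible route at /openai/v1; the endpoint ID, API key, and the alias my-llama are placeholders, and it assumes the override was applied so served_model_name carries the alias instead of None:

from openai import OpenAI

# Placeholders: substitute your own endpoint ID and API key.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# With the override applied, the alias should be listed here instead of
# the hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 repo name.
print([m.id for m in client.models.list()])

# Requests then address the model by its overridden name.
resp = client.chat.completions.create(
    model="my-llama",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)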
...Not using cached worker

What TTFT (time to first token) should we be able to reach?
80GB GPUs totally unavailable
Not able to connect to the local test API server
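One way to narrow this down is to run the handler with the SDK's local test flag and confirm the server actually comes up before pointing a client at it. A minimal sketch, assuming the runpod Python SDK's --rp_serve_api option and its default port of 8000:

# handler.py -- start the local test API with:
#   python handler.py --rp_serve_api
# then send requests to http://localhost:8000 (default port).
import runpod

def handler(job):
    # Echo the payload back so connectivity is easy to verify.
    return {"echo": job["input"]}

runpod.serverless.start({"handler": handler})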
What methods can I use to reduce cold-start times and decrease latency for serverless functions?
Network volume vs. baking the model into the Docker image
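Whichever of the two you pick, a pattern that helps with both questions is loading the model once at module import, so only true cold starts pay the load cost and warm workers reuse the object. A minimal sketch; load_model is a hypothetical stand-in for your framework's real load call:

import runpod

def load_model(path):
    # Hypothetical stand-in for e.g. AutoModel.from_pretrained(path)
    # or torch.load(path).
    class Model:
        def generate(self, prompt):
            return f"output for {prompt!r}"
    return Model()

# Runs once per worker process at import time. Point the path at
# /runpod-volume/... for a network volume, or at a location baked
# into the Docker image; the warm-worker benefit is the same.
MODEL = load_model("/runpod-volume/models/my-model")

def handler(job):
    return {"output": MODEL.generate(job["input"]["prompt"])}

runpod.serverless.start({"handler": handler})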
Jobs stay In-Progress forever
How to get the progress of a processing job in Serverless?
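The Python SDK has a progress helper for exactly this: the handler calls runpod.serverless.progress_update while working, and the message is visible when the job's status is polled. A minimal sketch; the step loop is illustrative:

import runpod

def handler(job):
    total_steps = 5
    for step in range(total_steps):
        # Surface intermediate progress; clients polling /status see
        # this message while the job is still IN_PROGRESS.
        runpod.serverless.progress_update(job, f"step {step + 1}/{total_steps}")
        # ... do the real work for this step here ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})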

Why does runsync return a status response instead of just waiting for the image response?
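runsync only holds the connection for a limited time; if the job outlives that window, the call returns the job's current status and ID rather than the output, and the client is expected to poll /status until the job completes. A minimal polling sketch with placeholder endpoint ID and API key:

import time
import requests

BASE = "https://api.runpod.ai/v2/<ENDPOINT_ID>"         # placeholder
HEADERS = {"Authorization": "Bearer <RUNPOD_API_KEY>"}  # placeholder

# runsync waits for the result, but a long-running job comes back
# as IN_QUEUE/IN_PROGRESS with an id to poll instead of an output.
job = requests.post(
    f"{BASE}/runsync", headers=HEADERS, json={"input": {"prompt": "a cat"}}
).json()

while job.get("status") in ("IN_QUEUE", "IN_PROGRESS"):
    time.sleep(2)
    job = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()

print(job.get("output"))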
Worker keeps running after idle timeout
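If the goal is the opposite, forcing a worker to shut down after a job instead of lingering, the handler can request a recycle in its return value. A minimal sketch, assuming the SDK's refresh_worker return flag:

import runpod

def handler(job):
    # Asking the platform to stop and replace this worker once the job
    # finishes, rather than keeping it warm through the idle timeout.
    return {"output": "done", "refresh_worker": True}

runpod.serverless.start({"handler": handler})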
Can I deploy the ComfyUI with Flux.1 dev one-click template to Serverless?

What is the real Serverless price?