RunPod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Why does my post get tagged as Solved as soon as I post it?

HF Cache

Hey, I got this email from you guys:
Popular Hugging Face models have super fast cold-start times now
We know lots of our developers love working with Hugging Face models. So we decided to cache them on our GPU servers and network volumes.
...

GPU Availability Issue on RunPod – Need Assistance

Hi everyone, I’m currently facing an issue with GPU availability for my ComfyUI endpoint (id: kw9mnv7sw8wecj) on RunPod. When trying to configure the worker, all GPU options show as “Unavailable”, including 16GB, 24GB, 48GB, and 80GB configurations (as shown in the attached screenshot). This is significantly impacting my workflow and the ability to deliver results to my clients since I rely on timely image generation....

job timed out after 1 retries

Been seeing this a ton on my endpoint today, leaving me unable to return images. response_text: "{"delayTime":33917,"error":"job timed out after 1 retries","executionTime":31381,"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1","retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"} ...
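If the work genuinely needs more than the configured execution time, one mitigation is to raise the per-job limit in the request itself. A minimal sketch, assuming the job "policy" object and its executionTimeout field (in milliseconds) from RunPod's job-submission schema; the endpoint ID and input payload are placeholders:

```python
import os
import requests

ENDPOINT_ID = "your_endpoint_id"          # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

payload = {
    "input": {"prompt": "a photo of a red fox"},   # whatever your handler expects
    # Assumption: the request-level "policy" accepts an executionTimeout in
    # milliseconds; raise it above your worst-case generation time so jobs
    # are not marked "job timed out after N retries".
    "policy": {"executionTimeout": 300_000},
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=120,
)
body = resp.json()
if body.get("status") == "FAILED":
    # delayTime / executionTime are reported in milliseconds.
    print("Job failed:", body.get("error"), body.get("executionTime"))
else:
    print("Output:", body.get("output"))
```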

Unable to fetch Docker images

During worker initialization I am seeing errors such as:
error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
2024-11-18T18:10:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...

Failed to get job. - 404 Not Found

the endpoint is receiving the jobs but errors out (worker logs below):
```
2024-11-18T13:50:42.510726100Z {"requestId": null, "message": "Failed to get job. | Error Type: ClientResponseError | Error Message: 404, message='Not Found', url='https://api.runpod.ai/v2/ihv956xmtmq9t3/job-take/etbm9mpkgsl6hd?gpu=NVIDIA+GeForce+RTX+3090&job_in_progress=0'", "level": "ERROR"}
2024-11-18T13:50:42.848129909Z {"requestId": null, "message": "Failed to get job. | Error Type: ClientResponseError | Error Message: 404, message='Not Found', url='https://api.runpod.ai/v2/ihv956xmtmq9t3/job-take/etbm9mpkgsl6hd?gpu=NVIDIA+GeForce+RTX+3090&job_in_progress=0'", "level": "ERROR"}
```
...

vLLM: overriding the OpenAI served model name

Overriding the served model name on the vLLM serverless pod doesn't seem to take effect. Configuring a new endpoint through the Explore page on RunPod's interface creates a worker with the env variable OPENAI_SERVED_MODEL_NAME_OVERRIDE, but the name of the model on the OpenAI endpoint is still the hf_repo/model name. The logs show: engine.py: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name=None... and the endpoint returns Error with model object='error' message='The model 'model_name' does not exist.' type='NotFoundError' param=None code=404 ...
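For debugging, a minimal client-side sketch that first asks the worker which model names it actually registered, assuming the vLLM worker's OpenAI-compatible base URL of the form https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1 (in plain vLLM the equivalent knob is --served-model-name, which the override env variable presumably maps to):

```python
import os
from openai import OpenAI

# Assumption: the RunPod vLLM worker exposes an OpenAI-compatible API here.
client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
)

# List what the worker actually registered; if the override was not picked up,
# this will still show the HF repo name rather than your alias.
for model in client.models.list().data:
    print(model.id)

# Use whichever id was printed above as the `model` argument.
resp = client.chat.completions.create(
    model="hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```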

Not using cached worker

I've been running into this problem for several days now. I have an endpoint that runs a Forge WebUI worker with a network volume attached. As you know, Forge takes some time to start and only then generates the image, so when I send a request to a worker there is normally some startup delay before images are generated. But recently I've run into an issue where a worker is already running, with Forge started and ready to accept requests, yet when I submit a new request it starts a completely new worker, which results in huge delay times. My question is: why isn't it using the already available worker which has Forge loaded? And no, the requests weren't submitted one after the other, so there is no reason to start a new worker...
No description

What TTFT times should we be able to reach?

Of course this depends on token inputs, hardware selection, etc. But for the life of me, I cannot get a TTFT of under 2000 ms on serverless. I'm using Llama 3.1 7B / Gemma / Mistral on 48 GB GPU workers. For performance evaluation I use guidellm, which tests different throughput scenarios (continuous, small, large). Even with 50 input tokens and 100 output tokens I see 2000-2500 ms TTFT. ...
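A minimal way to measure TTFT directly with a streaming request, assuming the vLLM worker's OpenAI-compatible URL (endpoint ID and served model name are placeholders); on serverless the number includes queueing and any cold start, so comparing against a warm active worker helps separate model latency from scheduling delay:

```python
import os
import time
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # assumed vLLM-worker URL
)

start = time.perf_counter()
first_token_at = None

stream = client.chat.completions.create(
    model="<served-model-name>",
    messages=[{"role": "user", "content": "Write one sentence about GPUs."}],
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    # Record the moment the first non-empty content delta arrives.
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()

if first_token_at is not None:
    print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
```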

80GB GPUs totally unavailable

My app is totally down because there isn't even 1 GPU available. This has never happened before. Is it me?...

Not able to connect to the local test API server

I am running the container on an EC2 instance. I keep getting errors like:
```
Error handling request...
```
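A hedged sketch of how the local test server is usually exercised, assuming the SDK's --rp_serve_api flags and a /runsync route that mirrors the hosted API; on EC2 the usual culprits are the server binding only to 127.0.0.1, the container port not being published, or the security group blocking it:

```python
# Assumption: the handler was started inside the container with the SDK's
# local test server, e.g.
#   python handler.py --rp_serve_api --rp_api_host 0.0.0.0 --rp_api_port 8000
# and the port is published (docker run -p 8000:8000) and allowed by the
# EC2 security group.
import requests

EC2_HOST = "ec2-xx-xx-xx-xx.compute.amazonaws.com"  # placeholder hostname

resp = requests.post(
    f"http://{EC2_HOST}:8000/runsync",           # assumed local test route
    json={"input": {"prompt": "hello"}},
    timeout=60,
)
print(resp.status_code, resp.json())
```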

What methods can I use to reduce cold start times and decrease latency for serverless functions?

I understand that adding active workers can reduce cold start issues, but it tends to be costly. I’m looking for a solution that strikes a balance between minimizing cold start times and managing costs. Since users only use our product during limited periods, keeping workers awake all the time isn’t necessary. I’d like to know about possible methods to achieve this balance.
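One pattern that helps without keeping workers always on is loading the model at module import time, so any still-warm worker that picks up a follow-up request skips the load entirely; a minimal sketch (the model and loader are placeholders), typically paired with FlashBoot and a modest idle timeout:

```python
import runpod
import torch
from diffusers import StableDiffusionXLPipeline

# Runs once per worker boot, not once per request: warm workers reuse it.
PIPE = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder model
    torch_dtype=torch.float16,
).to("cuda")

def handler(job):
    prompt = job["input"]["prompt"]
    image = PIPE(prompt, num_inference_steps=25).images[0]
    image.save("/tmp/out.png")
    return {"image_path": "/tmp/out.png"}

runpod.serverless.start({"handler": handler})
```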

Network volume vs. baking the model into the Docker image

I want to run a serverless worker that can get called anywhere from once per hour to 300-400 times per hour. I want to optimize for cold starts when the occasional request comes in. It runs SDXL, a checkpoint, a few ControlNets, etc., about 15-20 GB in total. ...
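A sketch of a handler that works with either choice, preferring a copy baked into the image and falling back to a network volume; the paths are illustrative, with /runpod-volume assumed as the serverless network-volume mount point:

```python
import os
import runpod

BAKED_IN = "/models/sdxl"                        # copied into the image at build time
NETWORK_VOLUME = "/runpod-volume/models/sdxl"    # shared across workers, no image bloat

MODEL_DIR = BAKED_IN if os.path.isdir(BAKED_IN) else NETWORK_VOLUME
# pipeline = load_sdxl_with_controlnets(MODEL_DIR)   # placeholder for your loader

def handler(job):
    # return pipeline(job["input"]["prompt"])         # placeholder inference call
    return {"model_dir": MODEL_DIR}

runpod.serverless.start({"handler": handler})
```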

Jobs stay In-Progress forever

Sometimes I never get a response when I make a request. It stays in progress and doesn't even show an execution time.
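A client-side workaround while the underlying cause is investigated: enforce your own deadline and cancel jobs that sit IN_PROGRESS past it, assuming the /run, /status and /cancel routes of the endpoint API (endpoint ID and payload are placeholders):

```python
import os
import time
import requests

ENDPOINT_ID = "your_endpoint_id"   # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

job_id = requests.post(f"{BASE}/run", headers=HEADERS,
                       json={"input": {"prompt": "hello"}}, timeout=30).json()["id"]

deadline = time.time() + 120        # our own client-side cap
while time.time() < deadline:
    status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS, timeout=30).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        print(status)
        break
    time.sleep(2)
else:
    # Give up on jobs that stay IN_PROGRESS past our deadline.
    requests.post(f"{BASE}/cancel/{job_id}", headers=HEADERS, timeout=30)
    print("Cancelled stuck job", job_id)
```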

How to get the progress of a processing job in serverless?

When I use status/id, it only returns something like {delayTime: 873, id: 3e9eb0e4-c11d-4778-8c94-4d045baa99c1-e1, status: IN_PROGRESS, workerId: eluw70apx442ph}, with no progress data. I want progress data just like in the screenshot of the serverless console log. Please tell me how to get it in my app client....
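A minimal sketch of emitting progress from the worker side, assuming the SDK's runpod.serverless.progress_update helper; while the job is IN_PROGRESS, the /status response should then carry the most recent update alongside the fields above:

```python
import runpod

def handler(job):
    steps = 10                      # illustrative work broken into chunks
    for i in range(steps):
        # ... do one chunk of work here ...
        # Report progress; the message format is up to you.
        runpod.serverless.progress_update(job, f"{(i + 1) * 100 // steps}% complete")
    return {"result": "done"}

runpod.serverless.start({"handler": handler})
```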

RunPod serverless ComfyUI template

I couldn’t find any ComfyUI template for RunPod serverless.

Why is runsync returning a status response instead of just waiting for the image response?

My runsync requests are getting messed up by RunPod returning an async-style response (with an 'IN_PROGRESS' status and an id showing). I need it to just return the image, or a failure, not the status, when using runsync. If I wanted the status I would just use 'run'. Any idea why this is happening and how to prevent it? For reference, this is for requests that generally run for 5-18 seconds to completion. delayTime: 196...
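As far as I can tell, /runsync only waits for a bounded window, and jobs that outlive it get back the same status envelope /run would return; a client-side fallback that polls /status until a terminal state keeps things robust (endpoint ID and input are placeholders):

```python
import os
import time
import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

resp = requests.post(f"{BASE}/runsync", headers=HEADERS,
                     json={"input": {"prompt": "a cat"}}, timeout=120).json()

# If /runsync hands back an IN_QUEUE / IN_PROGRESS envelope instead of the
# finished output, fall back to polling /status until the job terminates.
while resp.get("status") not in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
    time.sleep(1)
    resp = requests.get(f"{BASE}/status/{resp['id']}", headers=HEADERS, timeout=30).json()

print(resp.get("output") or resp.get("error"))
```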

Worker keeps running after idle timeout

Hi! I have observed that my worker keeps running even when there is no request and the idle timeout (60 s) has been reached. Also, when I make a new request at such a moment, my request fails....

Can I deploy the "ComfyUI with Flux.1 dev one-click" template to serverless?

When I click Deploy, I only see 'Deploy GPU Pod', no serverless option.

What is the real Serverless price?

In Serverless I have 2 GPUs per worker and 1 active worker. The price shown on the main page is $0.00046/s, but the endpoint edit page shows $0.00152/s. What is the actual price?
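One way the two figures could reconcile, assuming the main page shows a discounted active-worker rate per GPU while the endpoint edit page shows the undiscounted flex rate for the whole 2-GPU worker; the discount size here is inferred from the numbers, not quoted from the pricing page:

```python
# Both rates are in dollars per second, copied from the question above.
active_per_gpu = 0.00046          # shown on the main page (assumed: discounted, per GPU)
flex_per_worker = 0.00152         # shown on the endpoint edit page (assumed: flex, 2 GPUs)

flex_per_gpu = flex_per_worker / 2                    # 0.00076 $/s
implied_discount = 1 - active_per_gpu / flex_per_gpu
print(f"implied active-worker discount ≈ {implied_discount:.0%}")   # ≈ 39%

# What one active worker (2 GPUs) would actually cost per second while running:
print(f"active worker cost ≈ ${active_per_gpu * 2:.5f}/s")          # ≈ $0.00092/s
```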