RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

"Error decoding stream response" on Completed OpenAI compatible stream requests

Context: I have a custom worker on Serverless, and I am streaming a response from the async OpenAI Python client. Error: When making requests on the OpenAI-compatible API endpoint, non-streaming works fine, but stream requests always return with:...
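
A minimal sketch of the streaming pattern, assuming the usual runpod Python SDK generator-handler flow; the upstream base URL, API key, and model name below are placeholders, not details from this thread. The point is to yield small JSON-serializable pieces (plain strings here) rather than the raw chunk objects, since non-serializable yields are a common source of decode errors on the client side.

```python
# Hedged sketch: relay OpenAI-style streaming chunks from a custom worker.
# UPSTREAM_BASE_URL, UPSTREAM_API_KEY, and MODEL_NAME are placeholders.
import os
import runpod
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url=os.environ.get("UPSTREAM_BASE_URL"),         # assumed upstream OpenAI-compatible server
    api_key=os.environ.get("UPSTREAM_API_KEY", "EMPTY"),
)

async def handler(job):
    """Async generator handler: each yielded string becomes one stream chunk."""
    prompt = job["input"]["prompt"]
    stream = await client.chat.completions.create(
        model=os.environ.get("MODEL_NAME", "my-model"),
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    async for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta  # plain strings are JSON-serializable

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```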

GitHub builds failing with "Unable to acquire machine, please retry"

After about 5 minutes the build fails: ``` Build using docker ......

Deployed deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Serverless

Has anyone deployed deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Serverless? I have loaded it, but it keeps generating indefinitely. 😅
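
For reference, one way to keep generation bounded on an OpenAI-compatible endpoint is to cap `max_tokens` and use a non-zero temperature (the R1 distill model cards reportedly recommend roughly 0.5-0.7). A hedged sketch; `ENDPOINT_ID` and `RUNPOD_API_KEY` are placeholders.

```python
# Hedged sketch: bound a DeepSeek-R1-Distill reply on an OpenAI-compatible endpoint.
# ENDPOINT_ID / RUNPOD_API_KEY are placeholders, not values from this thread.
import os
from openai import OpenAI

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{os.environ['ENDPOINT_ID']}/openai/v1",
    api_key=os.environ["RUNPOD_API_KEY"],
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    messages=[{"role": "user", "content": "Summarize RunPod Serverless in two sentences."}],
    max_tokens=512,    # hard cap so a looping reply cannot run forever
    temperature=0.6,   # assumption: non-zero temperature, per the R1 distill guidance
)
print(response.choices[0].message.content)
```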

Setting up CD for serverless endpoint

I tried the GitHub integration, but our Docker image base is a private image, so the build system needs to support using credentials. I also tried the Docker image approach; this works great for our pre-built images, but how can I set up CD for this?...

Why do serverless endpoints try to re-pull the container when doing inference?

We use ECR, which has a 12-hour token expiration. That is hard to deal with because sometimes no one is around to refresh the tokens. I find it surprising that in the middle of the day an endpoint will re-pull the container image; this then fails because the ECR token has already expired, so the endpoint no longer works.
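
A hedged sketch of the refresh side, assuming boto3 and standard ECR auth: fetch a fresh token (valid roughly 12 hours) on a schedule, then update the endpoint's registry credentials through whatever console or API workflow you already use; that last step is left as a placeholder here.

```python
# Hedged sketch: pull a fresh ECR authorization token before the old one expires.
# Pushing the new credentials to the RunPod endpoint's registry auth is left as a TODO.
import base64
import boto3

def fresh_ecr_credentials():
    ecr = boto3.client("ecr")
    auth = ecr.get_authorization_token()["authorizationData"][0]
    username, password = base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
    return username, password, auth["proxyEndpoint"], auth["expiresAt"]

if __name__ == "__main__":
    user, password, registry, expires = fresh_ecr_credentials()
    print(f"New token for {registry} valid until {expires}")
    # TODO: update the endpoint's container registry credentials with user/password
    # (console or your own automation) so mid-day re-pulls keep working.
```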

Need help getting better GPUs

None of my jobs are running in production. I need better GPUs. Who can I talk to?

Can I increase max workers beyond 10?

I see that I can upgrade from 5 workers to 10 upon top-up. Can we go higher than that? Say, 100 max workers?

Why is the serverless worker "downloading" instead of "running" when I trigger the RunPod endpoint ID?

My app is having connection issues because of this; has anyone experienced the same thing? I'm using the GitHub integration approach.

openai/v1 and open-webui

Hey team, looking at your docs and at the question "How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1", I've run into a weird gotcha. When I do a GET: ```bash curl -X GET https://api.runpod.ai/v2/<endpoint here>/openai/v1 ...
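
For comparison, a hedged sketch of hitting the two routes the OpenAI-compatible endpoints are usually meant to serve; `ENDPOINT_ID`, `RUNPOD_API_KEY`, and the model name are placeholders, and the exact routes depend on the worker image (the vLLM worker generally exposes `/models` and `/chat/completions` under `/openai/v1`, while a bare GET on `/openai/v1` itself is not a defined route).

```python
# Hedged sketch: probe the OpenAI-compatible routes with plain requests.
import os
import requests

base = f"https://api.runpod.ai/v2/{os.environ['ENDPOINT_ID']}/openai/v1"
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# List served models (assumption: the worker implements this route).
print(requests.get(f"{base}/models", headers=headers, timeout=60).json())

# POST a chat completion to the same base URL.
payload = {
    "model": "your-model-name",  # placeholder
    "messages": [{"role": "user", "content": "ping"}],
    "max_tokens": 16,
}
print(requests.post(f"{base}/chat/completions", headers=headers, json=payload, timeout=60).json())
```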

Job Never Picked Up by a Worker but Received Execution Timeout Error and Was Charged

I set the execution timeout to a maximum of 45 seconds (the job usually takes about 20–30 seconds) and the idle timeout to 1 second. I sent three requests, with the last one being sent after the first job was completed. However, after 57 seconds, the last request timed out. I checked the logs, and no workers picked it up, yet I can see that my serverless billing charged me for the last request as well. We are going live in two weeks, and it's crucial to ensure that we are not charged for requests that were never processed. Any insights on why this might be happening and how to prevent it?...
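
One way to keep your own record of whether a job was ever picked up is to poll its status from the client before your budget expires and cancel it otherwise; a hedged sketch using the runpod Python SDK, with `ENDPOINT_ID` and `RUNPOD_API_KEY` as placeholders.

```python
# Hedged sketch: submit a job, poll its status, and cancel if nothing picks it up in time.
import os
import time
import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]
endpoint = runpod.Endpoint(os.environ["ENDPOINT_ID"])

job = endpoint.run({"input": {"prompt": "hello"}})

deadline = time.time() + 60  # client-side budget, separate from the endpoint's execution timeout
while time.time() < deadline:
    status = job.status()  # e.g. IN_QUEUE, IN_PROGRESS, COMPLETED, FAILED
    print(status)
    if status not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(2)
else:
    job.cancel()  # keep a record that the request never ran before the budget ran out
```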

Serverless worker keeps failing

We run several serverless workers in parallel to run the inference. Sometimes a serverless worker starts failing with OOM and all the following runs on the same worker will fail until the worker is terminated. We have noticed that the retries initiated by our backend always end up on the same worker. Let's say we have 10 prompts, and we run one prompt per worker, the retries with the same prompt always end up on the same worker. ...
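
A hedged sketch of one mitigation, assuming the handler-return `refresh_worker` flag in the runpod Python SDK: when a run hits CUDA OOM, return the error with `refresh_worker` set so the worker is retired instead of catching the retry again. `run_inference` is a placeholder for your existing inference call.

```python
# Hedged sketch: retire a worker after a CUDA OOM so retries land on a fresh one.
import runpod
import torch

def handler(job):
    try:
        result = run_inference(job["input"])  # placeholder for your inference code
        return {"output": result}
    except torch.cuda.OutOfMemoryError as exc:
        # Returning refresh_worker alongside the error asks RunPod to stop reusing
        # this worker for subsequent jobs.
        return {"error": f"CUDA OOM: {exc}", "refresh_worker": True}

runpod.serverless.start({"handler": handler})
```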

Started getting errors connecting to Google Cloud Storage

Hello, at approximately 0:00 AM on Jan 31 we started getting errors while uploading files to Google Cloud Storage from serverless workers. For background: we have been using the same endpoint for months with ~5k daily requests and a very low failure rate, and we did not make any changes recently. Not all workers seem affected, and the repro rate is not 100% for the affected ones. Error: We're sorry, but this service is not available in your location. Example request ID: aeb40bea-99b9-427d-95af-757d3d481d40-u1. Worker ID: azlu9ylswr53kh...

OSError in vLLM worker after its new update was released

I was using vLLM worker 1.7.0 and everything was working fine until yesterday. Today I am facing issues in all of my endpoints where Hugging Face models are deployed using the vLLM worker. RunPod logs show an OSError and the model can't be identified. I then deployed a new endpoint with the latest configuration of vLLM worker 1.9 and everything worked the way it used to. @Justin Merrell RunPod should at least let us know about its changes, so they don't affect endpoints in production....

Can’t make Qwen/Qwen2.5-VL-3B-Instruct model work on serverless

Has anybody been able to make Qwen/Qwen2.5-VL-3B-Instruct work? When will it be supported?...

Whitelist IP Addresses

A good tool would be a whitelist for IP addresses, to give more control over inbound and outbound traffic. As far as I can see, this feature is not present? (Something like a reverse proxy.)...

How much does it cost to use multi-GPU?

I'd like to increase the number of GPUs per worker to get better performance with parallelization. When I read this post: https://blog.runpod.io/runpod-serverless-pricing-update/, I get the impression that the cost is only linked to the “type of GPU” (16GB, 24GB, 48GB, ...) and that increasing the number of GPUs per worker doesn't increase the price per second. But that doesn't seem logical to me. Do I pay as much if I use a worker for 30s with 2 GPUs vs. a worker for 30s with only 1 GPU? Or does the worker with 2 GPUs cost twice as much as the worker with a single GPU? Also, when I read the doc: https://docs.runpod.io/serverless/references/endpoint-configurations#gpus--worker, it says that multi-GPU is only available on 48GB instances, but in the interface I get the impression that it's available on other types too (the ones I'm interested in are 24GB). Is it just that the documentation isn't up to date, or is it a display problem?...
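
Under the (unconfirmed) assumption that a multi-GPU worker simply bills the per-GPU, per-second rate times the GPU count, the arithmetic works out as below; the rate used is a placeholder, not a quoted RunPod price.

```python
# Hedged arithmetic sketch, assuming cost = gpu_count * per_gpu_rate * seconds.
price_per_gpu_second = 0.00060   # placeholder rate for a 24GB-class GPU
seconds = 30

one_gpu = 1 * price_per_gpu_second * seconds
two_gpu = 2 * price_per_gpu_second * seconds
print(f"1 GPU for {seconds}s: ${one_gpu:.4f}")   # 0.0180 under this assumption
print(f"2 GPUs for {seconds}s: ${two_gpu:.4f}")  # 0.0360, i.e. twice as much
```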

Serverless UI broken for some endpoints

Since the latest UI changes, clicking on some endpoints causes the RunPod logo to load constantly and the UI never loads. This seems to only happen with certain endpoints.