RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡｜serverless


faileon

2/6/2025

You do not have permission to perform this action.

Hello everyone, trying to access my serverless function, but I just can't get it to work... ``` curl --location 'https://api.runpod.ai/v2/<endpoint-id>/runsync' \ --header 'Authorization: Bearer rpa_VPG4....' ...
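For reference, the same kind of call can be sketched in Python with only the standard library. This is a minimal sketch, not the poster's actual request: the `{"input": ...}` body shape, the `build_runsync_request` helper, and the placeholder endpoint ID and key are assumptions for illustration; only the URL pattern and the Bearer header come from the curl command above. The request is built but not sent.

```python
import json
import urllib.request


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /runsync route."""
    # URL pattern taken from the curl command in the post.
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    # Assumption: the job payload is wrapped in an "input" key.
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; substitute a real endpoint ID and API key.
req = build_runsync_request("my-endpoint-id", "rpa_example", {"prompt": "hello"})
print(req.get_method(), req.full_url)
```

Printing the method, URL, and headers before sending can help rule out a malformed request when chasing a permission error.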

LisT_99

2/6/2025

vLLM serverless output cutoff

I deployed a serverless vLLM endpoint using deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B, but when I make a request, the output is only 16 tokens (tested many times). I didn't change anything from the default settings except max_model_length, which I set to 32768. How can I fix that? Or did I miss a config?...
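One thing that may be worth checking: vLLM's sampling parameters default `max_tokens` to 16 when a request does not set it, which would match the symptom. The sketch below assumes the worker reads sampling options from a `sampling_params` object under `input`; that layout and the `build_vllm_input` helper are assumptions for illustration, not confirmed by the post.

```python
def build_vllm_input(prompt: str, max_tokens: int = 1024) -> dict:
    """Build a request body that sets an explicit output token limit."""
    # Assumption: the worker accepts "sampling_params" with a "max_tokens" key.
    return {
        "input": {
            "prompt": prompt,
            "sampling_params": {"max_tokens": max_tokens},
        }
    }


payload = build_vllm_input("Explain GPUs in one paragraph.", max_tokens=2048)
print(payload)
```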

digger18

2/6/2025

"worker exited with exit code 1" in my serverless workloads

No other information in the logs. It is a GPU (CUDA) enabled container. What is the best way to debug this?...

tzushi

2/5/2025

"Error decoding stream response" on Completed OpenAI compatible stream requests

Context: I have a custom worker on serverless, and I am streaming a response from the async OpenAI Python client. Error: When making requests on the OpenAI compatible API endpoint, non-streaming is fine, but stream requests always return with:...

example_request.json

twobit

2/5/2025

GitHub builds failing "Unable to acquire machine, please retry"

After about 5 minutes the build fails: ``` Build using docker ......

DEVIL_EGOX

2/4/2025

Deployed deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Serverless

Has anyone deployed deepseek-ai/DeepSeek-R1-Distill-Llama-8B on Serverless? I have loaded it but it answers infinitely. 😅

pkpio

2/4/2025

Setting up CD for serverless endpoint

I tried the GitHub integration, but our Docker image base is a private image, so the build system needs to support using credentials. I also tried the Docker image approach. This works great for our pre-built images, but how can I set up CD for this?...

yoyo

2/4/2025

Why do serverless endpoints try to re-pull the container image when doing inference?

We are using ECR, so there is a 12-hour token expiration. This is hard to deal with because sometimes no one is there to refresh the tokens. I find it surprising that in the middle of the day endpoints would re-pull the container image. The pull then fails because the ECR token has already expired, and the endpoint stops working.

Mav

2/3/2025

need help getting better gpus

None of my jobs are running in production. I need better GPUs. Who can I talk to?

pkpio

2/3/2025

Can I increase max workers beyond 10?

I see that I can upgrade from 5 workers to 10 upon topup. Can we go higher than that? Say, 100 max workers?

yoyo

2/2/2025

Why is the serverless endpoint "downloading" instead of "running" when I trigger the RunPod ID?

My app is having connection issues because of this. Has anyone experienced the same thing? I'm using the GitHub integration approach.

CoverGhoul

2/2/2025

openai/v1 and open-webui

Hey Team, Looking at your docs, and at the question "How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1"; I've run into a weird gotcha. When I do a GET --- ```bash curl -X GET https://api.runpod.ai/v2/<endpoint here>/openai/v1 ...

yoyo

2/1/2025

Max image the GitHub repo serverless integration can take?

vesper

2/1/2025

Job Never Picked Up by a Worker but Received Execution Timeout Error and Was Charged

I set the execution timeout to a maximum of 45 seconds (the job usually takes about 20–30 seconds) and the idle timeout to 1 second. I sent three requests, with the last one being sent after the first job was completed. However, after 57 seconds, the last request timed out. I checked the logs, and no workers picked it up, yet I can see that my serverless billing charged me for the last request as well. We are going live in two weeks, and it's crucial to ensure that we are not charged for requests that were never processed. Any insights on why this might be happening and how to prevent it?...

Simon

1/31/2025

Serverless worker keeps failing

We run several serverless workers in parallel to run inference. Sometimes a serverless worker starts failing with OOM, and all the following runs on the same worker fail until the worker is terminated. We have noticed that the retries initiated by our backend always end up on the same worker. Let's say we have 10 prompts and run one prompt per worker: the retries with the same prompt always end up on the same worker. ...

tok

1/31/2025

Started getting errors connecting to google cloud storage

Hello, at approximately Jan 31, 0:00 AM, we started getting an error while uploading files to Google Cloud Storage from serverless workers. For background: we have been using the same endpoint for months and have ~5k daily requests with a very low fail rate, and we did not make any changes recently. Not all workers seem affected, and the repro rate is not 100% for the affected ones. Error: We're sorry, but this service is not available in your location. Example request ID: aeb40bea-99b9-427d-95af-757d3d481d40-u1 Worker ID: azlu9ylswr53kh...

Ashique A B

1/31/2025

OSError in vLLM worker; issues when its new update was released

I was using vLLM worker 1.7.0 and everything was working fine until yesterday. Today I am facing issues in all of my endpoints where Hugging Face models are deployed using the vLLM worker. RunPod logs show an OSError and the model can't be identified. I then deployed a new endpoint with the latest configuration of vLLM worker 1.9 and everything worked the way it used to. @Justin Merrell RunPod should at least let us know about its changes, so they do not affect endpoints in production....

tech

1/30/2025

Can’t make Qwen/Qwen2.5-VL-3B-Instruct model work on serverless

Qwen/Qwen2.5-VL-3B-Instruct: has anybody been able to make it work? When will it be supported?...

Sven

1/30/2025

Whitelist IP Addresses

A good tool would be a whitelist for IP addresses, to have more control over inbound and outbound traffic. As far as I can see, this feature is not present? (Something like a reverse proxy)...

Zambla

1/30/2025

How much does it cost to use multi-GPU?

I'd like to increase the number of GPUs per worker to get better performance with parallelization. When I read this post: https://blog.runpod.io/runpod-serverless-pricing-update/, I get the impression that the cost is only linked to the "type of GPU" (16GB, 24GB, 48GB, ...) and that increasing the number of GPUs per worker doesn't increase the price per second. But that doesn't seem logical to me. Do I pay the same amount for a worker running 30s with 2 GPUs as for a worker running 30s with only 1 GPU? Or does the worker with 2 GPUs cost twice as much as the worker with a single GPU? Also, the doc at https://docs.runpod.io/serverless/references/endpoint-configurations#gpus--worker says that multi-GPU is only available on 48GB instances, but in the interface I get the impression that it's available on other types (the ones I'm interested in are 24GB). Is it just that the documentation isn't up to date, or is it a display problem?...
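The two readings in the question can be put side by side with a little arithmetic. This is only a sketch of the comparison being asked about: the price of 0.0004 per second is a made-up placeholder, `worker_cost` is a hypothetical helper, and nothing here says which billing model actually applies.

```python
def worker_cost(price_per_second: float, seconds: float, gpus: int, scales_with_gpus: bool) -> float:
    """Cost of one worker run under either reading of the pricing."""
    # If billing scales with GPU count, multiply by the number of GPUs;
    # otherwise the per-second price is flat for the worker.
    multiplier = gpus if scales_with_gpus else 1
    return price_per_second * seconds * multiplier


PRICE = 0.0004  # hypothetical price per second, for illustration only

flat = worker_cost(PRICE, 30, gpus=2, scales_with_gpus=False)
scaled = worker_cost(PRICE, 30, gpus=2, scales_with_gpus=True)
print(f"flat per worker: {flat:.4f}, scaled per GPU: {scaled:.4f}")
```

Under the flat reading, a 2-GPU worker costs the same as a 1-GPU worker for the same 30 seconds; under the scaled reading, it costs twice as much.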

