RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Serverless worker keeps failing

We run several serverless workers in parallel for inference. Sometimes a worker starts failing with OOM, and every subsequent run on that worker fails until it is terminated. We have also noticed that retries initiated by our backend always end up on the same worker: say we have 10 prompts and run one prompt per worker; retries of a given prompt always land on the worker that already failed it. ...
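
If the failure mode is a poisoned worker, one mitigation is to have the handler flag the worker for replacement whenever it hits CUDA OOM, using the `refresh_worker` return flag from the RunPod handler docs. A minimal sketch in Python, where `run_inference` is a placeholder for your actual model call:

```python
import runpod
import torch

def handler(job):
    """Run inference; on CUDA OOM, ask RunPod to retire this worker so
    retries don't keep landing on the same broken instance."""
    try:
        result = run_inference(job["input"])  # placeholder for your model call
        return {"refresh_worker": False, "job_results": result}
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        # Flag the worker for replacement once this job returns.
        return {"refresh_worker": True, "job_results": {"error": "CUDA OOM"}}

runpod.serverless.start({"handler": handler})
```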

Started getting errors connecting to Google Cloud Storage

Hello, at approximately 12:00 AM on Jan 31 we started getting errors while uploading files to Google Cloud Storage from serverless workers. For background: we have been using the same endpoint for months with ~5k daily requests and a very low failure rate, and we did not make any changes recently. Not all workers seem affected, and the repro rate is not 100% for the affected ones. Error: "We're sorry, but this service is not available in your location". Example request id: aeb40bea-99b9-427d-95af-757d3d481d40-u1. Worker id: azlu9ylswr53kh...
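
For anyone hitting intermittent failures like this, a retry with exponential backoff at least smooths over the transient cases (it won't help if Google is hard-blocking the worker's region). A sketch using the official google-cloud-storage client; the bucket and paths are placeholders:

```python
import time
from google.api_core import exceptions as gcs_exceptions
from google.cloud import storage

client = storage.Client()  # credentials via GOOGLE_APPLICATION_CREDENTIALS

def upload_with_retry(bucket_name: str, blob_path: str, local_file: str, attempts: int = 5):
    """Upload a file to GCS, backing off on transient API errors."""
    blob = client.bucket(bucket_name).blob(blob_path)
    for attempt in range(1, attempts + 1):
        try:
            blob.upload_from_filename(local_file)
            return
        except gcs_exceptions.GoogleAPICallError:
            if attempt == attempts:
                raise  # give up and surface the error to the job
            time.sleep(2 ** attempt)  # exponential backoff before retrying
```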

OSError in vLLM worker: issues after its new update was released

I was using vLLM worker 1.7.0 and everything was working fine until yesterday. Today I am facing issues on all of my endpoints where Hugging Face models are deployed using the vLLM worker: RunPod logs show an OSError and the model can't be identified. I then deployed a new endpoint with the latest configuration of vLLM worker 1.9 and everything worked the way it used to. @Justin Merrell, RunPod should at least let us know about its changes, so they don't affect endpoints in production....

Can’t make Qwen/Qwen2.5-VL-3B-Instruct model work on serverless

Qwen/Qwen2.5-VL-3B-Instruct. Has anybody been able to make it work? When will it be supported?...

Whitelist IP Addresses

A good tool would be a whitelist for IP addresses, to give more control over inbound and outbound traffic. As far as I can see, this feature is not present? (Something like a reverse proxy.)...

How much does it cost to use multi-GPU?

I'd like to increase the number of GPUs per worker to get better performance from parallelization. Reading this post: https://blog.runpod.io/runpod-serverless-pricing-update/, I get the impression that the cost is only tied to the type of GPU (16GB, 24GB, 48GB, ...) and that increasing the number of GPUs per worker doesn't increase the price per second. But that doesn't seem logical to me. Do I pay the same for a worker running 30s with 2 GPUs as for a worker running 30s with only 1 GPU? Or does the 2-GPU worker cost twice as much as the single-GPU worker?

Also, the docs (https://docs.runpod.io/serverless/references/endpoint-configurations#gpus--worker) say that multi-GPU is only available on 48GB instances, but in the interface it seems to be available on other types too (I'm interested in the 24GB ones). Is the documentation out of date, or is this a display problem?...
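
For what it's worth, the pricing page lists per-GPU rates, so the usual assumption is that billing scales linearly with GPU count: a 2-GPU worker for 30s costs twice a 1-GPU worker for 30s (but should finish faster if your workload parallelizes). A back-of-the-envelope calculation with a made-up rate; check the console for current prices:

```python
# Illustrative per-GPU rate for a 24 GB card; not a real price.
PRICE_PER_GPU_SECOND = 0.00031  # USD

def job_cost(gpus_per_worker: int, seconds: float) -> float:
    """Cost of one worker run, assuming billing is per GPU-second."""
    return gpus_per_worker * PRICE_PER_GPU_SECOND * seconds

print(job_cost(1, 30))  # 1 GPU  x 30 s
print(job_cost(2, 30))  # 2 GPUs x 30 s -> twice the cost, ideally half the runtime
```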

Serverless UI broken for some endpoints

Since the latest UI changes, clicking on some endpoints leads to the RunPod logo loading indefinitely and the UI never appearing. This seems to only be happening with certain endpoints.

Need help fixing long-running deployments in serverless vLLM

Hi, I am trying to deploy the migtissera/Tess-3-Mistral-Large-2-123B model on serverless with vLLM, using 8 x 48GB GPUs. The total size of the model weights is around 245 GB. I have tried two approaches. First, without any network volume: it takes a really long time to serve the first request because the worker needs to download the weights, and if the worker goes idle and I send another request, it downloads the weights again and takes just as long....
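
The usual fix for the first approach is to attach a network volume (mounted at /runpod-volume) and point the weight caches at it, so the 245 GB download happens once and is reused across workers. A sketch for a custom vLLM handler; if you use the prebuilt worker-vllm image, the equivalent would be setting its cache-path environment variables instead:

```python
import os

# Point the Hugging Face cache at the network volume so weights persist
# across workers and cold starts.
os.environ.setdefault("HF_HOME", "/runpod-volume/huggingface")

from vllm import LLM  # import after setting the env var so vLLM picks it up

llm = LLM(
    model="migtissera/Tess-3-Mistral-Large-2-123B",
    tensor_parallel_size=8,                # shard across the 8 GPUs
    download_dir="/runpod-volume/models",  # vLLM's weight cache on the volume
)
```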

A job starts on one worker and seems to be relaunched on another

Hi, I have set up an image that installs ComfyUI and some custom nodes, and as input I pass a workflow. The entire workflow is supposed to take a few minutes to run (maybe 5-6 min on an A100), but strangely, it starts fine and then, near the end, it stops on one worker and restarts on another worker.
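
One common cause of this pattern is the job exceeding the endpoint's execution timeout, after which it gets retried on another worker. Besides raising the timeout in the endpoint settings, emitting progress updates via the SDK's progress_update helper makes it easier to see how far the workflow got before the restart. A sketch, with `workflow_nodes` and `run_node` as placeholders for the ComfyUI execution:

```python
import runpod

def handler(job):
    workflow_nodes = job["input"]["workflow"]  # placeholder input shape
    for step, node in enumerate(workflow_nodes, start=1):
        run_node(node)  # placeholder for executing one ComfyUI node
        # Visible in the request's status while the job is running.
        runpod.serverless.progress_update(job, f"step {step}/{len(workflow_nodes)}")
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```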

delayTime reporting a negative value

On some requests I started seeing a negative delayTime, which throws off my own autoscaler.
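
Negative values here look like clock skew between the timestamps used to compute delayTime. Until that's fixed upstream, a defensive clamp keeps an autoscaler from reacting to it:

```python
def effective_delay_ms(job_status: dict) -> int:
    """Clamp delayTime at zero so clock skew can't produce a negative
    queue delay and mislead scaling decisions."""
    return max(0, job_status.get("delayTime", 0))
```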

Serverless quants

Hi, how do you specify a particular GGUF quant file from a Hugging Face repo when configuring a vLLM serverless endpoint? It only seems to let you specify the repo level.
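
If you end up building a custom worker instead, huggingface_hub can pull a single quant file from a repo rather than the whole thing; the repo and filename below are just examples:

```python
from huggingface_hub import hf_hub_download

# Download one specific GGUF quant instead of the entire repo.
gguf_path = hf_hub_download(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",  # example repo
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",   # example quant file
)
print(gguf_path)  # local path to point your engine at
```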

DeepSeek R1 Serverless for coding

I'm interested in running an FP16 DeepSeek R1, and I'm wondering whether Serverless is the way to go or whether a Pod would be better. I need this for 2-3 hours at a time, and I would like 'dedicated' access to the environment. Which DeepSeek R1 model should I pick (GGUF?), and how should I configure the deployment tool in Serverless to get it running on an H100? Thanks in advance for any help....

In the Faster Whisper serverless endpoint, how do I get an English transcription for Tamil audio?

In the Faster Whisper serverless endpoint, how do I get an English transcription for Tamil audio? When I test it with Tamil audio, I get output like this; how do I get it in English?
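
Underneath, Whisper has a dedicated "translate" task that outputs English regardless of the source language; the RunPod worker should expose a corresponding input flag (check its input schema, likely a translate boolean). With the faster-whisper library directly, it looks like this:

```python
from faster_whisper import WhisperModel

model = WhisperModel("large-v2")

# task="translate" outputs English; the default task="transcribe" keeps
# the source language (Tamil here).
segments, info = model.transcribe("tamil_audio.mp3", task="translate")
for segment in segments:
    print(segment.text)
```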

Stuck vLLM startup with 100% GPU utilization

Twice today I've deployed a new vLLM endpoint using the "Quick Deploy" "Serverless vLLM" option at https://www.runpod.io/console/serverless, only to have the worker get stuck after launching the vLLM process and before the weights download begins. It never reaches the state of actually downloading the HF model and loading it into vLLM.
* The model I've used is Qwen/Qwen2.5-72B-Instruct.
* The problematic machines have all been A6000s.
* Only a single worker, configured with 4 x 48GB GPUs, was set in the template configuration, to make the problem easier to track down (a single pod and a single machine)....

How to respond to the requests at https://api.runpod.ai/v2/<YOUR ENDPOINT ID>/openai/v1

The OpenAI input is in the job input; I extracted it and processed the request, but when I send the response with yield or return it isn't received. Could you take a look at this: [https://github.com/mohamednaji7/runpod-workers-scripts/blob/main/empty_test/test%20copy%203.py] ...
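
For reference, worker-vllm reads OpenAI-routed requests from `openai_route` and `openai_input` keys in the job input and returns an OpenAI-shaped body; a custom handler following that convention might look like the sketch below (the key names and the `generate` call are assumptions, not a guaranteed contract):

```python
import runpod

def handler(job):
    route = job["input"].get("openai_route")        # e.g. "/v1/chat/completions"
    payload = job["input"].get("openai_input", {})  # OpenAI-style request body

    text = generate(payload.get("messages", []))  # placeholder inference call

    # Return an OpenAI-shaped chat completion so the /openai/v1 client can parse it.
    return {
        "id": job["id"],
        "object": "chat.completion",
        "model": payload.get("model", "unknown"),
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "finish_reason": "stop",
        }],
    }

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```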

worker-vllm not working with beam search

Hi, I found another bug in your worker-vllm: beam search is not supported even though your README says it is. This time it's length_penalty not being accepted. Can you please work on a fix for beam search? Thanks!

All GPUs unavailable

I just started using RunPod. Yesterday I created my first serverless endpoint and submitted a job, but I didn't receive a response. When I investigated, I found that all GPUs were unavailable, and the situation hasn't changed since. Could you tell me what I should do?

/runsync returns "Pending" response

Hi, I sent a request to my /runsync endpoint and it returned a {job... status:"pending"} response. Can someone clarify when this happens? Is it when the request takes too long to complete?
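
/runsync only holds the connection open for a bounded window; if the job hasn't finished by then, it returns the job ID with a non-terminal status and you're expected to poll /status/{id}. A sketch of that fallback (the API key and endpoint ID are placeholders):

```python
import time
import requests

API_KEY = "YOUR_API_KEY"          # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def run_and_wait(payload: dict, poll_seconds: float = 2.0) -> dict:
    """Submit via /runsync; if the response comes back unfinished,
    poll /status/{id} until the job reaches a terminal state."""
    job = requests.post(f"{BASE}/runsync", json={"input": payload}, headers=HEADERS).json()
    while job.get("status") not in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        time.sleep(poll_seconds)
        job = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    return job
```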