Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Running Workers Not Shutting Down & Incurring Charges

Hi, we're facing a critical issue with workers not shutting down when there's nothing in the queue or in progress, which is causing significant over-billing and blocking our app launch. I'm reporting this after it has happened at least 3 times. I've observed that after all jobs are processed (finished/cancelled and nothing in queue), workers continue running for over 8 minutes doing nothing. I noticed it happening with both scaling settings:
- Queue Delay: A worker ran for 8+ minutes with an empty queue (attached a video of this below)
- Request Count: Two separate workers ran for 8+ minutes after the last job was processed (I sent these messages when it happened: https://discord.com/channels/912829806415085598/948767517332107274/1388527617510084651 https://discord.com/channels/912829806415085598/948767517332107274/1388531493768527932)...
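
For anyone who wants to catch this from the outside, the endpoint's health route reports current queue depth and worker counts, so you can log whenever workers stay in the running state with nothing queued or in progress. A minimal monitoring sketch with the Runpod Python SDK, assuming you fill in your own API key and endpoint ID (the 60-second poll interval is arbitrary, and the key names follow the /health response):

```python
import time

import runpod

runpod.api_key = "YOUR_RUNPOD_API_KEY"          # assumption: your API key
endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")  # assumption: your endpoint ID

while True:
    health = endpoint.health()            # current job and worker counts
    jobs = health.get("jobs", {})
    workers = health.get("workers", {})

    # The situation described above: workers billed as "running" while the
    # queue is empty and nothing is in progress.
    if (
        workers.get("running", 0) > 0
        and jobs.get("inQueue", 0) == 0
        and jobs.get("inProgress", 0) == 0
    ):
        print(f"Idle-but-running workers detected: workers={workers} jobs={jobs}")

    time.sleep(60)
```

Output from a loop like this makes it easier to show support exactly how long a worker kept running after the queue emptied.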

Runpod setting my workers to 0

We got an email that one of the workers had not been active for 15 days. One of our clients is sometimes active and sometimes not, so why would Runpod willingly manage my infrastructure if I have paid for my credits and should be able to manage my workers however I want within my limits? Why is the Runpod team interfering with our infrastructure? I've never heard of a provider manipulating resources in a client's account when everything is paid up to date....

Unstable Serverless GPU performance (Mostly too slow)

Despite using the same specifications and Docker image, the 4090 GPU workers have become unstable. ComfyUI image-generation tasks used to run at about 4–5 iterations per second, but since 28 June, they’ve been wildly inconsistent. Sometimes 1 iteration per second, and sometimes even 4 seconds per iteration. I’ve tried terminating the faulty workers and restarting them, even with a fresh Docker image, but the issue persists....

Ollama Docker image from Docker Hub

How can I completely disable logs for my Ollama endpoint? I used the ready-made Ollama serverless container. The reason I'm asking is that currently I can see the full questions and answers in the logs, which I don't want.
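
One way to keep prompts and answers out of the worker logs is to swap the prebuilt container's handler for a thin proxy of your own that only logs metadata. This is a minimal sketch, not the stock template's code, and it assumes Ollama is already running inside the container on its default port and that the job input matches Ollama's /api/chat payload:

```python
import requests

import runpod

OLLAMA_URL = "http://127.0.0.1:11434/api/chat"  # Ollama's default local port

def handler(job):
    """Forward the job to the local Ollama server without echoing the
    prompt or the answer into the worker logs."""
    payload = dict(job["input"])   # e.g. {"model": "...", "messages": [...]}
    payload["stream"] = False      # return one JSON body, no streaming
    resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
    resp.raise_for_status()
    # Log only metadata, never the question/answer content itself.
    print(f"Job {job['id']} completed with HTTP {resp.status_code}")
    return resp.json()

runpod.serverless.start({"handler": handler})
```

The trade-off is that you now maintain your own image instead of the ready-made one, but you control exactly what gets printed.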

CPU usage rises to 99% even though no processing is being performed

This issue has been observed on US-IL-1 RTX 4090 workers since around June 27th. The image used is runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04. It can be reproduced by starting a serverless worker with a sleep command to prevent it from dying....

Can I use a stronger single-core CPU for serverless?

Right now it has 8-16 vCPUs, but the single-core performance is not strong enough. I wonder if I could have just 1 or 2 cores, but with a stronger CPU.

Streaming responses on a Serverless Endpoint

Currently using a serverless endpoint for inference, and it seems the streaming response is not working the same as with a dedicated endpoint. I have the same setup in both cases, but I can see that the responses do not come back with the same streaming behaviour and speed as on the dedicated endpoint.
Solution:
Yes. The default value is 50. MIN_BATCH_SIZE should already default to 1.
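
The batch-size values in the solution appear to refer to the stream batching settings of the prebuilt vLLM worker (an assumption, since the endpoint's image isn't stated), which control how many tokens are grouped into each streamed chunk. Independent of that, a custom worker only streams if its handler is written as a generator and the client reads from the endpoint's /stream route; a minimal sketch with the Runpod Python SDK:

```python
import runpod

def handler(job):
    """Generator handler: each yielded chunk becomes available on the
    endpoint's /stream route as soon as it is produced."""
    prompt = job["input"].get("prompt", "")
    # Placeholder loop standing in for a real token-by-token inference call.
    for token in prompt.split():
        yield {"token": token}

runpod.serverless.start({
    "handler": handler,
    # Also collect the yielded chunks into the final /status result.
    "return_aggregate_stream": True,
})
```

If the client only polls /status (or uses /runsync) it will see a single aggregated response, which can look like streaming is "not working" even when the worker is yielding chunks correctly.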

Bad workers are lurking. Clear differences in processing speed

Despite having the same specs, serverless keeps producing workers with obvious processing-speed problems. All the jobs in the image run the exact same processing on servers with the exact same DC and specs, yet there is a big difference in processing speed. I terminate these bad workers every day in batches, but new ones keep appearing, and there's no end to it. (US-IL-1 RTX 4090)...

Comfy worker for custom docker is taking too much time

I have deployed a custom ComfyUI worker with custom nodes and custom models, and it works fine on a regular server; I have tested everything on a Hyperstack server. When I deploy it as a Runpod serverless worker, the request takes too much time. I have waited for 20 minutes and was not able to get any response. Also, I cannot see any logs in the serverless UI. ...

Still waiting

I've deployed my first serverless GPU: I opted for an RTX A5000 and 'bigcode/starcoder'. It's been almost 40 minutes and I'm still waiting for the model to finish loading. How long does this normally take for a 16B model? My personal GPU (4070) was able to load it within seconds.

Failed to return job results.

My endpoint has been working fine for the last month, but then, without making any changes, it started sending similar logs from time to time:
{"requestId": "563f49bc-fe15-466f-a91c-99911ab7b100-e2", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/idxmpy4kkpl9d1/job-done/qombpdb0mbxpew/563f49bc-fe15-466f-a91c-99911ab7b100-e2?gpu=NVIDIA+GeForce+RTX+4090&isStream=false", "level": "ERROR"}
{"requestId": "563f49bc-fe15-466f-a91c-99911ab7b100-e2", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/idxmpy4kkpl9d1/job-done/qombpdb0mbxpew/563f49bc-fe15-466f-a91c-99911ab7b100-e2?gpu=NVIDIA+GeForce+RTX+4090&isStream=false", "level": "ERROR"}
Judging by the same logs, the request itself was executed successfully, but Runpod just can't deliver the response. This happens on all my endpoints: idxmpy4kkpl9d1, e2u67i0khvang0, vmoqasbdvt7wl6, if2vaadpx2bo1u....

Serverless GPU is unstable

Hi team, We are currently using serverless to host our inference model, but we've observed that GPU performance is highly unstable — the same task can take anywhere from 3ms to 100ms. In contrast, performance is very stable on a reserved pod, consistently ranging from 3ms to 5ms. We’re wondering if RunPod’s serverless is sharing a single GPU across multiple users' jobs. If that’s the case, please let us know so we can make an informed decision about whether to continue using serverless or switch to a reserved pod. Thank you!...

CUDA error: CUDA-capable device(s) is/are busy or unavailable

I see quite a few jobs fail with this error message: RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. This usually happens for all jobs on a worker (I have to terminate the worker). A retry on another worker completes as expected....
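
If terminating the worker by hand is the only recovery, one option is to let the handler trigger it: a Runpod Python handler can include a refresh_worker flag in its return value so the platform stops and restarts that worker after the failing job, and retried jobs land on a different worker. A minimal sketch, assuming the failure surfaces as a RuntimeError mentioning CUDA (run_inference below is a stand-in for the real model call):

```python
import runpod
import torch

def run_inference(payload):
    # Stand-in for the real model call; replace with your own pipeline.
    return torch.zeros(1, device="cuda").sum().item()

def handler(job):
    try:
        return {"output": run_inference(job["input"])}
    except RuntimeError as err:
        if "CUDA" in str(err):
            # Ask the platform to restart this worker once the result is
            # returned, instead of letting every later job fail the same way.
            return {"error": str(err), "refresh_worker": True}
        raise

runpod.serverless.start({"handler": handler})
```

This doesn't explain why the device ends up busy in the first place, but it avoids having to notice and terminate the bad worker manually.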

Unacceptable downtime

I'm having constant outages for GPUs and racking up crazy numbers in the queue. These requests will all fail. Why is there such a shortage right now? It's making me consider migrating. Thanks...

The DelayTime is too long

CPU serverless pod, DelayTime is 8m 19s. Why?

How to increase the idle time >3600s?

I want to run my serverless Docker image. It starts, but after going idle it runs the initialization again. My install process needs to download weights, and after the worker idles the download starts over again and again.
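
Rather than raising the idle timeout, a common workaround is to cache the weights on an attached network volume, which survives workers going idle or being replaced, and only download when the files are missing. A minimal sketch, assuming the endpoint has a network volume (mounted at /runpod-volume on serverless) and using a hypothetical weights URL:

```python
import os
import subprocess

import runpod

WEIGHTS_DIR = "/runpod-volume/weights"                 # network volume mount
WEIGHTS_URL = "https://example.com/model.safetensors"  # hypothetical URL
WEIGHTS_PATH = os.path.join(WEIGHTS_DIR, "model.safetensors")

def ensure_weights():
    """Download the weights only if they are not already on the volume,
    so later cold starts reuse the earlier download."""
    if not os.path.exists(WEIGHTS_PATH):
        os.makedirs(WEIGHTS_DIR, exist_ok=True)
        subprocess.run(["wget", "-q", "-O", WEIGHTS_PATH, WEIGHTS_URL], check=True)
    return WEIGHTS_PATH

MODEL_PATH = ensure_weights()   # runs once per cold start, cheap once cached

def handler(job):
    # Load and serve the model from MODEL_PATH here.
    return {"model_path": MODEL_PATH, "echo": job["input"]}

runpod.serverless.start({"handler": handler})
```

The first cold start still pays for the download once, but every worker that comes up afterwards reuses the files already sitting on the volume.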

Can't set environment variables

I'm setting secrets on a serverless endpoint in the settings, but they just disappear when I check them. Those secrets are shared with another serverless endpoint from long ago....

How to manage frequent redownloads of large Docker images?

Hello, I am trying to use a serverless endpoint with a large Docker image that is re-downloaded every time I make a request. The download can take 5+ minutes to finish, so none of my requests are getting through. I understand that serverless pods are not guaranteed to get a cache hit, but surely there is something I can do other than making my image smaller? Otherwise my service is unusable...

MD5 mismatch error when running aws s3 cp

I am new to serverless storage and I am trying to use the S3-compatible network storage. I have created my volume in one of the S3-compatible regions. When trying to upload an image with the command aws s3 cp --region EUR-IS-1 --endpoint-url https://s3api-eur-is-1.runpod.io/ ./flight-2-1.JPG s3://<my-bucket-id>/flight-2-1.JPG I get the following error: An error occurred (BadDigest) when calling the PutObject operation: MD5 mismatch. Can anyone tell me what I am doing wrong?...
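
One thing worth ruling out: recent AWS CLI and SDK releases attach CRC-based integrity checksums by default, and some S3-compatible services reject them with a digest/MD5 mismatch; both tools can be told to send checksums only when an operation requires them. Whether that is the cause against Runpod's S3 API is an assumption, but it is cheap to test. A boto3 sketch of the same upload with that setting (the bucket ID is a placeholder and credentials are taken from the environment):

```python
import boto3
from botocore.config import Config

# Only compute/validate checksums when the operation requires them, instead
# of always attaching the newer CRC checksums by default.
cfg = Config(
    request_checksum_calculation="when_required",
    response_checksum_validation="when_required",
)

s3 = boto3.client(
    "s3",
    region_name="EUR-IS-1",
    endpoint_url="https://s3api-eur-is-1.runpod.io/",
    config=cfg,
)

s3.upload_file("./flight-2-1.JPG", "<my-bucket-id>", "flight-2-1.JPG")
```

The equivalent for the AWS CLI is setting request_checksum_calculation and response_checksum_validation to when_required in ~/.aws/config before re-running the aws s3 cp command.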

ComfyUI server (127.0.0.1:8188) not reachable after multiple retries

My serverless endpoint worker that ran fine for weeks suddenly fails every request: { "delayTime": 7929,...