Running Workers Not Shutting Down & Incurring Charges
Hi, we're facing a critical issue with workers not shutting down when there's nothing in the queue or in progress, which is causing significant over-billing and blocking our app launch. I'm reporting this after it happened at least 3 times.
I've observed that after all jobs are processed (finished/cancelled, nothing left in the queue), workers keep running for over 8 minutes doing nothing. I've seen it happen with both scaling settings (see the config sketch just after this list):
- Queue Delay: A worker ran for 8+ minutes with an empty queue (video attached below)
- Request Count: Two separate workers ran for 8+ minutes after the last job was processed (I sent these messages when it happened: https://discord.com/channels/912829806415085598/948767517332107274/1388527617510084651 https://discord.com/channels/912829806415085598/948767517332107274/1388531493768527932)
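(A minimal sketch of how these two scaling modes can be set when creating an endpoint programmatically, assuming the runpod Python SDK's create_endpoint helper. Parameter names such as scaler_type, scaler_value, idle_timeout and the GPU pool id are assumptions from memory and may differ from the current SDK; the dashboard settings are the source of truth.)

```python
# Hedged sketch only: configuring the two scaling modes when creating an
# endpoint with the runpod Python SDK. Parameter names (scaler_type,
# scaler_value, idle_timeout, the GPU pool id) are assumptions -- check the
# current SDK/API docs; the dashboard settings remain authoritative.
import runpod

runpod.api_key = "YOUR_API_KEY"

endpoint = runpod.create_endpoint(
    name="image-gen",
    template_id="YOUR_TEMPLATE_ID",
    gpu_ids="ADA_24",            # assumed pool id for 24 GB Ada cards (RTX 4090)
    workers_min=0,               # scale to zero when there is nothing to do
    workers_max=3,
    idle_timeout=5,              # seconds a worker may sit idle before shutdown
    scaler_type="QUEUE_DELAY",   # or "REQUEST_COUNT"
    scaler_value=4,              # queue-delay seconds (or requests per worker)
)
print(endpoint)
```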
This cost me another $10 in credits over just two days. The two examples above alone add up to over 24 minutes of billed idle time (1 worker running 8+ minutes on Queue Delay + 2 workers running 8+ minutes on Request Count = 3 RTX 4090 24 GB (PRO) workers at 8+ minutes each) during which nothing was being processed. I've spent $100 in just two months on testing alone, and issues like these are preventing me from launching our app: we can't rely on the platform's scaling to work properly, and we'll be launching in a server with over 10K members.
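(To make the billing math above concrete, here is the rough arithmetic as a sketch; the per-second rate is a placeholder assumed for illustration, not the actual RTX 4090 (PRO) serverless price.)

```python
# Rough estimate of the idle charge described above.
# ASSUMED_RATE_PER_SECOND is a placeholder -- substitute the real
# RTX 4090 24 GB (PRO) serverless rate from the pricing page.
ASSUMED_RATE_PER_SECOND = 0.00031  # USD per second, assumption only

workers = 3
idle_minutes_each = 8
total_idle_seconds = workers * idle_minutes_each * 60   # 1440 s = 24 worker-minutes
wasted = total_idle_seconds * ASSUMED_RATE_PER_SECOND
print(f"{workers * idle_minutes_each} worker-minutes idle ≈ ${wasted:.2f} at the assumed rate")
```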
Everything on our side has been ready for two months, and we want to launch as soon as serverless endpoints work reliably (please also see the other issue we have: https://discord.com/channels/912829806415085598/1375136211395547246).
I'd really appreciate it if you could help with this. I can share the logs via DM if needed. Thank you for your time!
24 Replies
Logs would have been helpful in this video. So much time spent on everything but the most important part. I was waiting for you to click on Logs in that worker detail view. What was going on there?
Thank you for checking, Dean! These are the logs from when the video was recorded:
The pod with id 64yksjmwg97wvk just failed to start; all I see is a reload loop :thinkMan:
Yeah lol. Isn't it an issue that it kept running when there wasn't anything in the queue?
Sort of, we're still allocating you a GPU even if your pod only exists for like 300ms (the average time I see just skimming the log here).
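(A side note for anyone debugging something similar: a quick way to confirm this kind of reload loop from a downloaded worker log is to measure the gaps between start events. This is only a sketch; it assumes each start attempt produces a line beginning with an ISO-8601 timestamp and containing a "start container"-style marker, which may not match the real log format.)

```python
# Hypothetical helper: detect a reload loop in a downloaded worker log by
# measuring gaps between start events. The timestamp position and the
# "start container" marker are assumptions about the log layout.
from datetime import datetime

def start_times(path, marker="start container"):
    times = []
    with open(path) as f:
        for line in f:
            if marker not in line.lower():
                continue
            try:
                stamp = line.split()[0].replace("Z", "+00:00")
                times.append(datetime.fromisoformat(stamp))
            except (IndexError, ValueError):
                continue  # line did not begin with a parseable timestamp
    return times

starts = start_times("worker.log")
gaps = [(b - a).total_seconds() for a, b in zip(starts, starts[1:])]
if gaps:
    avg = sum(gaps) / len(gaps)
    print(f"{len(starts)} start attempts, average gap {avg:.2f}s"
          + (" -> looks like a reload loop" if avg < 2 else ""))
```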
The content of the error:
You can set a minimum CUDA version in the endpoint settings
And I can credit back however much you lost in GPU time - I'd just need a little longer to figure out the amount
Sending a new log file because what I sent was wrong...
Looks like the RunPod website and the log file I download from it show timestamps in different timezones, which made me send you logs from the wrong hours
Here are the logs from when I recorded this
I included an extra line before and after the incident just so you have all the context
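(For anyone hitting the same timestamp confusion: normalizing everything to UTC before comparing the dashboard with the downloaded file helps. A minimal stdlib sketch, assuming the file's timestamps are ISO-8601 and that naive ones are in the local system timezone.)

```python
# Normalize log timestamps to UTC so the dashboard and the downloaded file
# line up. Assumes ISO-8601 stamps; naive ones are treated as local time.
from datetime import datetime, timezone

def to_utc(stamp: str) -> datetime:
    dt = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # assumption: naive timestamps use the local system timezone
        dt = dt.replace(tzinfo=datetime.now().astimezone().tzinfo)
    return dt.astimezone(timezone.utc)

print(to_utc("2025-06-28T17:03:12"))        # naive -> assumed local -> UTC
print(to_utc("2025-06-28T14:03:12+00:00"))  # already offset-aware
```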
Thank you guys for checking 🙏
The container you show in the videos is 64yksjmwg97wvk, which is different from the 6w1168q7pwltxm pod whose logs you saw in the video; the 64yks pod never started 👀
Right? 😅 It would make a bit more sense if 6w116 were the worker that had the issue, because it was the one used for running the requests. A bit strange that the 64yks worker tried to start for 8 minutes
If you change your endpoint settings to only allow your Pod to be started on CUDA 12.6 or higher, you won't have this issue again.
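(The CUDA restriction itself lives in the endpoint settings on the dashboard. As a complementary measure, not the setting described above, logging the CUDA build info at worker startup makes this kind of mismatch visible in the worker logs instead of a silent reload loop; a sketch assuming a PyTorch-based worker image.)

```python
# Complementary diagnostic (not the endpoint setting itself): log the CUDA
# build and device info when the worker starts, assuming a PyTorch image,
# so driver/CUDA mismatches show up clearly in the worker logs.
import torch

def log_cuda_info():
    print(f"torch {torch.__version__}, built for CUDA {torch.version.cuda}")
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(torch.cuda.current_device())
        print(f"GPU: {props.name}, {props.total_memory / 1e9:.1f} GB")
    else:
        print("CUDA not available -- possibly a driver/CUDA version mismatch on this host")

log_cuda_info()
```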
I added a little to your account to remedy it, but I don't think anything too unusual happened. I'll see what I can do about adding a limit on how many times we let a specific worker fail to start.
Thank you so much, DJ! I’ve set it to CUDA 12.7 now, will let you know if I notice it happening again
This was the second time we ran out of credits within two days in June, so I wanted to report it here. The previous time was also weird like this; we hadn't even generated 20 images, if I remember correctly.
I think it would be helpful to be able to see request history, just like how we can see a recent request (which disappears shortly after)

It would also be really helpful to see approximately how much each generation costs. Billing is per usage (running workers), but seeing roughly how much each generation costs would be great
Just some data that could help users and the RunPod team notice odd usage if a similar issue happens again
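(In the meantime, a rough per-generation figure can be estimated client-side from the job status. A sketch only: the executionTime field and the per-second rate below are assumptions, so verify them against the actual API response and the pricing page.)

```python
# Rough per-generation cost estimate from a serverless job's status.
# The executionTime field (milliseconds) and the per-second rate are
# assumptions -- verify against the real API response and pricing page.
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
JOB_ID = "YOUR_JOB_ID"
ASSUMED_RATE_PER_SECOND = 0.00031  # USD per second, placeholder

resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
status = resp.json()
exec_seconds = status.get("executionTime", 0) / 1000  # ms -> s
print(f"~{exec_seconds:.1f}s GPU time ≈ ${exec_seconds * ASSUMED_RATE_PER_SECOND:.4f} per generation")
```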
The final piece of feedback I'd like to share related to this thread is to maybe enable CUDA 12.6 or higher as the default endpoint setting. Thanks again for your help, DJ!
wanted to check here to see if others had also experienced it before creating a ticket. thanks to the help I got here, the issue has been fixed after selecting CUDA 12.6 or higher 🙌
marking it as solved
yeah, it can be a bit confusing when users encounter CUDA-related issues for the first time (as happened to us). maybe in the future, serverless endpoints could automatically detect/sync the CUDA version from the image they're using
thank you! I shared it with someone at RunPod and heard he created an internal ticket about it :poddy:
yes 😅 I don't know if I'm allowed to say his name, but he's been super helpful