Runpod2mo ago
Jaya

Workers are getting throttled

Hey guys, workers are getting throttled. I have a 50-worker limit and most of them are getting throttled. My application is being impacted heavily. Note that it's mostly happening for US-based workers. I have no preferences around GPU or CUDA, so it's starting workers randomly across the globe.
35 Replies
J.
J.2mo ago
Hey @Jaya do you have an endpoint id you can share?
Jaya
JayaOP2mo ago
J.
J.2mo ago
@Jaya what I see is that your GPU is only set to 4090, or the 24gb pro. If you are okay with doing so and your application can handle it, I'd recommend also allowing the 24gb in your endpoint's selection. 4090s are quite popular and can be eaten up. You can also request a larger worker max if you feel that would help increase your workers across your endpoints 🙂
Jaya
JayaOP2mo ago
Ok, can you help me with a larger worker max? I will also add 24gb as secondary, but honestly speaking it just started in the last couple of days. It wasn't that bad so far.
J.
J.2mo ago
@Jaya I increased your max workers to 75 to give you a bit more wiggle room on any critical endpoints, so feel free to bump that up. If you need even more workers, you can click your workers amount in the UI, and it will bring you to a hubspot form to request more.
Jaya
JayaOP2mo ago
Thanks Justin
Xeverian
Xeverian2mo ago
Can confirm that it hasn't looked good for the last several days (was fine before). I also use 4090s
PotapovS
PotapovS2mo ago
Hey 🖐 I'm experiencing the same issue. Almost all workers on 4090 are showing as throttled with Low Supply. The 5090s are completely unavailable.
J.
J.2mo ago
Just to confirm, this is without region restrictions? Is this also no region restrictions?
PotapovS
PotapovS2mo ago
Yes. Region doesn't matter.
J.
J.2mo ago
Yeah, 4090s and 5090s might just be in heavy use right now since they're popular GPUs. I've raised it to the team to see if there are any further concerns, but this can happen, especially if everyone is concentrating on these two GPU types.
PotapovS
PotapovS2mo ago
Thanks for the answer. Please help, because the GPUs don't work at all. Constantly throttled.
J.
J.2mo ago
Can you share the endpoint with me? Is your entire endpoint throttled? It should still have left some to use.
PotapovS
PotapovS2mo ago
The entire endpoint is throttled. I think the id is not important. You can create an endpoint yourself now and check that 4090 is almost unavailable. 5090 is completely unavailable. I see that workers are initialized, but then immediately become throttled.
J.
J.2mo ago
Will test and raise to the team, thanks for reporting. Reported to the team. @PotapovS / @Xeverian / @Jaya, this is being tracked and worked on, FYI. Thanks for helping to report the issue.
Jaya
JayaOP2mo ago
This does not seem to be improving @justin (New) [Staff Not Staff]. I also have a 48 GB A40, A6000, but none of them can take its place.
J.
J.2mo ago
Maybe you can set the other 48gb as higher priority for now in the menu. But yes, I’ve already raised this as a high priority; they identified the potential issue and are rolling some stuff back, and changes are currently underway. It should be better now today. FYI: the 4090 and 5090 issues were resolved last night, it seems. Thanks again to everyone for pointing out the issue.
PotapovS
PotapovS2mo ago
I confirm. The problems have been resolved. Thank you and the team for your help!
Jaya
JayaOP2mo ago
Now there is another issue. Requests are taking way too long to move from in queue to in progress. And this time I have it as H100. It seems the problematic ones are those running in North America, which are unable to reserve GPUs. It seems it's a CUDA error. I have no preferences around CUDA. This has been giving a lot of trouble recently.
J.
J.2mo ago
Sorry to hear that, feel free to create a support ticket: https://contact.runpod.io/hc/en-us/requests/new It will be more easily escalated to the right team / points of contact who can look deeper into it.
PotapovS
PotapovS4w ago
Hey 🖐 The problem with throttled workers is back. All regions, all CUDA versions. 5090 constantly falls off and gets throttled.
Jaya
JayaOP4w ago
I can confirm it's back with 4090 as well
Hidobo
Hidobo3w ago
5090 completely unusable
PotapovS
PotapovS2w ago
Hey 🖐 The problem with throttled workers is back again. All regions, all CUDA versions. 5090 constantly falls off and gets throttled.
J.
J.2w ago
Thank you, will be taking a look and raising to the team. Thanks, have confirmed: there is extremely high usage right now from someone eating up GPUs. Flagged to the team.
Jackie
Jackie2w ago
5090's are super scuffed rn
J.
J.2w ago
Yes; to give an update, we just have extremely high utilization right now from all customers maxing out the data centers. The team is planning to increase capacity in the upcoming 1-2 weeks, since it takes time to get physical hardware online.
WeamonZ
WeamonZ2w ago
Not a single one of my endpoints has a 5090 available 🫠🫠
Alex Bruskin
Alex Bruskin2w ago
I want to comment on the throttling issue. For us it started a few days ago, and it is always happening around 11am EST. It feels like someone starts running something big, pushing everyone else out. We tried data centers in Iceland and Romania, it is all the same.
Jaya
JayaOP6d ago
This is back again in the last few days. Most of the workers are getting killed for 4090. Also, sometimes it happens with H100 as well. This is highly disappointing. It seems you can't run a business on the assumption that you will get serverless GPUs on Runpod.
Dj
Dj6d ago
Throughout this week we've been running emergency maintenance and the users most affected are those running serverless workloads with popular/low cost GPUs. Where we may have a surplus of a specific GPU, we have to delist those machines to perform work on them. We are obligated to perform this maintenance across the fleet and only ask for your patience until it's done and we can disclose the reason.
Jaya
JayaOP6d ago
Your reason is absolutely justified, but you have to consider the fact that other businesses are relying on GPUs provided by you.
Xeverian
Xeverian5d ago
: (
Xeverian
Xeverian5d ago
😢
Xeverian
Xeverian5d ago
and it's 4090 in EU-RO-1, the best one for those cards