Runpod 7d ago
Milad

Throttling on multiple endpoints and failed workers

All of our endpoints with RTX 4090 workers are fully throttled, some with over 100 workers. There is no incident report or update here or on the status page. Workers consistently come up and get stuck loading the image, and to top it all off, they sit in the executing state and are billed to the account.
10 Replies
Evgeniy_Wis 7d ago
+
Ian Chen 7d ago
+1, I have been experiencing this a lot today
JC 6d ago
Same here. Seems like RunPod has supply and demand issues.
Dj 6d ago
Throughout this week we've been running emergency maintenance, and the users most affected are those running serverless workloads on popular GPUs. Even where we may have a surplus of a specific GPU, we have to delist the machines that host those GPUs (up to 8 GPUs per machine) to perform work on them. We are obligated to perform this maintenance across the fleet and can only ask for your patience until it's done and we can disclose the reason.
Kie 5d ago
Just started using serverless today... is this normal?
Dj 5d ago
No, you just caught us at a bad time - sorry.
c 5d ago
Based on my experience, Runpod does not appear to be production ready. Each time I’ve attempted to use it or deploy a workload, I’ve encountered issues, with no documented incident reports and unanswered emails. This calls into question the claim of SOC 2 Type II compliance. In the past, I also reported significant slowness (“delay time”) for which I was billed; the root cause was never identified and the issue remained unresolved. Sad...
c 5d ago
[Screenshot attached, no description]
Dj 5d ago
The email linked to the pods in this screenshot has no support tickets. Can you send me a message with your ticket ids or the email you used to contact us?
