Serverless workers frequently switch to Initializing / throttled

I've been playing with Serverless workers for a couple of weeks now and love it. But since yesterday I've noticed a huge uptick in workers not sticking to their idle state. Every couple of jobs they either go back to Initializing or are throttled. This was never the case in the weeks I had been using serverless. Very rarely they would lose their state, but currently it's happening between every couple of jobs. Is there any datacenter maintenance going on, or are the H100s just super occupied currently?

My configuration:
- CUDA 12.8
- All locations (no network volume attached)
- 80GB Pro (H100)
4 Replies
yhlong00000, 2mo ago
Is your endpoint currently set to use only the FR data center? You might want to avoid relying on a single data center, spreading across multiple DCs can help in this case.
gokuvonlange (OP), 2mo ago
I have all data centers selected. I'm not picky about location, only about the CUDA version. For weeks I had no issues, up until yesterday.
Dj, 2mo ago
It looks like we're just occupied on the H100 devices. It could help to also permit CUDA 12.9. Depending on your workflow, it should "just work".
gokuvonlange (OP), 2mo ago
Considering CUDA's backward compatibility, it should work. I will try it out, thanks!
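
For anyone wondering why allowing CUDA 12.9 should be safe for an image built against 12.8, here is a minimal sketch of the rule being relied on: a host whose driver supports a newer CUDA version can generally run software built for an older one. It assumes the endpoint's CUDA filter reflects the highest CUDA version the host driver supports. The function names and the `allowed_hosts` list are hypothetical, made up for illustration, and not part of any RunPod API.

```python
def parse_cuda_version(version: str) -> tuple[int, int]:
    """Turn a string such as "12.8" into a comparable (major, minor) tuple."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def host_can_run(image_cuda: str, host_cuda: str) -> bool:
    """Hypothetical check: the host's supported CUDA version must be at
    least the version the container image was built against."""
    return parse_cuda_version(host_cuda) >= parse_cuda_version(image_cuda)


# Illustrative values only: the image targets 12.8, the endpoint allows both.
allowed_hosts = ["12.8", "12.9"]
compatible = [v for v in allowed_hosts if host_can_run("12.8", v)]
print(compatible)
```

Under this assumption, adding 12.9 only widens the pool of eligible hosts. It would not help the other way around: an image built for 12.9 should not be expected to run on a host limited to 12.8.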
