Serverless workers frequently switch to Initializing / throttled
I've been playing with Serverless workers for a couple of weeks now and love them.
But since yesterday I've noticed a huge uptick in workers not staying in their idle state. Every couple of jobs they either go back to Initializing or get throttled.
This was never the case in the weeks I've been using serverless. Very rarely they would lose their state, but currently it's happening between every couple of jobs.
Is there any datacenter maintenance going on, or are the H100s just super occupied currently?
My configuration:
- CUDA 12.8
- All locations (no network volume attached)
- 80GB Pro (H100)

4 Replies
Is your endpoint currently set to use only the FR data center? You might want to avoid relying on a single data center; spreading across multiple DCs can help in this case.
I have all data centers selected. I'm not picky about location, only about the CUDA version.
For weeks I had no issues up until yesterday.

It looks like the H100 devices are just heavily occupied right now. It could help to also permit CUDA 12.9; depending on your workflow, it should "just work".
Considering CUDA's backward compatibility, it should work.
I will try it out, thanks!
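
For anyone wondering why permitting CUDA 12.9 should be safe for a 12.8 workload, here is a minimal sketch of the compatibility reasoning. It assumes the usual rule that an image built against an older CUDA version runs on a host whose driver supports the same or a newer version. The function names and the version list are hypothetical and are not part of any RunPod API; real workloads can still break for other reasons (for example, framework builds pinned to a specific runtime).

```python
def parse_version(version: str) -> tuple[int, int]:
    """Turn a string like '12.8' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def can_run(built_for: str, host_supports: str) -> bool:
    """Simplified rule (assumption): a host is usable if the CUDA version
    its driver supports is at least the version the image was built for."""
    return parse_version(host_supports) >= parse_version(built_for)


# Hypothetical values: the image targets 12.8, the endpoint allows 12.8 and 12.9.
image_cuda = "12.8"
allowed_versions = ["12.8", "12.9"]

# Every allowed version that is >= the image's version widens the worker pool.
usable = [v for v in allowed_versions if can_run(image_cuda, v)]
print(usable)
```

Under this rule, adding 12.9 to the allowed versions only increases the number of hosts that can pick up jobs; it does not require rebuilding the image.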