Serverless FAILING to add Workers
I have created a queue-based endpoint and I have 4 requests in the pipeline.
It's been 30-40 minutes and Serverless has failed to bring up any H100 worker for me.
I don't have any data-centers (regions) specified.

Why is this happening?
here are my endpoint settings:

Unknown User•2w ago
Message Not Public
Sign In & Join Server To View
nothing of sort
just plain GPU
also, I have sufficient balance ....

refreshed it dozens of times 🙁

No volume or specified region
network tab:

@flash-singh no workers. Is this a problem with our setup configuration, or is it a capacity problem on RunPod's side? Thanks.
@Immar K
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #24966
yeah, we're 3 hours past and nothing yet 🙁
I have been trying a bunch of stuff but nothing seems to have worked for me
also, what's even stranger is this warning: "Currently 100% of your max workers are busy. Consider increasing your max workers to 7 to handle higher demand and improve performance."
and for some reason it doesn't let me go beyond 5 max workers. If I enter 6 or 7, it resets the value to 5. Maybe some check on the frontend, not sure.
I have ~$100 in the account with an $80/hr limit
I am sorry what does that mean?
how's that possible
it shows a single worker from your ss

it shows 5/5 which is not true
I see, it says you cannot have more than 5 max workers with a balance under $100
regardless, it says 5/5 workers deployed, which I believe is not true since I don't see anything in the workers tab.
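For anyone who wants to cross-check the "5/5 workers" figure against what is actually running, here is a minimal sketch that summarizes an endpoint health response. It assumes the response has already been fetched and parsed into a dict, and that it contains `workers` and `jobs` objects with keys like `idle`, `running`, `initializing` and `inQueue`. Those key names and the function name `summarize_health` are assumptions for illustration, so check them against the actual response before relying on this.

```python
def summarize_health(health: dict) -> dict:
    """Summarize a parsed endpoint health response.

    The key names used here ("workers", "jobs", "idle", "running",
    "initializing", "inQueue") are assumptions about the response shape.
    Missing keys are treated as 0.
    """
    workers = health.get("workers", {})
    jobs = health.get("jobs", {})

    # Workers that could pick up a job right now.
    available = workers.get("idle", 0) + workers.get("running", 0)
    # Workers that are still coming up.
    pending = workers.get("initializing", 0)
    # Requests waiting for a worker.
    queued = jobs.get("inQueue", 0)

    return {
        "available": available,
        "pending": pending,
        "queued": queued,
        # Jobs are waiting but nothing is available to run them.
        "stalled": queued > 0 and available == 0,
    }


# Example with made-up numbers matching the situation in this thread:
# 4 queued requests and no workers at all.
print(summarize_health({"workers": {}, "jobs": {"inQueue": 4}}))
```

If `stalled` stays true for a long time while the dashboard claims all workers are deployed, that mismatch is worth including in the support ticket.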
sure, yes it's getting too cluttered here
Ticket created.
looking at another strange behaviour: an instance came up but ComfyUI failed to start (likely a hardware issue, since I am running the latest CUDA version):
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
torch.AcceleratorError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
return torch.cuda.cudart().cudaMemGetInfo(device)
File "/root/ComfyUI/venv/lib/python3.10/site-packages/torch/cuda/memory.py", line 838, in mem_get_info
mem_total_cuda = torch.cuda.mem_get_info(dev)
I'll wait for support to get back to me
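Since the traceback above fails inside `torch.cuda.mem_get_info`, one option is to run the same call as a preflight check before starting ComfyUI, so a worker with an unusable GPU fails fast with a clear message. This is a rough sketch, not a confirmed fix: the helper name `gpu_is_usable` is made up, and it assumes the CUDA failure surfaces as a `RuntimeError` (or an `AssertionError` when PyTorch has no CUDA support), which should be verified for the PyTorch version in use.

```python
from typing import Callable, Optional, Tuple


def _torch_mem_probe() -> Tuple[int, int]:
    # Imported lazily so the helper can be exercised without torch or a GPU.
    import torch

    # Returns (free_bytes, total_bytes) for the current CUDA device.
    return torch.cuda.mem_get_info()


def gpu_is_usable(
    probe: Callable[[], Tuple[int, int]] = _torch_mem_probe,
) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the probe succeeds, else (False, error text).

    Assumption: a busy or unavailable device raises RuntimeError (or a
    subclass), and a CPU-only torch build raises AssertionError.
    """
    try:
        _free, total = probe()
    except (RuntimeError, AssertionError) as exc:
        return False, str(exc)
    if total <= 0:
        return False, "device reported no memory"
    return True, None


if __name__ == "__main__":
    ok, reason = gpu_is_usable()
    if not ok:
        # Exit non-zero so the container stops instead of hanging on a bad GPU.
        raise SystemExit(f"GPU preflight failed: {reason}")
    print("GPU preflight passed")
```

Running this at the top of the container start script would not fix a bad host, but it would make the failure visible in the worker logs right away.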
No, i don't have any filters.
my template is not 12.9. I have run my application on older CUDA versions as well, on other platforms, etc.
Hi! I am getting the same issue with 80GB Pro GPUs. Even when creating new endpoints with 48GB GPUs, it's not loading the model or running any generations.
Same issue here, my new (and only) endpoint is in "Initializing" state for the last 3 hours, without any workers.
I recreated my endpoint and it seems to be working again now