Failing requests
Hey, all of my serverless endpoint requests are failing. I’ve tried every available GPU config (24GB, 24GB PRO, 32GB PRO, 80GB PRO, 141GB) and they all show “High Supply”, but nothing is processing; the entire service is effectively down.
This has been going on for a while now and there’s zero communication. If there’s an outage or scaling issue, please just say it, so we can stop waiting and plan accordingly.
Can someone from the team confirm what’s happening and whether there’s an ETA on a fix?
edit: endpoint ID = uv7fieonipxw1q

logs

it doesn't look like a runpod problem
try with 32gb pro unchecked
I'll try removing 32GB PRO! But I’m almost certain this isn’t an issue on my end, because it’s the same container that I’ve been running for a month or two with no major issues until this week.
The logs show a CUDA kernel mismatch (sm_120), which (according to GPT-5) points to RunPod rotating their serverless pool to newer GPUs that the current PyTorch build doesn’t support yet.
looks like an infrastructure-side change, not a container config problem.
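For reference, a quick way to confirm whether the PyTorch build baked into the image even ships sm_120 kernels (rough sketch, assumes a standard pip-installed torch; needs a visible GPU, otherwise the arch list comes back empty):

```python
# Run inside the worker container: print the CUDA architectures this
# torch build was compiled for. If "sm_120" is missing, Blackwell GPUs
# (RTX 5090, B200) can't run its precompiled kernels.
import torch

print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
print("compiled arch list:", torch.cuda.get_arch_list())
```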
uhh
it's complicated to explain
as far as I know, the only GPU with 32GB VRAM is the RTX 5090
which uses NVIDIA's Blackwell architecture
and it has a CUDA compute capability of 12.0


this log is saying CUDA compute capability sm_120 (so Blackwell) is not supported in this installation
so I told you to disable the 32GB PRO option to avoid Blackwell GPUs
since 141GB = Hopper, 80GB PRO is probably Ampere, and 24GB / 24GB PRO are probably Ada Lovelace and under
so nothing there is Blackwell
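if you want the worker to fail loudly instead of erroring deep inside the first inference call, you could add something like this at startup (just a sketch; check_gpu_supported is my own name, not a RunPod API):

```python
# Sketch of a startup guard: compare the compute capability of the GPU
# this worker was assigned against the architectures compiled into the
# installed torch build, and log a clear message on mismatch.
# Note: builds that ship PTX (compute_XX entries) can sometimes still
# JIT for newer GPUs, so treat a miss as a warning, not proof.
import torch

def check_gpu_supported() -> bool:
    if not torch.cuda.is_available():
        print("no CUDA device visible to this worker")
        return False
    major, minor = torch.cuda.get_device_capability(0)
    sm = f"sm_{major}{minor}"                # e.g. sm_120 on Blackwell
    compiled = torch.cuda.get_arch_list()    # archs baked into this build
    if sm not in compiled:
        print(f"{torch.cuda.get_device_name(0)} ({sm}) not in {compiled}")
        return False
    return True

if __name__ == "__main__":
    check_gpu_supported()
```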
yeah, rolling this out now. But as you said, the RTX 5090s use the new Blackwell architecture with a new CUDA compute capability (sm_120) that older PyTorch builds don't fully support yet, which explains why the same container that was previously working is now running into issues. Isn't it their responsibility to maintain backward compatibility, or at the very least communicate GPU rotations that break existing builds?
nono
they maintain the hardware
you are supposed to maintain the software that runs on top of that
did RunPod automatically enable 32GB PRO on an existing serverless endpoint?
if that's the case, this is their fault
yes, true, I'm responsible for my container environment and I'm fine updating the image if needed, but if a hardware rotation breaks existing compatibility there should at the very least be an announcement. That said, I don't think this is the only issue, because I only added 32GB two days ago, but even before that, starting Monday this week, 60% of my requests were taking over 3 minutes instead of the usual ~20 seconds. So something deeper seems off on the serverless side. Not to mention initializing being broken, e.g. I have 2 workers that have been stuck initializing since yesterday.
it would be nice to have a warning message beside the 32GB and 180GB options
since it doesn't list the specific GPU types there
but this is clearly user error in my opinion. You should have checked what GPUs they were using for that option.
ok but I added it 2 days ago
my problem started 5 days ago
serverless being weird recently is another problem tho
it did that 5 days ago?
isn't it a different error
are you sure it's the same error?
let me check the exact logs, but it was timing out / taking 3+ minutes
oh also uncheck the 180GB option if you have it on
it uses the B200 GPU
so Blackwell
you're right, the 32GB was definitely the one causing the actual errors. Thanks for this. The other ones were succeeding, but they were taking 3 minutes (normal time 7s-30s), which is beyond my timeout. That's when I added the 32GB (for context).
thoughts on this? This was my initial issue. The container just ran for 8 minutes like this (normal response 7s-30s). Same image as when I had no issues. I just pulled it locally and tested it, no issues.
Something is wrong with the pod
also workers constantly switch between throttled/initializing for 12+ hours

That's strange