Failing requests
Hey, all of my serverless endpoint requests are failing. I’ve tried every available GPU config (24GB, 24GB PRO, 32GB PRO, 80GB PRO, 141GB) and they all show “High Supply”, but nothing is processing; the entire service is effectively down.
This has been going on for a while now and there’s zero communication. If there’s an outage or scaling issue, please just say it, so we can stop waiting and plan accordingly.
Can someone from the team confirm what’s happening and whether there’s an ETA on a fix?
edit: endpoint ID = uv7fieonipxw1q

logs

it doesn't look like a runpod problem
try with 32gb pro unchecked
I'll try removing 32GB PRO! But I’m almost certain this isn’t an issue on my end, because it’s the same container that I’ve been running for a month or two with no major issues until this week.
The logs show a CUDA kernel mismatch (sm_120), which (according to GPT-5) points to RunPod rotating their serverless pool to newer GPUs that the current PyTorch build doesn’t support yet.
looks like an infrastructure-side change, not a container config problem.
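For reference, a quick way to confirm whether the PyTorch build baked into the image even ships sm_120 kernels (rough sketch, assumes a standard pip-installed torch; needs a visible GPU, otherwise the arch list comes back empty):

```python
# Run inside the worker container: print the CUDA architectures this
# torch build was compiled for. If "sm_120" is missing, Blackwell GPUs
# (RTX 5090, B200) can't run its precompiled kernels.
import torch

print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
print("compiled arch list:", torch.cuda.get_arch_list())
```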
uhh
it's complicated to explain
as far as I know, the only GPU with 32GB VRAM is the RTX 5090
which uses NVIDIA's Blackwell architecture
and it has a CUDA compute capability of 12.0


this log is saying CUDA compute capability sm_120 (so Blackwell) is not supported in this installation
so I told you to disable the 32GB PRO option to avoid Blackwell GPUs
since 141GB = Hopper, 80GB PRO is probably Ampere, and 24GB / 24GB PRO are probably Ada Lovelace and under
so nothing there is Blackwell
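if you want the worker to fail loudly instead of erroring deep inside the first inference call, you could add something like this at startup (just a sketch; check_gpu_supported is my own name, not a RunPod API):

```python
# Sketch of a startup guard: compare the compute capability of the GPU
# this worker was assigned against the architectures compiled into the
# installed torch build, and log a clear message on mismatch.
# Note: builds that ship PTX (compute_XX entries) can sometimes still
# JIT for newer GPUs, so treat a miss as a warning, not proof.
import torch

def check_gpu_supported() -> bool:
    if not torch.cuda.is_available():
        print("no CUDA device visible to this worker")
        return False
    major, minor = torch.cuda.get_device_capability(0)
    sm = f"sm_{major}{minor}"                # e.g. sm_120 on Blackwell
    compiled = torch.cuda.get_arch_list()    # archs baked into this build
    if sm not in compiled:
        print(f"{torch.cuda.get_device_name(0)} ({sm}) not in {compiled}")
        return False
    return True

if __name__ == "__main__":
    check_gpu_supported()
```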
yeah, rolling this out now. But as you said, the RTX 5090s use the new Blackwell architecture with a new CUDA compute capability (sm_120) that older PyTorch builds don't fully support yet, which explains why the same container that was previously working is now running into issues. Isn't it their responsibility to maintain backward compatibility, or at the very least communicate GPU rotations that break existing builds?
nono
they maintain the hardware
you are supposed to maintain the software that runs on top of that
did RunPod automatically enable 32GB PRO on an existing serverless endpoint?
if that's the case, this is their fault
yes, true, I'm responsible for my container environment and I'm fine updating the image if needed, but if a hardware rotation breaks existing compatibility there should at the very least be an announcement. That said, I don't think this is the only issue, because I only added 32GB two days ago, but even before that, starting Monday this week, 60% of my requests were taking over 3 minutes instead of the usual ~20 seconds. So something deeper seems off on the serverless side. Not to mention initializing being broken, e.g. I have 2 workers that have been stuck initializing since yesterday.
it would be nice to have a warning message beside the 32GB and 180GB options
since it doesn't list the specific GPU types there
but this is clearly user error in my opinion. You should have checked what GPUs they were using for that option.
ok but I added it 2 days ago
my problem started 5 days ago
serverless being weird recently is another problem tho
it did that 5 days ago?
isn't it a different error
are you sure it's the same error?
let me check the exact logs, but it was timing out / taking 3+ minutes
oh also uncheck the 180GB option if you have it on
it uses the B200 GPU
so Blackwell
you're right, the 32GB was definitely the one causing the actual errors. Thanks for this. The other ones were succeeding, but they were taking 3 minutes (normal time 7s-30s), which is beyond my timeout. That's when I added the 32GB (for context).
thoughts on this? This was my initial issue. The container just ran for 8 minutes like this (normal response 7s-30s). Same image as when I had no issues. I just pulled it locally and tested it, no issues.
Something is wrong with the pod
also workers constantly switch between throttled/initializing for 12+ hours

That's strange