How long is the delay on serverless?

I'm consistently getting at least 30 minutes of delay (mostly 1-2 hours or more) on requests to my serverless endpoint. Is this the default? This is totally unusable.
1 Reply
Xeverian (2mo ago)
There might be some hiccups now and then, but 1h is definitely an error. Mine is usually within 1-20s. If you also see a running worker while the job is delayed like this, it means the "allowed CUDA version" filter on your endpoint is set to the wrong value and the worker can't start properly. If your image uses CUDA 12.6, set the filter to 12.6-12.8; if it uses 12.9, set it to 12.9.
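As a rough sketch, the rule of thumb above can be written as a small lookup. The function name `allowed_cuda_range` and the table are hypothetical and only cover the two cases mentioned in this reply; any other CUDA version is deliberately rejected rather than guessed.

```python
# Hypothetical helper: encodes only the two cases given in this reply.
# Keys are the CUDA version of the image; values are the (lowest, highest)
# versions to select in the endpoint's "allowed CUDA version" filter.
SUGGESTED_FILTER = {
    "12.6": ("12.6", "12.8"),
    "12.9": ("12.9", "12.9"),
}


def allowed_cuda_range(image_cuda: str) -> tuple[str, str]:
    """Return the suggested (low, high) filter range for an image's CUDA version."""
    try:
        return SUGGESTED_FILTER[image_cuda]
    except KeyError:
        # The thread gives no advice for other versions, so do not guess.
        raise ValueError(f"no suggestion in this thread for CUDA {image_cuda}") from None


print(allowed_cuda_range("12.6"))  # ('12.6', '12.8')
```

If the function raises for your version, check which CUDA version your image was built against before changing the filter.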
