Runpod · 12mo ago
bossman

job timed out after 1 retries

Been seeing this a ton on my endpoint today, and as a result it can't return images.
response_text: {"delayTime":33917,"error":"job timed out after 1 retries","executionTime":31381,"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1","retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"}
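For anyone triaging the same failure, here is a minimal sketch that parses the response body quoted above and pulls out the fields that matter for a report. The helper name `summarize_job` is hypothetical, and treating `delayTime` and `executionTime` as milliseconds is an assumption, not something stated in the thread.

```python
import json

# Response body copied from the post above.
RESPONSE_TEXT = (
    '{"delayTime":33917,"error":"job timed out after 1 retries",'
    '"executionTime":31381,'
    '"id":"sync-80dbbd6d-309c-491f-a5d0-2bd79df9c386-e1",'
    '"retries":1,"status":"FAILED","workerId":"a42ftdfxrn1zhx"}'
)


def summarize_job(response_text: str) -> dict:
    """Parse a job status payload and keep the fields useful for triage."""
    job = json.loads(response_text)
    return {
        "id": job.get("id"),
        "status": job.get("status"),
        "failed": job.get("status") == "FAILED",
        "error": job.get("error"),
        "worker_id": job.get("workerId"),
        "retries": job.get("retries", 0),
        # Assumption: both timings are reported in milliseconds.
        "delay_s": job.get("delayTime", 0) / 1000,
        "execution_s": job.get("executionTime", 0) / 1000,
    }


summary = summarize_job(RESPONSE_TEXT)
print(summary["status"], summary["worker_id"], summary["error"])
```

Logging the worker ID next to the error makes it easier to see whether failures cluster on one worker.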
21 Replies
bossman (OP) · 12mo ago
endpoint id 1m5phinhax6q0p
rougsig · 12mo ago
Me too endpoint id uucgkak7h76hfd
rougsig · 12mo ago
@bossman Do you have something like this?
[image attachment not preserved]
bossman (OP) · 12mo ago
yep
Poddy · 12mo ago
@bossman
Escalated To Zendesk
The thread has been escalated to Zendesk!
bossman (OP) · 12mo ago
1.7.1
yhlong00000 · 12mo ago
Try to update your SDK to 1.7.4
bossman (OP) · 12mo ago
Just updated. Still seeing failures and things getting stuck in progress with no action:
{ "delayTime": 33710, "id": "88b7d266-6d28-47f2-8640-d67d33c57ed6-u1", "retries": 1, "status": "IN_PROGRESS", "workerId": "caskw2lx8e1xu1" }
{ "delayTime": 39768, "error": "job timed out after 1 retries", "executionTime": 40212, "id": "108cda05-865d-40df-b19c-3ece785c8ca0-u1", "retries": 1, "status": "FAILED", "workerId": "caskw2lx8e1xu1" }
flash-singh · 11mo ago
pm me details, there is something clearly weird going on with your endpoint
bossman (OP) · 11mo ago
FYI, every single time it fails to process an incoming request, I see this in the log:
bossman (OP) · 11mo ago
And a job status that just shows "IN_PROGRESS" indefinitely. If there's any way to expedite a response on this, please let me know. Mission-critical stuff is happening next week, and I'm hoping to get this resolved before then.
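A client-side deadline is one way to avoid waiting forever on a job that stays "IN_PROGRESS" like the one described above. The sketch below is only an illustration. `fetch_status` is a hypothetical callable you would supply yourself (for example, one that requests the job's status and returns the parsed JSON), and the "IN_QUEUE" state is an assumption, since only "IN_PROGRESS" and "FAILED" appear in this thread.

```python
import time
from typing import Callable

# States treated as "still running". "IN_QUEUE" is an assumption.
PENDING = {"IN_QUEUE", "IN_PROGRESS"}


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the job leaves a pending state.

    Raises TimeoutError if the job is still pending once timeout_s has elapsed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job.get("status") not in PENDING:
            return job  # terminal state, e.g. "FAILED"
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job.get('id')} still {job.get('status')} after {timeout_s}s"
            )
        sleep(interval_s)
```

With this in place the caller can resubmit or alert when the deadline passes, instead of leaving the request hanging.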
bossman (OP) · 11mo ago
Hey, PM'd details.
addsn · 11mo ago
@bossman any luck solving the above? I am also getting the same "job timed out after 1 retries" error on my serverless endpoint.
bossman (OP) · 11mo ago
I wish. Continued errors for me
addsn · 11mo ago
@bossman saw another similar thread. I downgraded my runpod library to 1.7.2, and the issue was solved after that.
bossman (OP) · 11mo ago
Tried 1.7.1 through 1.7.5 with the same results. But over the weekend something must have changed: 100% success rate today. I also switched my worker delay from queue to # of workers. Not sure if that had an effect, but testing with the old setup I'm still seeing 100% success on 12/1 and 12/2, with the errors ending on 11/30.
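Since several SDK versions were tried in this thread, it can help to log which version a worker image actually has installed before comparing results. A minimal sketch using only the standard library follows. The helper name `installed_version` is hypothetical, and it assumes the SDK is installed under the distribution name `runpod`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "runpod") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print this at worker startup so every log shows the SDK version in use.
print("runpod SDK version:", installed_version())
```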
flash-singh · 11mo ago
We had no updates over the holidays. We will be issuing fixes this week, as we have tracked down the issue and are debugging its root cause.