Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2...
Hi, I have a serverless endpoint. The job completes successfully, but the results are never returned and the job times out.
Any ideas how to resolve this? There are a few threads about this, but the conversation always drifted to another topic. I also submitted a ticket yesterday and have had no response. I am using this in production and my whole website is down because of this. I'm seriously concerned about using RunPod, as this is probably the fourth time everything has stopped working for one reason or another.
Total progress: 100%|██████████| 38/38 [00:09<00:00, 4.15it/s]
2024-07-12T17:16:46.296682863Z INFO: 127.0.0.1:60656 - "POST /sdapi/v1/img2img HTTP/1.1" 200 OK
2024-07-12T17:16:48.343583857Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/2lpmcp0nczozlm/job-done/nbvo20j26ji2mh/7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1?gpu=NVIDIA+RTX+A5000&isStream=false", "level": "ERROR"}
2024-07-12T17:16:48.343613177Z {"requestId": "7c46852a-a3b1-46a8-a0dc-2fe8c006287a-e1", "message": "Finished.", "level": "INFO"}
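For anyone trying to work out how often this happens, here is a minimal sketch that scans worker log lines in the format shown above and collects the request IDs whose results could not be posted back. The function name is made up for illustration, and it assumes every log line is a timestamp followed by either plain text or a JSON object, as in the excerpt.

```python
import json


def failed_return_request_ids(log_lines):
    """Collect requestIds from 'Failed to return job results' error lines.

    Hypothetical helper: assumes each line is '<timestamp> <payload>' and
    that the interesting payloads are JSON objects like the ones above.
    """
    request_ids = []
    for line in log_lines:
        # Split off the leading timestamp; keep the rest as the payload.
        _, _, payload = line.partition(" ")
        if not payload.startswith("{"):
            continue  # plain-text lines (progress bars, access logs)
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated or malformed JSON, skip it
        message = record.get("message", "")
        if record.get("level") == "ERROR" and message.startswith(
            "Failed to return job results"
        ):
            request_ids.append(record.get("requestId"))
    return request_ids
```

Feeding it the lines from a worker's log download gives a list of affected jobs that can be attached to a support ticket.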
21 Replies
No, it is a run endpoint, and I check the job status afterwards.
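Since the setup here is "submit with run, then check", a client-side deadline at least keeps the website from waiting forever if a job never leaves IN_PROGRESS. This is only a sketch: `wait_for_job` and `CLIENT_TIMEOUT` are made-up names, `get_status` is whatever function you already use to fetch the job status string, and the set of terminal statuses is an assumption that should be checked against the RunPod docs.

```python
import time

# Assumption: these are the statuses after which a job will not change again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(get_status, deadline_s=120.0, interval_s=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status or a client-side deadline.

    get_status is a caller-supplied function returning the status string.
    Returns the terminal status, or "CLIENT_TIMEOUT" (a label invented
    here, not a RunPod status) once deadline_s has elapsed.
    """
    start = clock()
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() - start >= deadline_s:
            # The job may have finished on the worker but never been marked
            # done; give up here so the caller can retry or raise an alert.
            return "CLIENT_TIMEOUT"
        sleep(interval_s)
```

On a CLIENT_TIMEOUT the caller could resubmit the job or show an error to the user. It does not fix the underlying problem, it only stops the front end from hanging.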
Five images at 512x512, 768x768 or 1024x1024. Is this too much?
It used to work just fine. It started a few days ago, when other people started reporting it too (https://discord.com/channels/912829806415085598/1258094433816019114, https://discord.com/channels/912829806415085598/1185337101307367535/threads/1257349366973202453).
What do you have set for Execution Timeout(s)?
120 seconds, but the job usually completes after 10. This message appears on completion (after 10 seconds).
I'm experiencing the same issue.
When the "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/{endpoint-id}/job-done/..." error occurs, the job remains stuck in IN_PROGRESS indefinitely, even though the log indicates that the job is completed. Plus, no webhook is sent from RunPod. It appears that RunPod fails to mark the job as completed due to this internal HTTP request failure.
Some people have suggested that this issue might be caused by a large payload returned from the handler. However, in my case, the output size is only a few KB, as it is just a JSON containing a URL to the output file.
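To rule the payload theory in or out for a given handler, the serialized size of the output can be measured just before it is returned. A rough sketch follows; the helper names are invented here and `limit_bytes` is a placeholder you choose yourself, not a documented RunPod limit.

```python
import json


def output_size_bytes(output):
    """Size of the handler output once serialized to JSON (UTF-8 bytes)."""
    return len(json.dumps(output).encode("utf-8"))


def check_output(output, limit_bytes):
    """Return (size, ok). limit_bytes is a caller-chosen threshold."""
    size = output_size_bytes(output)
    return size, size <= limit_bytes


# A URL-only result is tiny; inline base64 images are not.
small = {"url": "https://example.com/out.png"}
large = {"image": "A" * 2_000_000}  # stand-in for a base64-encoded image
print(check_output(small, limit_bytes=1_000_000))
print(check_output(large, limit_bytes=1_000_000))
```

If the error also shows up with outputs of a few KB, as reported above, payload size is unlikely to be the cause.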
Same here
Seems to be fixed now.
I did however clone the endpoint into a new one, just in case.
I sure as hell hope so! Yeah, I think something crashed in their backend.
The problem still exists, it just occurred again.
Yes, it only happens sometimes, not consistently. It seems like the internal webhook connection on RunPod isn't stable.
I hope this issue gets fixed ASAP because it causes production jobs to get stuck indefinitely. Even worse, the stuck jobs might continue to drain credits.
What, again?? Unacceptable! This has me doubting RunPod.
The issue itself is really minor and stupid technically speaking -_-
Having the same issue here too. It only started happening in the past couple of days, but I haven't modified the handler at all in weeks.
2024-07-20T23:41:26.892920681Z {"requestId": "f2d7a4d6-8bde-40d2-8ac9-d86e4134c165-u1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/...", "level": "ERROR"}
Using the async endpoint.
Edit: scaling workers to 0 and then back up seems to have fixed it for now.
Same issue:
2024-08-13T08:00:41.098287548Z ERROR | Error while getting job: Connection timeout to host https://api.runpod.ai/v2/...../job-take/.....?gpu=NVIDIA+RTX+A5000&job_in_progress=0