Intermittent connection timeouts to api.runpod.ai

{
"endpointId":"oic105cyzlovnk"
"workerId":"3cwou4m0x6hxl0"
"level":"error"
"message":"Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/oic105cyzlovnk/job-done/3cwou4m0x6hxl0/31062127-b452-4afa-a4ba-1d6f07134a4e-u1?gpu=NVIDIA+RTX+A5000&isStream=false"
"dt":"2024-06-13 19:16:55.63793120"
}
8 Replies
digigoblin 2mo ago
@1AndOnlyPika is also experiencing this. The endpoint ID can be copied from my JSON above. @1AndOnlyPika, can you add your endpoint ID please?
nerdylive 2mo ago
Yeah, I sometimes see this in the logs too, but sometimes when this happens it still returns the job result as usual
digigoblin 2mo ago
Yeah I think it retries when this happens, because I see I had 3 retried requests
nerdylive 2mo ago
Ooh
1AndOnlyPika 2mo ago
rtqb8oacytm879
In my worker code, I added a ping to a random website right before finishing the job. When it fails to return job results, that ping still goes through, so it's an issue with RunPod and not the worker itself.
I'm going to look into switching to a WebSocket for communicating with my worker and getting results.
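The check described here can be sketched roughly as follows. This is a minimal illustration, not the poster's actual worker code: `outbound_probe` is a hypothetical helper, the `opener` parameter exists only so the function can be exercised without network access, and icanhazip.com is used as the probe target because it is the site named elsewhere in this thread.

```python
import urllib.request


def outbound_probe(url="https://icanhazip.com", timeout=5.0,
                   opener=urllib.request.urlopen):
    """Hypothetical helper: check general outbound connectivity.

    Returns (True, response_text) if the request succeeds, or
    (False, error_description) if it fails. `opener` is injectable
    so the probe can be tested without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return True, body
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        return False, repr(exc)


if __name__ == "__main__":
    # Assumed usage: call this right before the handler returns, so the
    # worker log shows whether general internet access was working at the
    # moment the job result was about to be sent back.
    ok, detail = outbound_probe()
    print("outbound connectivity:", ok, detail)
```

If the probe succeeds while the job-done call still times out, that points away from a general network outage on the worker, though it does not by itself prove where along the path to api.runpod.ai the failure occurs.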
nerdylive 2mo ago
I think it's not possible if the WebSocket is initiated from outside, but if it's initiated from the serverless worker, you could try
haris 2mo ago
cc: @Satish
1AndOnlyPika 2mo ago
2024-06-15T06:25:02.615876953Z b'66.114.112.126\n'
2024-06-15T06:25:02.615921744Z {"requestId": "13258cfa-7493-4bfc-9400-02a9f9e81073-u1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/rtqb8oacytm879/job-done/2qf6cgmhd4onn7/13258cfa-7493-4bfc-9400-02a9f9e81073-u1?gpu=NVIDIA+GeForce+RTX+4090&isStream=false", "level": "ERROR"}
2024-06-15T06:25:02.615939904Z {"requestId": "13258cfa-7493-4bfc-9400-02a9f9e81073-u1", "message": "Finished.", "level": "INFO"}
It calls icanhazip.com beforehand and prints the result, so internet access is working; it's just RunPod that isn't.
I also sometimes get this:
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://api.runpod.ai/v2/rtqb8oacytm879/status/964637fc-0084-4872-a91f-0e82a88b592c-u1
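Until the root cause is found, one possible client-side mitigation for the intermittent 500 on the status URL is to retry the call with exponential backoff. The sketch below is an assumption, not something proposed in the thread or part of any RunPod SDK: `call_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary.

```python
import time


def call_with_retries(fn, attempts=4, base_delay=0.5,
                      retry_on=(Exception,), sleep=time.sleep):
    """Hypothetical helper: call fn(), retrying on the given exceptions.

    Waits base_delay, then 2x, 4x, ... between attempts and re-raises
    the last exception once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
```

With `requests`, `fn` could be a small function that fetches the status URL and calls `raise_for_status()`, with `retry_on=(requests.exceptions.HTTPError, requests.exceptions.ConnectionError)`. This only hides transient failures from the caller; it does nothing for the worker-side timeout when returning job results.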