Issues in SE region causing a massive number of jobs to be retried
The issues in the screenshot are causing 10% of my jobs to be retried in the SE region. Please fix this; it's not happening in the CA region.

20 Replies
Obviously I am referring to the "Connection timeout" errors, which cause the job results to fail to be returned, and not the single exception among them.
@digigoblin DO YOU MIND SUBMITTING A TICKET ON THE WEBSITE? EASIER TO ESCALATE
No need to shout but sure 😁
oops, sorry for caps
Ticket number is 4208
done
Thank you
You probably didn't try and send 1000 jobs today
I said 10% are retried, NOT ALL 🤦‍♂️
They are retried; they don't fail.
RunPod needs to check it out, I switched to CA in the meantime and it works fine without any issues.
I was using CA but then switched to SE because my jobs were failing, but that was actually because my own Redis server was running out of memory (OOM); it wasn't a RunPod issue.
So I upgraded my ElastiCache instance on AWS from cache.t3.medium to cache.m4.large and now it's fine.
Because it's a cluster, not a single instance