job timed out after 1 retries
Hello! I'm getting this on every job now on serverless endpoint 31py4h4d9ytybu. My logs have zero messages or any indication of where this is happening; from the outside it looks as if they are totally paused or non-responsive. Work silently hung for over an hour. I'm on runpod 1.7.4. This is having a significant impact on production work, with no clear remediation (see the screenshots: no logs for many minutes despite work happening constantly, and errors on every job). Would love some help!!


9 Replies
@nerdylive lol so it ended up being that I needed to bump my runpod package from 1.7.4 to 1.7.7. Very frustrating that a patch-level release fixes this. How would I ever have found that out, other than spending three days blaming myself and trying to fix it before reaching out to customer service? lol
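For anyone landing here with the same symptom, here is a minimal sketch of a startup check that flags an old runpod package before jobs start hanging. It assumes the 1.7.7 threshold reported in this thread is the relevant one, which may not hold for every setup. The names `MIN_RUNPOD_VERSION`, `parse_version`, `is_at_least`, and `check_runpod` are hypothetical helpers made up for illustration, not part of the runpod package. Versions are compared as integer tuples because a plain string comparison would rank "1.7.13" below "1.7.7".

```python
from importlib import metadata

# Assumption: 1.7.7 is the version reported in this thread as fixing the hang.
MIN_RUNPOD_VERSION = "1.7.7"


def parse_version(text):
    """Turn '1.7.13' into (1, 7, 13), ignoring any non-numeric suffix on a part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare versions numerically, part by part."""
    return parse_version(installed) >= parse_version(minimum)


def check_runpod(minimum=MIN_RUNPOD_VERSION):
    """Return a human-readable status line for the installed runpod package."""
    try:
        installed = metadata.version("runpod")
    except metadata.PackageNotFoundError:
        return "runpod is not installed in this environment"
    if is_at_least(installed, minimum):
        return f"runpod {installed} is at or above {minimum}"
    return f"runpod {installed} is older than {minimum}; consider upgrading"


if __name__ == "__main__":
    print(check_runpod())
```

If the check reports an older version, `pip install --upgrade runpod` and rebuilding the worker image is the likely next step. Pinning the version in your requirements file also keeps an old repository from silently pulling you back to an affected release.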
Brother, thank you so much!
It's been 3 weeks and you solved my problem.
My mistake: I was using 1.7.1 (from my old repository) and kept getting this.
Bumped to 1.7.13 and boom, it got solved.
lololol
@WaleedAtif our solution was to just stop using runpod. We did that, and it's much better now.
Yes
What do you use now?
Just AWS.
That's an expensive alternative 🥲