Runpod•10mo ago
dgaff

job timed out after 1 retries

Hello! I'm getting this on every job now on serverless endpoint 31py4h4d9ytybu. My logs have zero messages or any indication of where this is happening; from the outside it looks as if the workers are totally paused or non-responsive. This silently hung work for over an hour. I'm on runpod 1.7.4. This is having a significant impact on production work, with no clear remediation (see the screenshots: no logs for many minutes despite work arriving constantly, and errors on every job). Would love some help!
9 Replies
dgaffOP•10mo ago
@nerdylive lol, so it ended up being that I needed to bump my runpod package from 1.7.4 to 1.7.7. Very frustrating that a patch-level release fixes this. How would I ever have found out, short of spending three days blaming myself and trying to fix it, then reaching out to customer service lol
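For anyone landing here later, a minimal sketch of a startup guard that warns when the installed runpod package is older than the version that fixed it for me. The 1.7.7 floor is only what worked in this thread, not an official minimum, and the helper names (`parse_version`, `is_at_least`, `check_runpod_version`) are made up for illustration.

```python
from importlib import metadata

# Assumption: 1.7.7 is the version reported in this thread as fixing the
# silent hang; it is not an officially documented minimum.
MIN_VERSION = (1, 7, 7)


def parse_version(text):
    """Turn a string like '1.7.13' into a tuple like (1, 7, 13).

    Only the leading digits of the first three dot-separated parts are
    used, so suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_least(installed, minimum=MIN_VERSION):
    """Return True if the installed version string meets the minimum."""
    return parse_version(installed) >= minimum


def check_runpod_version():
    """Warn if the installed runpod package is older than MIN_VERSION.

    Returns the installed version string, or None if runpod is not installed.
    """
    try:
        installed = metadata.version("runpod")
    except metadata.PackageNotFoundError:
        return None
    if not is_at_least(installed):
        wanted = ".".join(str(n) for n in MIN_VERSION)
        print(f"warning: runpod {installed} is older than {wanted}; consider upgrading")
    return installed


if __name__ == "__main__":
    check_runpod_version()
```

Comparing integer tuples instead of raw strings matters here, because a plain string comparison would rank "1.7.13" below "1.7.7".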
WaleedAtif•3w ago
Brother, thank you so much! It's been 3 weeks and you solved my problem. I feel so dumb: I was using 1.7.1 (from my old repository) and kept getting this; bumped to 1.7.13 and boom, it got solved.
dgaffOP•3w ago
lololol @WaleedAtif our solution was to just stop using runpod. We did that, and it's much better now
WaleedAtif•3w ago
Yes. What do you use now?
dgaffOP•3w ago
just aws
WaleedAtif•2w ago
This is an expensive alternative 🥲
