Serverless Endpoint failing occasionally
I'm pretty new to RunPod and started off with a serverless endpoint. When calling the API, I sometimes get a failed response but can't trace what exactly went wrong. Calling the same API with the same input again works, and the logs don't provide more information. How can I figure out what is causing this error? Is there a best practice for catching the FAILED calls and analyzing why they occur? Happy for any help!
22 Replies
+: It is my custom endpoint, which takes a list of text articles as input and outputs a list of labels.
@karinenavas are you able to provide the request you're making, the code you're using, and the response, if any?
this is the request I am making

I get the message "Job failed with status FAILED" and None return
Also, the logs for the worker throwing the error do not give away much information.

I think he wanted the handler code. You should use a try, except block like this in your handler too.
And then return the errors as a string in the error key. It doesn't support a list or a dict.
Could you give me an example?
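A minimal sketch of what such a handler could look like. `classify_articles` is a hypothetical stand-in for the real model call, and the `articles` input key and `labels` output key are assumptions, not taken from the original code. The whole handler body sits inside the try block, and the full stack trace is returned as a single string under the error key:

```python
import traceback


def classify_articles(articles):
    """Hypothetical stand-in for the real model call (assumption)."""
    if not isinstance(articles, list):
        raise TypeError("articles must be a list of strings")
    return ["label" for _ in articles]


def handler(job):
    """Wrap the entire handler body so any exception is reported, not lost."""
    try:
        # "articles" is an assumed input key; adjust to the real request schema.
        articles = job["input"]["articles"]
        return {"labels": classify_articles(articles)}
    except Exception:
        # Return the stack trace as one string: the error value should be
        # a plain string, not a list or a dict.
        return {"error": traceback.format_exc()}


# On the worker, the handler would be registered with the RunPod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

With this in place, a failed call should carry the stack trace in its error field, which makes it possible to see whether the failure comes from the input, the model, or memory. Note that a process killed outright by the operating system for running out of memory cannot be caught this way.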
i am using my own model
open to suggestions for improvement 🙏



If you want, you can maybe sanity check and start small:
https://blog.runpod.io/serverless-create-a-basic-api/
Btw, if you're on a Mac, make sure you are targeting the right platform with --platform linux/amd64
Yeah haha, just deleted my comments, realized they got it right
i fly around lol, currently in west coast
Actually, I started with a blog post and tried to follow the steps, so I thought this was a pretty simple API already.
The model is from Hugging Face and works fine if I integrate it in a normal workbook. I just need the RunPod GPU.
Could it possibly be due to memory overhead? Is there proper error handling to catch that?
Impossible to know without a full stack trace of the error
My quick fix for now is to catch the failed calls and retry after 5 seconds. This works, but still almost every fifth call fails.
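For illustration, the retry workaround could be sketched roughly like this. `run_job` is a hypothetical callable that submits the request and returns the final job response as a dict, and the `status` and `error` field names are assumptions about that response. The sketch also records every failed attempt so the failures can be analyzed later instead of being silently retried:

```python
import time


def call_with_retry(run_job, payload, max_attempts=3, delay_seconds=5.0):
    """Call run_job(payload), retrying after a fixed delay on FAILED status.

    run_job is a hypothetical callable returning a dict with a "status" key
    and, on failure, an "error" key (assumed response shape).
    Returns the last response and a list describing each failed attempt.
    """
    failures = []
    result = None
    for attempt in range(1, max_attempts + 1):
        result = run_job(payload)
        if result.get("status") != "FAILED":
            return result, failures
        # Keep the error text so recurring failures can be inspected later.
        failures.append({"attempt": attempt, "error": result.get("error")})
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return result, failures
```

This only hides the symptom, and each retry is billed like any other call, so it is best combined with a handler that reports the stack trace in its error response.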
Best to use a try, except block and log the stack trace to your error response
Yeah, that's not good, you're going to waste a lot of money like that
Try and except the classify_articles call?
Yeah, basically your entire handler function
I'll try, thanks for the hint