Serverless Endpoint failing occasionally

I'm pretty new to RunPod and started off with a serverless endpoint. When calling the API I sometimes get a FAILED response, but I can't really trace what exactly is wrong. Calling the same API with the same input again works, and the logs don't provide more information. How can I figure out what is causing this error? Is there a best practice to catch the FAILED calls and analyze why they occur? Happy for any help!
24 Replies
karinenavas
karinenavas3mo ago
To add: it is my custom endpoint, which takes a list of text articles as input and outputs a list of labels.
haris
haris3mo ago
@karinenavas Are you able to provide the request you're making, the code you're using, and the response, if any?
karinenavas
karinenavas3mo ago
this is the request I am making
karinenavas
karinenavas3mo ago
I get the message "Job failed with status FAILED" and a return value of None.
karinenavas
karinenavas3mo ago
Also, the logs for the worker throwing the error do not give away much information.
digigoblin
digigoblin3mo ago
I think he wanted the handler code. You should use a try/except block in your handler too, and then return the errors as a string in the error key. It doesn't support a list or a dict.
karinenavas
karinenavas3mo ago
Could you give me an example?
nerdylive
nerdylive3mo ago
Hey
nerdylive
nerdylive3mo ago
Are you using the RunPod endpoints (https://www.runpod.io/endpoints)?
nerdylive
nerdylive3mo ago
Or are you using your own models?
karinenavas
karinenavas3mo ago
I am using my own model.
karinenavas
karinenavas3mo ago
Open to suggestions for improvement 🙏
nerdylive
nerdylive3mo ago
Where is the model stored, then? I can't really see the problem there.
Try checking the logs page in serverless too, and try sending it here.
Nah, it's OK, the rp start is normal.
justin
justin3mo ago
If you want, you can maybe sanity check and start small: https://blog.runpod.io/serverless-create-a-basic-api/
By the way, if you're on a Mac, make sure you are targeting the right platform with --platform amd64.
Yeah haha, just deleted my comments, realized they got it right.
nerdylive
nerdylive3mo ago
Oh yeah, where are you from, by the way? It's so late here lol
justin
justin3mo ago
I fly around lol, currently on the West Coast
nerdylive
nerdylive3mo ago
oooh nice
karinenavas
karinenavas3mo ago
Actually, I started with a blog post and tried to follow the steps, so I thought this was a pretty simple API already.
The model is from HF and works fine if I integrate it in a normal notebook; I just need the RunPod GPU.
Could it possibly be due to memory overhead? Is there proper error handling to catch that?
digigoblin
digigoblin3mo ago
Impossible to know without a full stack trace of the error
karinenavas
karinenavas3mo ago
My quick fix for now is to catch the failed calls and retry after 5 seconds. This works, but still almost every fifth call fails.
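A minimal sketch of that retry workaround, which also keeps a record of every failed attempt so the failures can be analyzed later. The names `run_with_retries` and `call_endpoint` are hypothetical, and the response shape (a dict with a "status" field and an optional "error" field) is an assumption based on the FAILED status mentioned in this thread, not a confirmed RunPod contract:

```python
import time


def run_with_retries(call_endpoint, payload, max_attempts=3,
                     base_delay=5.0, sleep=time.sleep):
    """Call the endpoint, retrying failed jobs with a growing delay.

    call_endpoint is a hypothetical callable that sends the payload and
    returns the job result as a dict (assumed to contain a "status" key).
    Returns (result, failures) so the failed attempts can be inspected.
    """
    failures = []
    for attempt in range(1, max_attempts + 1):
        result = call_endpoint(payload)
        if result.get("status") == "COMPLETED":
            return result, failures
        # Keep whatever the endpoint reported for later analysis.
        failures.append({
            "attempt": attempt,
            "status": result.get("status"),
            "error": result.get("error"),
        })
        if attempt < max_attempts:
            # 5 s, 10 s, 20 s, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"all {max_attempts} attempts failed: {failures}")
```

Capping the attempts and recording each failure means a persistent problem surfaces as an exception with details instead of being retried silently.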
digigoblin
digigoblin3mo ago
Best to use a try/except block and log the stack trace to your error response.
Yeah, that's not good, you're going to waste a lot of money like that.
karinenavas
karinenavas3mo ago
Wrap the classify_articles call in try/except?
digigoblin
digigoblin3mo ago
Yeah, basically your entire handler function
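The advice above (wrap the entire handler in try/except and return the stack trace as a single string in the error key) could look roughly like the sketch below. The original handler code was only posted as images, so the body of `classify_articles` is a hypothetical stand-in and the "articles" input key is an assumption:

```python
import traceback


def classify_articles(articles):
    """Hypothetical stand-in for the real model call from the thread."""
    if not isinstance(articles, list):
        raise TypeError("articles must be a list of strings")
    return ["label" for _ in articles]


def handler(job):
    """Serverless handler that never raises; failures become an error string."""
    try:
        articles = job["input"]["articles"]  # "articles" key is an assumption
        return {"labels": classify_articles(articles)}
    except Exception:
        stack_trace = traceback.format_exc()
        # Print as well, so the trace also shows up in the worker logs.
        print(stack_trace, flush=True)
        # The error value is one string, not a list or a dict.
        return {"error": stack_trace}


# In the worker, the handler would then be registered with:
# import runpod
# runpod.serverless.start({"handler": handler})
```

With this in place, a failed job should carry the full stack trace in its response, which makes it possible to see whether the failures come from the input, the model, or memory.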
karinenavas
karinenavas3mo ago
I'll try, thanks for the hint.