Runpod•2y ago
karinenavas

Serverless Endpoint failing occasionally

I'm pretty new to Runpod and started off with a serverless endpoint. When calling the API I sometimes get a FAILED response, but I can't really trace what exactly is wrong. Calling the same API with the same input again works, and the logs don't provide more information. How can I figure out what is causing this error? Is there a best practice for catching the FAILED calls and analyzing why they occur? Happy for any help!
22 Replies
karinenavas
karinenavasOP•2y ago
Addition: it is my custom endpoint, which takes a list of text articles as input and outputs a list of labels.
haris
haris•2y ago
@karinenavas are you able to provide the request you're making, the code you're using, and the response, if any?
karinenavas
karinenavasOP•2y ago
this is the request I am making
karinenavas
karinenavasOP•2y ago
I get the message "Job failed with status FAILED" and a return value of None.
karinenavas
karinenavasOP•2y ago
Also, the logs for the worker throwing the error do not give away much information.
digigoblin
digigoblin•2y ago
I think he wanted the handler code. You should also use a try/except block in your handler and return the error as a string in the error key. It doesn't support a list or a dict.
karinenavas
karinenavasOP•2y ago
could you give me an example?
karinenavas
karinenavasOP•2y ago
i am using my own model
karinenavas
karinenavasOP•2y ago
open to suggestions for improvement 🙏
J.
J.•2y ago
If you want, you can maybe sanity check and start small: https://blog.runpod.io/serverless-create-a-basic-api/
Btw, if you're on a Mac, make sure you are targeting the right platform with --platform linux/amd64.
Yeah haha, just deleted my comments, realized they got it right.
J.
J.•2y ago
i fly around lol, currently in west coast
karinenavas
karinenavasOP•2y ago
Actually I started with a blog post and tried to follow the steps, so I thought this was already a pretty simple API.
The model is from HF and works fine if I integrate it in a normal workbook, I just need the Runpod GPU.
Could it possibly be due to memory overhead? Is there proper error handling to catch that?
digigoblin
digigoblin•2y ago
Impossible to know without a full stack trace of the error
karinenavas
karinenavasOP•2y ago
My quick fix for now is to catch the failed calls and retry after 5 seconds. This works, but almost every fifth call still fails.
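A rough sketch of that retry workaround, written so that every failed response is kept for later analysis instead of being thrown away. The `submit` callable and the `"status"` field are assumptions for illustration; `submit` stands for whatever function sends the request to the endpoint and returns the parsed response as a dict. Retrying only hides the underlying problem, so the collected failures are the useful part here.

```python
import time


def call_with_retry(submit, payload, retries=3, delay=5.0, sleep=time.sleep):
    """Call submit(payload) until it stops reporting FAILED, keeping every failure.

    `submit` is a hypothetical callable that sends the request and returns a dict
    with a "status" key (assumed response shape).
    """
    failures = []
    for attempt in range(1, retries + 1):
        result = submit(payload)
        if result.get("status") != "FAILED":
            return result, failures
        # Keep whatever the endpoint returned so failed calls can be analyzed later.
        failures.append({"attempt": attempt, "response": result})
        if attempt < retries:
            sleep(delay)
    # Every attempt failed; the caller still gets the full failure history.
    return None, failures
```

Logging the `failures` list (for example to a file) gives a record of how often calls fail and what, if anything, the endpoint returned each time.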
digigoblin
digigoblin•2y ago
Best to use a try/except block and log the stack trace to your error response.
Yeah, that's not good, you're going to waste a lot of money like that.
karinenavas
karinenavasOP•2y ago
Put a try/except around the classify_articles call?
digigoblin
digigoblin•2y ago
Yeah, basically your entire handler function
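A minimal sketch of what wrapping the entire handler could look like, assuming the input arrives under an `"articles"` key. The body of `classify_articles` here is a hypothetical stand-in for the poster's model call, not their actual code. The stack trace is returned as a single string in the `error` key, as suggested earlier in the thread.

```python
import traceback


def classify_articles(articles):
    """Hypothetical stand-in for the real model call; returns one label per article."""
    if not isinstance(articles, list):
        raise TypeError("'articles' must be a list of strings")
    return ["long" if len(text) > 20 else "short" for text in articles]


def handler(job):
    """Wrap the whole handler body so any exception comes back as a string."""
    try:
        articles = job["input"]["articles"]  # assumed input key
        return {"labels": classify_articles(articles)}
    except Exception:
        # The error value is a plain string (full stack trace), not a list or dict.
        return {"error": traceback.format_exc()}


# In a real worker the handler would be registered with the Runpod SDK:
#   import runpod
#   runpod.serverless.start({"handler": handler})
```

With this in place, a failed job should carry the stack trace in its response, which makes intermittent failures such as out-of-memory errors much easier to identify.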
karinenavas
karinenavasOP•2y ago
I'll try, thanks for the hint.
