2 active workers on serverless endpoint keep rebooting

We have 2 active workers on a serverless endpoint. Sometimes both workers reboot at the same time for no apparent reason, which causes major problems in our system.
2024-04-03T14:37:16Z create pod network
2024-04-03T14:37:16Z create container endpoint-image:1.2
2024-04-03T14:37:17Z start container
2024-04-03T15:27:23Z stop container
2024-04-03T15:27:24Z remove container
2024-04-03T15:27:24Z remove network
2024-04-03T15:27:30Z create pod network
2024-04-03T15:27:30Z create container endpoint-image:1.2
2024-04-03T15:27:30Z start container
2024-04-03T17:34:51Z stop container
2024-04-03T17:34:51Z remove container
2024-04-03T17:34:51Z remove network
Has anyone had this problem? How can we fix it?
RunPod version: 1.3.0
Docker image: Python 3.11-slim
Our image version: 1.2
8 Replies
Madiator2011 4mo ago
Your serverless worker needs a startup command, and it looks like you are just running a plain Python Docker image.
Captain Barbossa
Our Docker image already has a start command. Should I add one anyway in our RunPod template?
justin 4mo ago
Not sure if you're saying this API was working before and these two workers suddenly started doing this, or if you're still trying to deploy to serverless. If it's the latter and you're hitting this while deploying, then as Madiator said, make sure your start command specifically runs handler.py, which needs to contain a runpod.serverless.start() call to be triggered.
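For reference, here is a minimal sketch of what such a handler.py might look like. The handler name `do_something` is borrowed from later in the thread; the `name` input field and the greeting it returns are made up purely for illustration. The `runpod` import sits under the main guard only so the handler can be called locally without the SDK installed.

```python
# handler.py: a minimal sketch, not the poster's actual file.

def do_something(job):
    # The worker passes each job as a dict; the request payload is under "input".
    job_input = job.get("input", {})
    # "name" is a hypothetical input field used only for this example.
    name = job_input.get("name", "world")
    return {"greeting": f"Hello, {name}!"}


if __name__ == "__main__":
    # Imported here so do_something can be tested without the runpod package.
    import runpod

    # Hands control to the RunPod serverless worker loop.
    runpod.serverless.start({"handler": do_something})
```

The container's start command then has to run this file, for example with a `CMD` that calls `python -u handler.py`.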
justin 4mo ago
GitHub: runpodWhisperx/Dockerfile at master · justinwlin/runpodWhisperx
justin 4mo ago
Are you doing so?
justin 4mo ago
https://blog.runpod.io/serverless-create-a-basic-api/ is an example from the RunPod blog that walks through the setup.
RunPod Blog: Serverless | Create a Custom Basic API
RunPod's Serverless platform allows for the creation of API endpoints that automatically scale to meet demand. The tutorial guides you through creating a basic worker and turning it into an API endpoint on the RunPod serverless platform.
Captain Barbossa
Thanks for the answer. Yes, I have a handler.py file with:
runpod.serverless.start({
"handler" : do_something,
"return_aggregate_stream" : True,
})
And in my Dockerfile, I have this command:
CMD ["python", "-u", "handler.py"]
Everything normally works fine, but now every X hours the active worker reboots for no apparent reason.
flash-singh 4mo ago
Active workers can shuffle; that's normal. No single worker is dedicated to being the active worker. It's a last-man-standing algorithm, meant to optimize for cost.
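To see how long each container actually lived before being replaced, the event log from the original post can be parsed. This is only a rough sketch: it assumes one event per line in the form `<timestamp> <event>`, and the function name `container_uptimes` is made up for illustration.

```python
from datetime import datetime, timedelta

# Start/stop events copied from the log in the original post.
LOG = """\
2024-04-03T14:37:17Z start container
2024-04-03T15:27:23Z stop container
2024-04-03T15:27:30Z start container
2024-04-03T17:34:51Z stop container
"""


def container_uptimes(log_text):
    """Return a timedelta for each start/stop pair found in the log."""
    uptimes = []
    started = None
    for line in log_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        stamp, event = parts
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        if event == "start container":
            started = when
        elif event == "stop container" and started is not None:
            uptimes.append(when - started)
            started = None
    return uptimes


for uptime in container_uptimes(LOG):
    print(uptime)  # prints 0:50:06, then 2:07:21
```

For this log the two containers ran for about 50 minutes and about 2 hours 7 minutes, so the restarts do not follow a fixed interval.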