A generic RunPod worker is being launched, ignoring my container's ENTRYPOINT

Hello, I'm experiencing an issue with my serverless endpoint. Despite my endpoint being configured to use a 'Custom' worker with my own Docker image (ovyrlord/comfyui-runpod:v1.27), the logs show that a generic RunPod worker is being launched, which ignores my container's ENTRYPOINT. I have verified all my settings and pushed multiple new image tags, but the issue persists. Can you please investigate and clear any stuck configurations on your end for my endpoint?
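For context, a 'Custom' serverless worker like this ultimately comes down to the image's ENTRYPOINT starting a small Python handler that the RunPod SDK drives. A minimal sketch of that handler shape (assuming the standard `runpod` pip package; `run_workflow` is a hypothetical stand-in, not the file attached to this post):

```python
# handler.py - minimal RunPod serverless handler (sketch; not the handler
# attached to this thread).
import runpod


def run_workflow(workflow: dict) -> dict:
    # Stand-in for the real work: queue the ComfyUI workflow, wait for the
    # result, upload it to the Cloudflare bucket, and return its location.
    return {"status": "ok", "nodes": len(workflow)}


def handler(job):
    # Called once per queued job; job["input"] is the JSON payload sent to
    # the endpoint's /run or /runsync route.
    workflow = job["input"].get("workflow", {})
    return run_workflow(workflow)


# Reaching this call is what attaches the container to the job queue.
runpod.serverless.start({"handler": handler})
```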
Unknown User · 2w ago
[Message not public]
OVYRLORD (OP) · 2w ago
I have been using a different version tag for each iteration; that's how we got up to v1.27. It looks like I attached the wrong version of the logs to the initial post, so here is the version that actually shows the issue. What ends up happening is that I start an instance and the process starts to spin up, and after everything gets loaded, before my workflow is actually passed to the serverless worker, it runs the line "starting service worker version 1.7.13". After that happens, the serverless instance is dumped and it starts a new worker. If I cancel the job, it will continue to run and won't stop until I actually go in and terminate the worker; otherwise it will just loop indefinitely.
Dj · 2w ago
I'm sorry, I just don't understand; the logs I see show the workflow you want, unless I'm missing something? Your entrypoint is loaded, the script does what it should, and it passes off to your start.sh? :thinkMan:
OVYRLORD (OP) · 2w ago
This workflow should be generating an image or video and then sending it to a Cloudflare bucket. What's happening is that ComfyUI initializes, then the "Starting Service Worker" line comes up and the whole instance deletes the container and restarts before it ever loads the initial checkpoint to start processing. The frustrating thing is that it worked just fine a couple of days ago with an image workflow. Then we tried a video workflow and started having this odd loop. Then we tried an image workflow again to compare for troubleshooting, and the image workflow started having the same problem.

We had a separate ticket open for the budget it ate by running on its own even after we canceled the request, and we received an email from your support team saying: "After investigating, we found that a worker issue caused a brief degradation in our backend systems that has not self resolved, which led to an increase in your billing during that time."

So it would seem to me that I have my image and files in order to work according to your system requirements, but when I actually try to process a job, my image boots up and, before it actually attempts to run the payload, it triggers a degradation event with the backend worker that then dumps my current session and spins up a new machine, just to fail and restart again. And as if the time sunk into troubleshooting and our service downtime weren't bad enough, we're burning through our budget trying to find a workaround or fix. And if we don't explicitly terminate the service worker that initiated the run, it will continue to loop in the background even after the job is canceled.
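For what it's worth, many ComfyUI serverless images guard against this kind of race by having the startup step block until ComfyUI's local API actually answers before the worker takes any jobs. A minimal sketch of that idea (port 8188 is ComfyUI's default and /system_stats is one of its stock routes, but the timeout and usage here are assumptions, not code from this endpoint):

```python
# wait_for_comfy.py - block until a locally launched ComfyUI instance answers
# over HTTP before letting the worker take jobs (sketch only).
import time
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/system_stats"


def wait_for_comfyui(timeout_s: int = 300) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(COMFY_URL, timeout=5):
                return  # ComfyUI is up and serving requests
        except OSError:
            time.sleep(2)  # not up yet; keep polling
    raise RuntimeError("ComfyUI did not come up within the timeout")
```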
OVYRLORD (OP) · 2w ago
[video attachment]
Dj · 2w ago
There's not much of this we can control; when I looked at the logs for your endpoint I saw the same restart loop, but with way less helpful context. At some point between your transition from image to video something broke, but maybe the logging for it is being swallowed. I can't see the exact endpoint ID you were using when you reported the bug; I'm working on hunting it down now so I can show you the logs. I'm a lot more loose with granting credits than the support team can be, though, so I'll make sure you're taken care of. I'll be back soon (sent my message early since we're both around).
OVYRLORD (OP) · 2w ago
I appreciate the help. I just took that video now to show exactly what we've been experiencing. It wasn't until recording that video that I realized it goes through the loop twice with one worker, then switches to another worker to keep it going, effectively bypassing your loop protection as well as never actually processing the job.
Dj · 2w ago
I see the same thing. Can you check your repo to make sure you didn't change your RunPod handler? That job sits in the queue because your workers are never actually emitting that they're ready. The only job I can see sticking over the last 2 weeks for the endpoint in your video was cancelled by you manually.
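In SDK terms, "emitting ready" amounts to the worker process actually reaching `runpod.serverless.start()`; if the entrypoint crashes, exits, or blocks forever before that call, the job stays queued and the container gets recycled, which from the outside looks exactly like a boot loop. A rough sketch of the intended ordering (the ComfyUI path, flags, and port are assumptions, not taken from the repo in this thread):

```python
# main.py - rough startup ordering for a ComfyUI-based serverless worker.
import subprocess
import time
import urllib.request

import runpod


def handler(job):
    # The real handler would submit job["input"]["workflow"] to ComfyUI and
    # upload the rendered output to the Cloudflare bucket; stubbed out here.
    return {"received": True}


if __name__ == "__main__":
    # 1. Launch ComfyUI once, as a background process owned by this container.
    subprocess.Popen(["python", "/comfyui/main.py", "--listen", "127.0.0.1"])

    # 2. Block until its local API answers (same polling idea as above).
    while True:
        try:
            urllib.request.urlopen("http://127.0.0.1:8188/system_stats", timeout=5)
            break
        except OSError:
            time.sleep(2)

    # 3. Only now hand control to the RunPod job loop; reaching this call is
    #    what marks the worker as ready to pull jobs from the queue.
    runpod.serverless.start({"handler": handler})
```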
OVYRLORD (OP) · 2w ago
The handler we're using is the same file that is attached to the initial post here. And I had been manually cancelling everything to stop it from looping out of control again.
Unknown User · 2w ago
[Message not public]
OVYRLORD (OP) · 2w ago
I can see it's not ComfyUI crashing; it works when I run it locally and on a standard pod. It's literally that "Starting Service Worker" line that kills the instance and loops the boot process.
Unknown User · 2w ago
[Message not public]
OVYRLORD (OP) · 2w ago
Right. So my handler function fires, which was working initially, but now, after a batch of successful runs, whenever my handler is triggered it triggers another service worker in the batch and starts reloading the image all over again, causing an infinite loop.
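That symptom matches one particular failure mode worth ruling out (an illustration only, not the handler from this thread): if anything inside the handler ever re-runs the container's own startup path, each job spawns a second worker loop and the image appears to reload itself forever.

```python
# loop_antipattern.py - illustration of the failure mode described above,
# NOT the code from this thread. The /start.sh path is hypothetical.
import subprocess

import runpod


def bad_handler(job):
    # Re-invoking the container's boot script from inside a job restarts
    # ComfyUI and starts a second runpod.serverless.start() loop alongside
    # this one, so every job leaves an extra worker running.
    subprocess.run(["bash", "/start.sh"], check=False)
    return {"status": "done"}


runpod.serverless.start({"handler": bad_handler})
```

The usual fix for that pattern is the ordering sketched earlier: launch ComfyUI exactly once at container start and have the handler only talk to it over its local API.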
Unknown User · 2w ago
[Message not public]
OVYRLORD (OP) · 2w ago
If you can tell me what's wrong with it, I will happily change it
Unknown User · 2w ago
[Message not public]
OVYRLORD (OP) · 2w ago
I'll try it
