When serverless is used, does the machine reboot if it is executed consecutively? Currently seeing issues where the last execution affects the next one.
There's been a problem with this worker ID: alf4z19ubk8n71
v8lkcxxh6wjd6q k5xlystwyzjbm3
FlashBoot isn't guaranteed. It's influenced by how many workers you have and whether you're sending a constant flow of requests or not.
I don't know what you mean by the machine rebooting though. Machines don't reboot. The container starts to handle a request, then shuts down again once the idle timeout is reached and it is no longer processing requests.
I assume you're referring to flash boot though and nothing to do with rebooting.
I don't have FlashBoot turned on. Because of my ComfyUI setup I need to start one service; if the old environment is not cleared, will the service startup conflict at that point?
You have to enable flash boot, otherwise you have a cold start on every single request. I use flash boot with comfyui and it works perfectly. Depends on which comfyui worker you're using though.
Thank you very much for the advice, I'll try it later. But my problem at the moment isn't that it takes a long time, it's that it often reports errors. I think serverless should be a clean environment every time it executes.
I'm getting tons of errors on my service for this reason
What errors? Without logs, nobody can advise.
This is the error I'm seeing; I didn't post it earlier because I didn't think it was generalizable.
It feels like the execution raised an error, and the ComfyUI service had already stopped by the time the results were fetched.
Looks like you are trying to connect to the ComfyUI API before it's ready to start taking requests.
You need to do something like this to wait for the service to become ready before sending requests:
https://github.com/ashleykleynhans/runpod-worker-comfyui/blob/main/rp_handler.py#L33-L49
import requests
import time


def check_server(url, retries=50, delay=500):
    """
    Check if a server is reachable via HTTP GET request

    Args:
    - url (str): The URL to check
    - retries (int, optional): The number of times to attempt connecting to the server. Default is 50
    - delay (int, optional): The time in milliseconds to wait between retries. Default is 500

    Returns:
        bool: True if the server is reachable within the given number of retries, otherwise False
    """
    for i in range(retries):
        try:
            response = requests.get(url)

            # If the response status code is 200, the server is up and running
            if response.status_code == 200:
                print("runpod-worker-comfy - API is reachable")
                return True
        except requests.RequestException:
            # If an exception occurs, the server may not be ready
            pass

        # Wait for the specified delay before retrying
        time.sleep(delay / 1000)

    print(
        f"runpod-worker-comfy - Failed to connect to server at {url} after {retries} attempts."
    )
    return False
I did wait for the service to start before executing the workflow.

Here's the log from one of the errors; it looks like the ComfyUI service was started, but hit an exception.
This is the log of normal operation

It looks like the ComfyUI service was shut down during execution.
Does RunPod serverless automatically close ports?
Unknown User•2y ago: [message not public]
There is no error message here, but the port is closed; an error during ComfyUI execution shouldn't cause the ComfyUI port to be closed.
Unknown User•2y ago: [message not public]
Here's another error log; it looks like the port may have been closed at an arbitrary time.

Unknown User•2y ago: [message not public]
What information do I need to provide?
Unknown User•2y ago: [message not public]
comfyui port 8188
Unknown User•2y ago: [message not public]
This log looks like another kind of error, let's finish looking at the above one first

Unknown User•2y ago: [message not public]
I don't need it to be public, I just need it to be internal.
Unknown User•2y ago: [message not public]
No, my logs show a port closure, which causes requests to port 8188 to be rejected, and then an error.
Unknown User•2y ago: [message not public]
No, the logs show that ComfyUI's workflows are up and running.
check_server(
    f"http://{COMFY_HOST}",
    COMFY_API_AVAILABLE_MAX_RETRIES,
    COMFY_API_AVAILABLE_INTERVAL_MS,
)
resBucketFileName = new_predict_turn_video_style(input_file, uid, task_id, pre_workflow_api)
Unknown User•2y ago: [message not public]
I don't think that's the problem at all
Unknown User•2y ago: [message not public]
This problem comes up intermittently, not every time.
Why is that? I run the workflow after the check_server function, so why would it not be fully ready?
Unknown User•2y ago: [message not public]
The logs also show the workflow running, indicating that ComfyUI had finished starting up.
Unknown User•2y ago: [message not public]
As I said, that's a different one; let's go back to my initial couple of logs.
this log
Unknown User•2y ago: [message not public]
these logs
Unknown User•2y ago: [message not public]
yes
Unknown User•2y ago: [message not public]
Can you be more specific about this situation?
Unknown User•2y ago: [message not public]

Do you mean that currently, after my check_server call, the service may be shut down?
Unknown User•2y ago: [message not public]
Is this log for that reason?
The log indicates that the service had been started.
The second problem is also due to the first execution error: the port was closed, and when subsequent requests came in the service was not started, causing check_server to fail.

I looked at all the error messages; there are 2 causes:
1. The port was closed during execution
2. Since the port was closed last time, the next request came in and the service did not start, causing continuous errors
The second problem is caused by the first problem.
I would like to troubleshoot the 1st problem first.
The first problem is definitely not caused by a misplaced check; it is obvious that the port is closed during execution. But an ordinary error will not cause the ComfyUI port to be closed, so I would like to determine whether this is system behavior.
One possible reason I can think of is GPU OOM
Can you guys see the logs related to GPU OOM?
Unknown User•2y ago: [message not public]
Depends on whether ComfyUI logs it or not. I don't think RunPod has access to logs that you don't log yourself; all your logs should be available under the Logs tab for your endpoint, so you shouldn't need RunPod to check the logs for you.
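Since RunPod only surfaces what the worker logs itself, one option is to catch the OOM at the handler boundary and log it explicitly so it shows up under the endpoint's Logs tab. A sketch under assumptions: `run_workflow` is a stand-in for the real job function, the `torch` import is optional, and classic PyTorch OOMs surface as a `RuntimeError` whose message contains "out of memory":

```python
import gc
import logging

try:
    import torch  # optional: only used when a CUDA-enabled PyTorch is installed
except ImportError:
    torch = None

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("worker")


def run_job_with_oom_logging(run_workflow, job_input):
    """Run one job; log GPU OOM explicitly instead of failing silently."""
    try:
        return {"output": run_workflow(job_input)}
    except RuntimeError as exc:
        if "out of memory" in str(exc).lower():
            log.error("GPU OOM while processing job: %s", exc)
            if torch is not None and torch.cuda.is_available():
                torch.cuda.empty_cache()  # release cached VRAM after the OOM
            gc.collect()
            return {"error": "GPU out of memory"}
        raise  # unrelated errors still propagate
```

Returning an error payload instead of letting the exception escape also means the worker can report the failure to the caller rather than dying mid-job.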
GPU OOM is a lack of system resources. Don't you have a separate record for that?
I've reproduced the problem, and it's due to GPU OOM.
@digigoblin @nerdylive
Let's move on to the next question: what does the serverless environment look like after the old serverless task has finished running and a new task comes in? (Without FlashBoot or an active worker enabled.)
Because I've found it causes chained failures here: only 1 task actually hit OOM, but many tasks failed.
Unknown User•2y ago: [message not public]
Will the GPU memory be emptied? Won't there be any effects from the last job?
Unknown User•2y ago: [message not public]
So if my code doesn't explicitly clear the GPU, the next request could be affected, yes?
Unknown User•2y ago: [message not public]
What can I do so that my serverless handles each request with a completely new environment? And not be affected by old requests.
Yes, and I will optimize the code to be more GPU efficient for different user inputs
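If the worker process is reused between requests, a best-effort in-process cleanup after every job can reduce this kind of carry-over. A hedged sketch: the `torch.cuda` calls are the standard PyTorch cache-release APIs, and `run_workflow` is a hypothetical job function:

```python
import gc

try:
    import torch  # optional; present in a real ComfyUI worker image
except ImportError:  # allows the sketch to run without a GPU stack
    torch = None


def cleanup_between_jobs():
    """Best-effort reset of in-process state between serverless requests."""
    gc.collect()  # drop Python-level references held by the previous job
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached VRAM blocks to the driver
        torch.cuda.ipc_collect()  # clean up CUDA IPC handles


def handle_request(run_workflow, job_input):
    try:
        return run_workflow(job_input)
    finally:
        cleanup_between_jobs()  # runs even when the job raised
```

Note this only releases memory the process itself is caching; it cannot undo state held inside a long-running ComfyUI server process.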
Unknown User•2y ago: [message not public]
Actually, it's not just VRAM but also the ComfyUI server, which I'd like to restart on every request, so I wish there were a simple way to handle it by just clearing the state.
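One heavy-handed way to get that restart-per-request behavior is to run the ComfyUI server as a child process, terminate it after each job, and relaunch it for the next one. A sketch under assumptions: the command is a placeholder, and the demo below substitutes a dummy sleeping process for the real server:

```python
import subprocess
import sys


class FreshServer:
    """Run the (hypothetical) ComfyUI server as a child process that is
    started for each job and torn down afterwards, so every request
    begins with a clean process state."""

    def __init__(self, cmd):
        self.cmd = cmd  # e.g. ["python", "main.py", "--port", "8188"]
        self.proc = None

    def start(self):
        self.proc = subprocess.Popen(self.cmd)

    def stop(self, timeout=10):
        if self.proc is None:
            return
        self.proc.terminate()  # polite SIGTERM first
        try:
            self.proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self.proc.kill()  # escalate if it ignores SIGTERM
            self.proc.wait()
        self.proc = None

    def run_job(self, job_fn):
        self.start()
        try:
            # A real worker would poll the server for readiness here
            # (as check_server does) before submitting the workflow.
            return job_fn()
        finally:
            self.stop()  # the next job gets a freshly started server


if __name__ == "__main__":
    # Stand-in for the real server: a child process that just sleeps.
    srv = FreshServer([sys.executable, "-c", "import time; time.sleep(60)"])
    print(srv.run_job(lambda: "done"))
```

The trade-off is cost: restarting the server on every request means paying the full model-load time each job, which is exactly what warm workers are meant to avoid.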
How does this work exactly?
Unknown User•2y ago: [message not public]
If you just used one of the tried and tested repos on Github instead of trying to roll your own, you would not have all these issues, for example:
https://github.com/ashleykleynhans/runpod-worker-comfyui/blob/main/rp_handler.py#L250-L256
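For reference, the linked handler relies on RunPod's `refresh_worker` mechanism: returning `"refresh_worker": True` from the handler asks the platform to stop this worker after the job finishes, so the next request lands on a freshly started one. A minimal sketch (the `run_workflow` body is a placeholder, not the repo's code):

```python
def run_workflow(job_input):
    # Placeholder for the real ComfyUI workflow execution.
    return {"echo": job_input}


def handler(job):
    """Process one job, then ask RunPod to recycle this worker so the
    next request starts from a clean container."""
    result = run_workflow(job["input"])
    return {
        "output": result,
        # Tells RunPod serverless to stop this worker after the job,
        # so the next request gets a freshly started environment.
        "refresh_worker": True,
    }


if __name__ == "__main__":
    # In a real worker you would hand the handler to the RunPod SDK:
    #   import runpod
    #   runpod.serverless.start({"handler": handler})
    print(handler({"input": {"prompt": "test"}}))
```

This avoids hand-rolling process management: the platform tears down and recreates the worker, at the cost of a cold start on the next request.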
Okay, I'll take a look.
Because I have a lot of customization it's not enough on its own, but I should still learn a lot from the code here, thanks.
Unknown User•2y ago
Message Not Public
Sign In & Join Server To View
Okay, thank you both. Come back if you have any other questions later