Runpod · 6mo ago
Barış

Image Generation Stuck Until New Requests Are Sent

It typically takes 5–10 seconds to generate an image. However, sometimes a request doesn’t enter the processing queue until another request is sent. This issue occurred multiple times today, and I recorded it. In the example I captured, my friend's first request stayed in the queue for an unusually long time. I asked him to send a second request to try and trigger the first one to start processing, but that didn’t work. When he submitted a third request, it finally caused the first request to begin processing, followed by the second, and then the third. This issue occurs frequently and significantly impacts the user experience, as requests that should complete in 5–10 seconds end up taking several minutes.
29 Replies
Barış (OP) · 6mo ago
Unknown User · 6mo ago (message not public)
Barış (OP) · 6mo ago
No, image generation is usually completed within 5–10 seconds. I recorded the video to show that requests only started processing after I triggered them by submitting a new one, which should not be necessary.
Unknown User · 6mo ago (message not public)
Barış (OP) · 6mo ago
I showed the workers at 0:44: I had 2 running workers, 1 idle worker, and 2 throttled workers. This doesn't happen all the time, but it pops up about once a week and definitely hurts the user experience. I even had a meeting with @Tim to try to show the issue live, but it didn't happen then (which was a good thing, haha). We agreed that if it happened again and I could catch it on video, that would be helpful. That's what I've done now, so I hope this helps the team figure out what's going wrong.
Unknown User · 6mo ago (message not public)
Barış (OP) · 6mo ago
That's right, and I'm curious why that happens. If 3 workers are running, I'd expect them to generate images separately, but only 1 of them ended up processing the requests.
Barış (OP) · 6mo ago
I only generated the images shown in the video today, and here are today's logs. Thank you @Jason for your time and help.
Unknown User · 6mo ago (message not public)
Barış (OP) · 6mo ago
I noticed another issue related to serverless, and Tim told me to make a thread in the serverless channel about it, so that's what I did with this one too.
Barış (OP) · 6mo ago
They are set like this: [image attached]
NERDDISCO · 6mo ago
Thanks for reporting this @Barış, and thanks for your support already @Jason.
curtis3204 · 5mo ago
I also encounter this issue a lot! A request sometimes gets stuck in the queue while all the workers stay idle and never pick it up. It can possibly be fixed by sending other requests to the queue: once lots of requests are waiting, the first worker finally goes from idle to running.
mux_in · 5mo ago
@NERDDISCO Hi Tim, I'm Muxin from X.
Barış (OP) · 5mo ago
Hi, has anyone found a manual workaround for this? My app is ready, but because of this issue I can sometimes generate an image, and sometimes it gets stuck/lost in the queue and never generates. Once this is fixed, we can launch our app.
Poddy · 5mo ago
@Barış
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #19112
Unknown User · 5mo ago (message not public)
Barış (OP) · 4mo ago
For those who have the same problem: I created a support ticket and tried different methods, but it is not fixed yet. We will launch our app as soon as the problem is fixed.
deanQ · 4mo ago
If you are just creating your endpoint, or it's initializing with no ready worker, do not submit a job yet. There's a bug we're tracking about this: at most, a job sits in the queue for two minutes if it was queued before a single worker is ready. It also gets pushed past the queue if another job is queued after a worker is ready.
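The advice above (don't submit before a worker is ready) can be automated on the client side. A minimal sketch, assuming a caller-supplied `get_health` function that returns worker counts in roughly the shape a serverless health check reports; the exact field names here are an assumption, so check your endpoint's actual health response before relying on them:

```python
import time

def wait_for_ready_worker(get_health, timeout=120.0, poll_interval=2.0):
    """Poll endpoint health until at least one worker is ready or idle.

    `get_health` is any callable returning a dict such as
    {"workers": {"ready": 1, "idle": 0}} -- the shape is an assumption
    modeled on a serverless health response, not a confirmed schema.
    Returns True once a worker is available, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        workers = get_health().get("workers", {})
        if workers.get("ready", 0) > 0 or workers.get("idle", 0) > 0:
            return True
        time.sleep(poll_interval)
    return False
```

In a real app, `get_health` would wrap an HTTP GET against the endpoint's health URL; injecting it as a callable keeps the wait logic testable without network access.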
Barış (OP) · 4mo ago
Thank you for sharing that, Dean. Unfortunately, the issue occurs when workers are ready/idle. Additionally, I've noticed several times that workers kept running for over 8 minutes despite having no jobs in the queue or in progress, and created a new thread here: https://discord.com/channels/912829806415085598/1389378980959752273 Looking forward to launching our app as soon as these issues are fixed! 🙏

Hi, it looks like I had to set the CUDA version to 12.6 or higher to fix the issue in the thread above. Let me also share an update about this issue.
Unknown User · 4mo ago (message not public)
Barış (OP) · 4mo ago
So the issue is: requests get stuck in the queue, but they are generated once we send new requests. What we noticed is that sending new requests (1, 2, or sometimes more) triggers something that changes the status of our first request from in-queue to in-progress. We also noticed that clicking the "generate" button many times always worked to generate images, because it triggered whatever was stuck.
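The symptom described above can be detected client-side by watching a single job's status. A minimal sketch, assuming the usual serverless status strings (`IN_QUEUE`, `IN_PROGRESS`, `COMPLETED`) and an injected `get_status` callable in place of the real HTTP status call:

```python
import time

def job_is_stuck(get_status, stuck_after=30.0, poll_interval=2.0):
    """Report a job as stuck if it never leaves IN_QUEUE.

    `get_status` is any callable returning the job's current status
    string. Returns False as soon as the job leaves the queue;
    True if it is still queued after `stuck_after` seconds.
    """
    deadline = time.monotonic() + stuck_after
    while time.monotonic() < deadline:
        if get_status() != "IN_QUEUE":
            return False
        time.sleep(poll_interval)
    return True
```

A watchdog like this is what lets an app decide when to fire the "send another request to unstick the first one" workaround instead of waiting indefinitely.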
Unknown User · 4mo ago (message not public)
Barış (OP) · 4mo ago
To work around this, we updated our app so that clicking the "generate" button sends the same request 5 times to RunPod. The app quickly checks the status of each one, and as soon as 1 of the 5 requests is in progress, we cancel the duplicate requests immediately. This way the probability of generating an image is much higher, and it does not generate any duplicate images. This is a temporary solution we found for the issue, but I hope the RunPod devs can fix it so there's no need to do this. Yep, same as in the first 2 messages of the thread.
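The duplicate-and-cancel workaround described above can be sketched roughly like this. The `submit`, `get_status`, and `cancel` callables are placeholders standing in for the endpoint's run, status, and cancel calls; this is an illustration reconstructed from the thread, not the app's actual code:

```python
import time

def run_with_duplicates(submit, get_status, cancel,
                        copies=5, polls=50, poll_interval=0.5):
    """Send the same request `copies` times, keep the first job that
    starts, and cancel the rest.

    submit() -> job_id, get_status(job_id) -> status string, and
    cancel(job_id) are caller-supplied stand-ins for the serverless
    run/status/cancel calls. Returns the kept job_id, or None if no
    copy ever left the queue within `polls` rounds.
    """
    job_ids = [submit() for _ in range(copies)]
    for _ in range(polls):
        for job_id in job_ids:
            if get_status(job_id) in ("IN_PROGRESS", "COMPLETED"):
                # One copy started: cancel the duplicates immediately
                # so no duplicate image is generated.
                for other in job_ids:
                    if other != job_id:
                        cancel(other)
                return job_id
        time.sleep(poll_interval)
    return None
```

The trade-off is extra queue traffic and a brief window where duplicates could start before cancellation, which is presumably why the thread treats this as a stopgap rather than a fix.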
Unknown User · 4mo ago (message not public)
Barış (OP) · 4mo ago
I haven't tried cloning the existing endpoint before. To be honest, since we lost many credits and have found a temporary fix now, I don't want to risk losing credits again if the issue happens on a new clone. Yes, around 2 weeks ago.
Unknown User · 4mo ago (message not public)
Barış (OP) · 4mo ago
We've been in touch almost every day and they've been helpful; I really appreciate their time and help. The last thing they said was that they noticed my setup was using SDK version 1.7.9, and there have been improvements and bug fixes since then, so they recommended upgrading to 1.7.12. Thanks to Tim, I updated it today, but I haven't switched my app back to the old method (sending 1 request at a time rather than 5) yet. I'll keep you posted if I notice that upgrading the SDK fixes the issue.
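For reference, pinning the SDK upgrade mentioned here in a Python-based worker image might look like this (a sketch, assuming a pip-managed environment inside the worker's Docker image):

```shell
# Upgrade the RunPod Python SDK to at least the recommended version
pip install --upgrade "runpod>=1.7.12"

# Confirm which version is actually installed
pip show runpod
```

Putting `runpod>=1.7.12` in the image's `requirements.txt` keeps rebuilt workers from silently falling back to an older SDK.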
Barış (OP) · 4mo ago
Hi! I have some great news: the issue has been fixed for me since I selected a CUDA version of 12.6 or higher in "Endpoint Settings > Advanced > Allowed CUDA Versions" 🙂 Can't believe it was this easy, but everything has worked well since I selected it.
[image attached]