No active workers after deploying New Release

Had 5 active workers. Deployed new release, which was quickly pulled. Shortly afterwards all workers went to "Initializing" state, fully shutting down the endpoint. Would expect some workers to stay active so the endpoint can handle requests. This is not the first time that this happened. As of now it is not stable to use this feature on production pods. @Papa Madiator
29 Replies
Madiator2011 (3mo ago)
Do you have active workers set?
pazanchick (3mo ago)
Afterwards ye, to see if it changes anything. Usually 0.
Madiator2011 (3mo ago)
If you click on a worker, you should be able to see whether it is, for example, pulling the Docker image.
digigoblin (3mo ago)
Are you using version tags? This kind of behavior usually happens when you don't use proper version tags for your images.
pazanchick (3mo ago)
10/11 have the new image. It has a version tag, although I'm not sure what a proper one would be.
digigoblin (3mo ago)
And if you push to the same tag, this will also happen. Each deployment should have a different tag.
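A minimal sketch of what "a different tag per deployment" could look like, assuming you build the tag yourself before pushing. The function name `make_image_tag` and the `version-timestamp` scheme are hypothetical, not anything RunPod or Docker prescribes; a git commit hash would work just as well as the timestamp.

```python
import re
from datetime import datetime, timezone


def make_image_tag(repository: str, version: str, build_time: datetime) -> str:
    """Return 'repository:version-YYYYmmddHHMMSS' so every build gets a new tag.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    # Normalise to UTC so builds from different machines sort consistently.
    stamp = build_time.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    tag = f"{version}-{stamp}"
    # Docker tags may contain letters, digits, '_', '.' and '-', must not
    # start with '.' or '-', and are limited to 128 characters.
    if not re.fullmatch(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}", tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return f"{repository}:{tag}"


if __name__ == "__main__":
    # 'myuser/myworker' is a placeholder repository name.
    print(make_image_tag("myuser/myworker", "1.4.0", datetime.now(timezone.utc)))
```

The point of the timestamp suffix is only that two pushes never share a tag, so the endpoint always sees a genuinely new image reference.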
pazanchick (3mo ago)
The two images have two different tags yes
digigoblin (3mo ago)
Hmm, then this should not happen
Madiator2011 (3mo ago)
I would set all workers to 0, wait for all of them to be deleted, and then spawn new ones.
pazanchick (2mo ago)
Happens consistently: deploy new release -> all workers shut down (effectively shutting down the endpoint for 15-20 min). Scary to use this feature in production. @Papa Madiator @haris
pazanchick (2mo ago)
Endpoint is down: workers are active (and billed for) while downloading the new image, and the queue is not being handled.
pazanchick (2mo ago)
this is really bad tbh
Madiator2011 (2mo ago)
Another solution would be to manually kill X number of workers
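A rough sketch of that manual approach, assuming you replace workers in batches and wait for each batch to come back before touching the next. `rolling_batches` and `min_active` are hypothetical names used for illustration only; this only plans the order and does not call any RunPod API.

```python
from typing import Iterator, List


def rolling_batches(worker_ids: List[str], min_active: int) -> Iterator[List[str]]:
    """Yield groups of workers to restart so that at least `min_active`
    workers are left running while each group is being replaced.

    Hypothetical planner: assumes a batch is fully active again before
    the next one is taken down.
    """
    if min_active < 0 or min_active >= len(worker_ids):
        raise ValueError("min_active must be >= 0 and smaller than the number of workers")
    # Never take down more workers at once than we can spare.
    batch_size = len(worker_ids) - min_active
    for start in range(0, len(worker_ids), batch_size):
        yield worker_ids[start:start + batch_size]


if __name__ == "__main__":
    # Five workers, keep three serving requests at all times.
    for batch in rolling_batches(["w1", "w2", "w3", "w4", "w5"], min_active=3):
        print("restart:", batch)
```

With five workers and `min_active=3`, this restarts two workers at a time, so the queue keeps being served during the image pull.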
pazanchick (2mo ago)
new release, again all workers shut down, can't deploy new images on same endpoint.
digigoblin (2mo ago)
Did you use the same tag?
pazanchick (2mo ago)
nope
Tony! (2mo ago)
I didn't even know this was a thing, was so used to the queue building up for every new Docker pull for 20 minutes 🤣
pazanchick (2mo ago)
@Papa Madiator Hi, this issue is still present, shuts down all workers. Is there a chance this will be looked into by devs, or is there a way to reach out otherwise? In support chat I was told this will be escalated (~2wks ago), no status update yet.
Madiator2011 (2mo ago)
What's the ticket ID?
pazanchick (2mo ago)
Didn't get such a thing
Alpay Ariyak (2mo ago)
Could you show the logs of an initializing worker? Click on a worker's box; it has its own logs button inside.
guru (5w ago)
I have the same problem (without any update to the workers): everything is stuck in initializing. We're planning a marketing push for our app in the coming weeks. Sorta scary that this happens in production :/ Hmm, the GPU I had is now unavailable, so that explains it.
haris (5w ago)
Hi @pazanchick, would you be able to show me the configuration for your endpoint?
pazanchick (5w ago)
Hi @haris anything specific? Docker image ~90GB
pazanchick (5w ago)
@Alpay Ariyak they usually log the Docker image download progress bar.
The issues I encounter are either:
- A worker is set active and billed for while downloading the image
- All workers initialize at the same time, shutting down the endpoint
Alpay Ariyak (4w ago)
Could you show the logs next time it happens please
pazanchick (4w ago)
I guess it doesn't matter anymore. Used the new release feature: all workers started updating at the same time, shutting down the endpoint. @Alpay Ariyak @Papa Madiator I'm not sure if mail would be the way to reach out to RunPod; is there some other way? I tried via the website chat and was told that it would be "escalated" maybe a month ago. No status update yet.
Madiator2011 (4w ago)
Do you have a ticket ID?
pazanchick (4w ago)
4474 (I guess. That number was in the mail, which didn't mention it being a ticket ID.)