Slow startup times
Has anyone experienced really variable startup times? Loading ComfyUI today took 45+ minutes when it usually takes 1-2 minutes.
Also, JupyterLab in general has been laggy / unresponsive. It was working just fine yesterday, so I'm not sure if it's just me.
187 Replies
Unknown User • 4mo ago
EU-RO-1
@crystal Not just you. I'm experiencing exactly the same issues with EU-RO-1; I just joined here to see if anyone else was having trouble.
Jupyter is very laggy and keeps 'sticking', echoing input back some seconds later; everything seems very slow/unusable when it's usually very responsive. I also noticed the pod's local storage reach 107% at one stage, even though I don't do anything outside of /workspace. As usual, I'm running runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 with network storage and have never had any issues with it.
@crystal
Escalated To Zendesk
The thread has been escalated to Zendesk!
same here
same, EU-RO-1 as well
Disk I/O seems to be extremely slow, I think?
python3 -m venv venv took about a minute to finish
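One way to tell whether this is the network volume rather than the pod itself is to time the same command on /workspace and on the container's local disk. A minimal sketch, assuming the network volume is mounted at /workspace as elsewhere in this thread; `time_venv_creation` is a hypothetical helper name, and `--without-pip` is used so the number reflects file operations on the target filesystem rather than bootstrapping pip.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path


def time_venv_creation(target: Path) -> float:
    """Create a bare virtual environment at `target`; return elapsed seconds.

    --without-pip skips the pip bootstrap, so the timing mostly reflects
    creating directories, files and links on the target filesystem.
    """
    start = time.perf_counter()
    subprocess.run(
        [sys.executable, "-m", "venv", "--without-pip", str(target)],
        check=True,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Pass the directories to compare (e.g. /workspace and /tmp);
    # defaults to the system temp directory if none are given.
    for base in sys.argv[1:] or [tempfile.gettempdir()]:
        with tempfile.TemporaryDirectory(dir=base) as scratch:
            seconds = time_venv_creation(Path(scratch) / "venv")
            print(f"{base}: venv created in {seconds:.2f} s")
```

Run it as, for example, `python time_venv.py /workspace /tmp` (the script name is hypothetical); a large gap between the two paths points at the volume rather than the pod.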
and libraries are taking forever to load:
Same for us: we're running multiple pods with network storage on EU-RO-1 and they are extremely slow. ComfyUI takes 10 to 30 minutes to start, we constantly get Cloudflare timeouts, and the instance is pretty much inaccessible.
I opened a ticket (https://contact.runpod.io/hc/en-us/requests/19551) yesterday and am looking to see if I can provide more relevant info.
Hope this gets solved!
Facing the same issue
The container volumes work at expected speeds, though; it seems to be related only to network volumes.
happy to check for you
yea!!!
ComfyUI is pretty much unusable, especially if you have a large install with multiple custom nodes.
Very slow on RTX 4000 Ada in EU-RO-1; I'm not even able to start ComfyUI.
It seems that this is also affecting serverless workers that use a different network storage volume on EU-RO-1.
Our serverless workers are no longer starting up due to timeouts
That would mean moving around 1 TB of data from different network volumes and reinstalling multiple containers
I'm using a permanently mounted disk, but it gets stuck.

How is it possible that nothing has happened? It's been almost 24 hours.
I don't think Runpod is aware of this since no incident has been logged:
https://uptime.runpod.io/
But there's clearly an issue here
I'd be happy to provide more info if needed
Anything that's needed by staff for debugging on our side
Is the issue still going on? Can someone provide me with some pod IDs? I'll check with the infra team.
@yhlong00000
here is mine:
zafzxwm66rcvy8
Same issue
Yes, having issues with: 4n17vfvlfgxuwj
Also, serverless worker: 1kxd62zyws4zq0
The pod 4n17vfvlfgxuwj took around 20 minutes to boot ComfyUI from /workspace/ComfyUI but I could not connect to it after it booted
Trying to deploy another one to see how it behaves
I've just run a speed test on the machine, and the network is good. Can you send me a screenshot of what you're doing that is slow?

For example, serverless worker n85qs8g9swe0ed is currently trying to initialize ComfyUI from /workspace/ComfyUI (network storage).
Same here; my ComfyUI backend is loading forever and not booting.
Again, the issue is with network volumes
I use network storage.
Is the network volume speed slow?
Yes!!
Got it, let me run some tests.
On my end the network seems good, but ComfyUI isn't booting; it's taking far too long to boot. It seems like the issue is in the volume!

I'm connected to two separate teams running services on different network volumes on EU-RO-1 and they both fail
both normal pods and serverless workers
the speed on the temp volume on the container works well
it's just the network volumes that are getting hit
e.g. pod: enzp9x316vp4yo
and serverless worker 1kxd62zyws4zq0 is currently stalling
Ran a test; it seems a bit slower than normal. I've pinged the infra team to take a look; it's the weekend, so the response might be a bit slow. If you need a quicker workaround, you could temporarily switch to another region and copy the files over. I know it's not ideal, but it might help for now. Appreciate your patience!
We're actually running a live production which needs serverless to work in the next hour. Transferring 400 GB between regions is pretty much impossible.
Will try to find a different solution
I feel for you man, it's bad timing.
Yep, thanks, but this would take around 3 days to complete
We'll re-route it to our local machines
Thanks for looking into this!
Fingers crossed that someone will look into it soooooon
The network speed actually looks pretty solid; 400 GB should finish transferring in about an hour.
Yea, but the network volume that I'm transferring from is the one affected
I get less than 100Mb/s
And it's patchy
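The estimates in this exchange differ mainly because of units: the posts mix megabits (Mb/s), decimal megabytes (MB/s) and mebibytes (MiB/s), and Mb/s vs MB/s alone is a factor of 8. At a steady rate, 400 GB in about an hour needs roughly 111 MB/s (about 890 Mb/s), while at 100 Mb/s the same copy takes close to 9 hours. A small calculator as a sketch; `transfer_hours` is a hypothetical name, and it assumes a constant rate, so the patchy throughput described here makes the real figure longer.

```python
# Conversion factors to bytes per second. "Mb/s" is megabits, "MB/s" is
# decimal megabytes (what dd reports), "MiB/s" is binary mebibytes.
BYTES_PER_SECOND = {
    "Mb/s": 1e6 / 8,
    "MB/s": 1e6,
    "MiB/s": 2**20,
}


def transfer_hours(size_gb: float, rate: float, unit: str) -> float:
    """Hours to move `size_gb` decimal gigabytes at a constant `rate` in `unit`.

    This is a best case: stalls and fluctuating throughput only make it longer.
    """
    bytes_per_second = rate * BYTES_PER_SECOND[unit]
    return size_gb * 1e9 / bytes_per_second / 3600


if __name__ == "__main__":
    for rate, unit in [(100, "Mb/s"), (185, "Mb/s"), (185, "MiB/s"), (111, "MB/s")]:
        print(f"400 GB at {rate} {unit}: {transfer_hours(400, rate, unit):.1f} h")
```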
yeah, I am getting about 180-190 Mib/s
Exactly, but it goes up and down!!
I've benchmarked with both small and large files:
Different results: 50, 100, 180
I tried it multiple times and it goes up and down, but never more than 180-190Mb/s
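Since single runs vary this much, it helps to repeat the same test and report the spread rather than one number. A sketch of a repeated sequential-write test with fsync; the helper names are hypothetical, the target is whatever directory you pass (for example /workspace for the network volume), and this is not the exact benchmark used in the posts above, whose commands aren't shown.

```python
import os
import statistics
import sys
import tempfile
import time


def write_throughput(directory: str, size_mib: int = 256, block_kib: int = 1024) -> float:
    """Write size_mib MiB in block_kib KiB chunks, fsync, and return MB/s.

    MB is decimal (10**6 bytes), matching dd's output. The fsync makes sure
    the data has reached the filesystem before the timer stops.
    """
    block = os.urandom(block_kib * 1024)
    n_blocks = size_mib * 1024 // block_kib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return n_blocks * len(block) / elapsed / 1e6


def throughput_spread(directory: str, runs: int = 5, **kwargs) -> tuple[float, float, float]:
    """Repeat the write test and return (min, median, max) MB/s."""
    samples = [write_throughput(directory, **kwargs) for _ in range(runs)]
    return min(samples), statistics.median(samples), max(samples)


if __name__ == "__main__":
    for target in sys.argv[1:] or [tempfile.gettempdir()]:
        lo, med, hi = throughput_spread(target, runs=3, size_mib=64)
        print(f"{target}: min {lo:.0f}, median {med:.0f}, max {hi:.0f} MB/s")
```

A wide min/max gap on /workspace next to a tight one on local disk would match the "50, 100, 180" pattern reported here.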
This is what I see until I get timed out while trying to run Comfy with a network volume on a Romanian 5090. Will this be resolved?

Same here
ComfyUI takes super long to load, and then if it loads, you cannot connect to it
Can you try SSHing in and working on the network volume?
I'm stuck, and it's very laggy when I use it.
same here, dismal load times
Hi, can you help me do a test?
I want to confirm whether our problem is the same.
Are you also using a network volume?
same problem here
Yep, using a network volume.
Hi, still here?
Two things you can help me with:
1. Can you SSH into the pod, cd /workspace, and try running some Linux commands to see whether they lag?
2. Can you make a folder with many files in the root filesystem, copy them to /workspace with cp -rv, and see whether the copying lags after some files, goes back to normal, then lags again?
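Step 2 can be scripted so that the "normal, lag, normal" pattern shows up as numbers instead of watching cp -rv scroll by. A sketch with hypothetical helper names: it creates many small files on local disk, copies them one at a time (as cp -rv does) into a destination such as a directory under /workspace, and prints how long each batch took. The file count, file size and batch size are arbitrary choices, and the copied files are not cleaned up.

```python
import os
import shutil
import sys
import tempfile
import time
from pathlib import Path


def make_small_files(src: Path, count: int = 2000, size: int = 4096) -> None:
    """Create `count` files of `size` random bytes in `src`."""
    src.mkdir(parents=True, exist_ok=True)
    payload = os.urandom(size)
    for i in range(count):
        (src / f"f{i:05d}.bin").write_bytes(payload)


def copy_with_batch_timing(src: Path, dst: Path, batch: int = 200) -> list[float]:
    """Copy files one by one, like `cp -rv`, and return seconds per batch.

    Roughly equal values mean steady I/O; a fast, slow, fast sequence is the
    intermittent stall pattern being tested for.
    """
    dst.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src.iterdir() if p.is_file())
    timings: list[float] = []
    start = time.perf_counter()
    for i, path in enumerate(files, 1):
        shutil.copy2(path, dst / path.name)
        if i % batch == 0 or i == len(files):
            now = time.perf_counter()
            timings.append(now - start)
            start = now
    return timings


if __name__ == "__main__":
    # Optional argument: parent directory for the destination, e.g. /workspace.
    dst = Path(tempfile.mkdtemp(dir=sys.argv[1] if len(sys.argv) > 1 else None))
    src = Path(tempfile.mkdtemp()) / "src"  # source stays on local disk
    make_small_files(src)
    for n, seconds in enumerate(copy_with_batch_timing(src, dst), 1):
        print(f"batch {n}: {seconds:.2f} s")
```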
If our issue is the same, then I think we need to tag the admins so they notice this issue.
We have been tagging everyone and opening tickets since yesterday, but there's been no real intervention yet.
I've raised this internally; we'll be looking into it, but there's no ETA on a fix.
We had to move our infrastructure to a different provider since EU-RO-1 network volumes do not work properly
Cool!
I wish this was reported as an incident to be able to track it correctly from our side
And to stop losing cash on booting up pods and serverless workers

Luckily, we noticed these workers running and shut them down manually...
i.e. the requests were triggering workers that were timing out
How? I hope it isn't some long-ass thing we have to do. It should be a one-click transfer, but I can tell with this brand it won't be.
Horrible service. Horrible.
Do we have to pay to have pods up to do this transfer?
What did you end up with?
Running most operations locally and on other providers now until the problem is fixed...
We tried to do a transfer to EUR-IS-1, but it would take days since the network volume simply cannot transfer fast enough
This affects multiple teams and projects on our side unfortunately
Yes
:(
glad i'm not the only one whose work relies on this
There should be zero charge to switch locations. I assumed this was an established company. I no longer feel comfortable referring this service.
Why would you charge someone to change locations, especially when they don't have the GPU or network?!?!
The same issue is still occurring on EU-RO-1. Since this is incurring additional charges, we would greatly appreciate your prompt assistance.
Hey all, apologies for the delay. The team was able to track down the congestion on EU-RO-1's storage cluster and resolved it at 00:37 UTC. We've been monitoring for the past hour; at this time, performance should be restored to normal levels.
It's still not working, @brennen_runpod.
@brennen_runpod It still doesn't work properly on our side
This is the average speed I get
ComfyUI still takes a long time to load all modules and I cannot connect to the interface at all...
Is the team still on this rn?
As a note, this particular network volume is 540 GB; another, considerably smaller volume belonging to another team I'm part of works.
My result on the 1 GB write test is much faster, but ComfyUI is still slow to start. Moreover, it throws 524 errors randomly.
@Ben Lau ComfyUI imports many small files (most of them only a few KB), so I feel it's more relevant to test with a smaller block size.
This was not a problem a few days ago BTW
Let's try for small files
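Because startup is dominated by opening and reading many small files, a test that walks a directory tree and reads every file (reporting files per second, not just MB/s) is closer to ComfyUI's launch than a 1 GB write. A sketch, assuming you point it at something like /workspace/ComfyUI/custom_nodes; `read_tree` is a hypothetical name. A second run may be much faster because the files can then be served from the page cache, so compare first runs.

```python
import os
import sys
import time


def read_tree(root: str) -> tuple[int, int, float]:
    """Read every regular file under `root`; return (files, bytes, seconds)."""
    n_files = n_bytes = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip sockets, FIFOs and broken symlinks
            try:
                with open(path, "rb") as f:
                    n_bytes += len(f.read())
            except OSError:
                continue  # permission errors or files that vanished mid-walk
            n_files += 1
    return n_files, n_bytes, time.perf_counter() - start


if __name__ == "__main__":
    # e.g. python read_tree.py /workspace/ComfyUI/custom_nodes
    for root in sys.argv[1:]:
        files, size, seconds = read_tree(root)
        rate = files / max(seconds, 1e-9)
        print(f"{root}: {files} files, {size / 1e6:.1f} MB in {seconds:.1f} s ({rate:.0f} files/s)")
```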
I encountered a similar problem starting on 26th June, but it became more severe yesterday.
Same for me, it's still terribly slow
Problem is not just that, I lost my $10 balance in all this mess @brennen_runpod
The loading is one thing, but the overall slowness is a big issue: ComfyUI doesn't load in the browser, and if it does, models load very slowly, files cannot be uploaded or downloaded, etc.
Working better for me, but still not entirely right; Jupyter is still a bit laggy.
Hey runpod team, are you gonna fix this or not??
I spoke too soon. I just moved from an A2000 to a 4090 and it's very slow to load. Burning through credit here with no output!
The 4090 pod is giving me:
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.98087 s, 216 MB/s
I had about 600 MB/s on the A2000
As others have said, it's just the network storage; for root I get:
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.61319 s, 666 MB/s
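The two dd summaries can be cross-checked, and converted to MiB/s for comparison with figures quoted that way elsewhere in the thread, with a few lines of Python. The inputs are the numbers from these posts; `dd_rates` is a hypothetical name. The exact dd command isn't shown, so this only reproduces the arithmetic of its summary line, where the rate is in decimal MB/s.

```python
def dd_rates(bytes_copied: int, seconds: float) -> tuple[float, float]:
    """Return (MB/s, MiB/s) for a dd run.

    dd's rate uses decimal units (1 MB = 10**6 bytes), which is why
    1073741824 bytes is shown as "1.1 GB, 1.0 GiB".
    """
    rate = bytes_copied / seconds
    return rate / 1e6, rate / 2**20


if __name__ == "__main__":
    network = dd_rates(1073741824, 4.98087)  # 4090 pod, network volume
    local = dd_rates(1073741824, 1.61319)    # same pod, container root disk
    print(f"network volume: {network[0]:.0f} MB/s ({network[1]:.0f} MiB/s)")
    print(f"root disk:      {local[0]:.0f} MB/s ({local[1]:.0f} MiB/s)")
    print(f"root disk is {local[0] / network[0]:.1f}x faster")
```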
Please let us know an ETA to get this resolved. If you are not going to sort this soon then I want to delete my network volume, as I am currently paying for 450GB that I can't use.
I can't even get into JupyterLab or fluxgym on my pods.
I lost 20 minutes waiting.
I've been unable to use Runpod the entire weekend. My whole weekend was wasted without being able to get a second of work done. The first-time user experience isn't that great, I tell ya.
Exactly the same here: entire weekend wasted, no work done, and I also lost $10 worth of balance in all this stupid mess.
Is it working for you guys? Still very slow here...
Still slow for us too
It seems to work better every now and then, but it's super variable and unreliable
For example, we've been struggling to upload a 6 MB mp4 for 10 minutes now.
nope
Waiting for the Runpod team to resolve the problem.
Can someone who is using the service raise a support ticket through email?
Finally saw this thread. I spent all day trying to move to a new network storage. A heads-up from the team would've been useful.
?
What exactly is slow: uploads or downloads? And from where: local or remote?
This sounds like you folks are not even bothering at this point. Sorry, but it's really frustrating to move infrastructure during the weekend to boot up our productions here.
We've spent a lot of cash trying to make things work, and it's impossible.
It's just me as I'm OOO till tomorrow.
Though even as support I can't do much, as it's the infrastructure and reliability team's responsibility.
Just to make it clear: the problem started Friday and it affects the EU-RO-1 network volumes.
We're currently experiencing i/o speeds between 10 and 180 MB/s
I mean, EU-RO-1 is often heavily used, mostly because of CPU pods.
So your recommendation is to...?
Usually I would say change region or submit a ticket so we can forward it to the team.
There are multiple of us that opened multiple tickets from different teams / accounts
Please read the thread starting from above
One of your team members acknowledged the issue and then they said it was fixed
Discord is not the main support platform, though.
see this
That is why I personally submitted a ticket Friday afternoon CET.
To move to another provider. Imagine being literally down for the whole weekend, of all days
We've booted instances locally and with other providers. The question is if this will be taken care of or not
https://contact.runpod.io/hc/en-us/requests/19551?page=1
I don't have access to my work device right now, so I'm unable to check.
Well, this is unproductive then, isn't it?
Sorry, but you're the only Runpod rep online now
I'll be checking on Monday, but my friend works on weekend tickets.
I'm only tech support; issues like drive slowdowns need to go to the eng team, as I don't have high-level access.
Alright! Don't mean to throw blame, sorry, I know it's not your personal fault, but we need someone from Runpod to communicate and provide support even during the weekends because this is affecting our projects
I'm spending $400 per month on Runpod, and all we get when it's down for 3 days is... nothing, actually.
I mean all are valid things.
Please check other regions too when you have the time. I've had the issue with 4 pods in 2 regions since yesterday.
As someone mentioned before, the pod startup time is one issue, but the biggest one I see is performance (it's twice as slow!), and lastly the laggy Jupyter.
For example, I'm training Flux: before, I had 4 s/it; since yesterday it's 8 s/it!
let me guess Fluxgym?
Just Flux dev. It was EU-RO and EU-IS, as I remember.
But cannot check as I've deleted the pods.
OneTrainer with CLI (Onetrainer CLI 1.1 is the template name)
Do you guys have any recommendations on what region to move to? I don't want to end up in another one with issues.
And that was with an RTX 5090 on Secure Cloud, but I guess the issue is general with any GPU.
We had the issue with A100 PCIe, A100 SXM, RTX PRO 6000, 5090, 4090 (this on serverless) so it's definitely GPU independent imo
OK, I did what I could and sent a message on the internal chat.
Cool, thanks!
I also tried it myself and am seeing it too.

I wish there was a strategy to run without the network volume; it would save a lot of headaches, but for us it would be impossible to manage the Python venv updates via the image. And the small Python modules are definitely the I/O bottleneck here.
I'm kinda curious why all these big tech/AI companies, who get the most traffic on weekends when people are free, have all their staff off lol.
Civitai too. The site goes to hell every Friday to Monday; all the staff are off. Makes no sense to me.
I have huge hopes for S3 API
But isn't EU-RO-1 deployed on S3 too?
They are a test region.
is this the reason why things are not working, then?
Nope, don't think so.
but S3 API would help to move data between regions
For info and if I remember correctly:
Yesterday the pod could start; slowly, but it started. But training time was double the usual, really double! It was on EU-IS.
Today on EU-RO, I had to cancel the pod setup after waiting 10 minutes; it usually takes just one minute.
So it seems that some regions are slower than others, but the problem is general, across all GPUs and regions.
Note, in case it helps: pod deployment through SSH, on-demand plan, and not using a network volume.


Just getting this constantly
Different pods on EU-RO-1
Every 10-15 mins
The problem should be solved, please check.
It still doesn't work
now a bunch of serverless workers started to fail again
Have to switch back to local.
We get a lot of file not found errors, as if the network volume keeps disconnecting:
Really a nightmare tbh. We'll probably switch to another provider completely next week. Can't justify this to the folks that are relying on our productions
And now the serverless workers are just eating through funds like crazy
Why are you using such an old version of the SDK?
Same issue. Spent like 2 hours trying to get something to run in EU-RO-1 - it's just not working. Looks like storage issues again...
Not working for me either. I'm able to start Comfy in the terminal, but it doesn't load in the new tab; it just shows a loading animation and after some time it throws an error.
(RO network volume)
Tried deploy new pod?
You might want to share the Comfy logs.
Still pretty slow for me on startup. Also, ComfyUI has been stuck loading once it does start.
So is it working right now??
It is much better than yesterday.
@Jason @brennen_runpod @Elder Papa Madiator
It works! Thank you for your effort to resolve the issue.
The load times seem significantly faster here too. Will report back as the teams are starting their days.
Thank you!
It works here too! Thanks a lot @brennen_runpod
Same on EU RO 1, it came back to the normal figures for performance.
Same problem again, Comfy loads forever. Worked just 10 minutes ago


I had the same issue a few minutes ago. It does feel like either the issue is still there, or that now it's a different, network-related issue, that wasn't observed earlier.
The network volume speed seems really good now, but the HTTP loading time still spikes every now and then, which also leads to some components of the ComfyUI interface not loading (e.g. CSS files).
Our manual fix is to see what's not loading using browser dev tools and reload those items individually, then refresh the main interface
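Part of this manual check can be scripted: given the URLs of the assets that failed to load (copied from the browser's Network tab), fetch each one with a timeout and report its status and latency. A sketch using only the standard library; the function names are hypothetical, and the thread doesn't give the pod's proxy URLs, so you would supply them yourself. It only diagnoses which requests fail; reloading the interface is still done in the browser.

```python
import concurrent.futures
import sys
import time
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 10.0) -> tuple[str, str, float]:
    """Fetch `url` once; return (url, status or error text, seconds taken)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = str(resp.status)
    except urllib.error.HTTPError as err:  # server answered 4xx/5xx, e.g. 524
        status = f"HTTP {err.code}"
    except OSError as err:  # URLError, timeouts, connection resets
        status = f"error: {err}"
    return url, status, time.perf_counter() - start


def check_all(
    urls: list[str], timeout: float = 10.0, workers: int = 8
) -> list[tuple[str, str, float]]:
    """Check several URLs in parallel, roughly like a browser loading assets."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: check_url(u, timeout), urls))


if __name__ == "__main__":
    for url, status, seconds in check_all(sys.argv[1:]):
        print(f"{status:>10}  {seconds:6.2f} s  {url}")
```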
yeah I am not a dev so I have no idea how to do all that, I just need to generate some images man
@Elder Papa Madiator

incognito tab
I am also suffering from VERY slow starts on EU-RO-1 still. I migrated my network storage to US-TX-3, which loads perfectly fine (except it quickly runs out of pods lol).
Yeah, I start it on the console and it takes over 10 minutes to get running. Just a few seconds on the other region.
How do you migrate a volume? Mine is like 300 GB.
Also wanna know
Well, I've had this issue since day one. Somehow I've accepted it as normal behaviour; I just wait a few minutes for it to load.
Is anyone seeing slow startup times again, or is it just me? I've been trying to launch for the past few hours.
It's EU-RO-1 in comfyui