Runpod unusable

I've tried spinning up several pods, but the I/O is so slow, I can't run anything. ComfyUI takes 10 minutes to load the UI. Nothing is working.
86 Replies
papagaio do statusquo
Same here. I've tried multiple GPU types and Runpod ComfyUI images. All of them either show just a blank page or take minutes to load a laggy UI. Something is going on.
Quantum132 · 3w ago
Yup, NC-1 is SLOW. I assume it's not just that datacenter though, by the sounds of it.
papagaio do statusquo
I'm on EU-RO-1. The pod itself seems to download things at an appropriate pace; it's just the reverse proxy being wonky, I think.
Quantum132 · 3w ago
My UI is slow, the gen is slow, and CPU tasks seem mega slow. Like, deleting 150 imgs has been taking me 10 minutes.
papagaio do statusquo
yeah there's something going on
Quantum132 · 3w ago
100%. This is bad lol. This is using good hardware. Some major issue going on.
Aidude · 3w ago
I need to download 3gb of images and it's downloading at 220kb per sec. Frustrated right now.
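[Editor's note] To put those numbers in perspective, here is a rough back-of-the-envelope estimate. It assumes "220kb per sec" means 220 kilobytes per second and uses decimal units (1 GB = 10^9 bytes); the function name is made up for illustration, and protocol overhead is ignored.

```python
def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to move size_bytes at a steady rate, ignoring protocol overhead."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


GB = 1000 ** 3  # decimal gigabyte (assumption)
KB = 1000       # decimal kilobyte (assumption)

# 3 GB at 220 KB/s, the figures quoted in the message above
slow = transfer_seconds(3 * GB, 220 * KB)
print(f"{slow / 3600:.1f} hours")  # prints "3.8 hours"
```

If the unit was actually kilobits per second, the same download would take eight times longer.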
papagaio do statusquo
wish i could get my credits refunded lmao
JamieMacd · 3w ago
yep seeing the same thing. thinking it's time to bite the bullet and try lambda or similar.
papagaio do statusquo
I did yesterday. Problems continue today. I had luck and got an instance running, but if I restart Comfy, for example, it's a matter of luck whether it loads or not.
papagaio do statusquo
I'm pretty sure I'm on EU-RO-1. The pod ID I'm running right now is 0xgnu6t4h9j3lv. I managed to get it running due to luck, but again, it's not a deterministic thing. Sometimes a pod just doesn't let you connect to the web services, or it takes minutes to load. SSH and everything else seems fine, which is why I think it's a reverse proxy issue.
papagaio do statusquo
just answered the ticket too
Madiator2011 · 3w ago
EU-RO-1 is always the slowest as it's heavily used.
ProGamerGov · 3w ago
are they fixing the proxy issues?
papagaio do statusquo
They don't think it's an issue with them. They're trying to get me to run other pods and images and share logs with them. I just can't be bothered right now to debug this for them while using my credits. There's also not much to debug. The images run and are up. The issue is accessing the open ports.
Fleetwood · 3w ago
Also seeing the same on EUR-IS-1, unfathomably slow at CPU operations. Trying to split a 16GB file into train and test is taking in excess of 15 minutes; it should be 15 seconds.
ProGamerGov · 3w ago
The pods I spun up had extremely fast internet connections, but anything over the proxy was so slow that it was like 1mb a minute
Aidude · 3w ago
I'm trying to download some generated stuff and the speed is 200kbps. This is crazy. This has been happening since yesterday.
Fleetwood · 3w ago
moved to lambda labs, already training
Meister Sean · 3w ago
Not sure if this will help but, has anyone tried a VPN? My pods run fine but connecting to any webUI took FOREVER, then uploads/downloads were absurdly slow (around 100kbs). Assumed it was MY connection/route to the pod, so I used a VPN to go through the US, and suddenly things loaded/uploaded/downloaded at a decent speed. There's definitely something wrong between me and runpod's services because everything was fine last week.
Aidude · 3w ago
Thanks for the tip bro. I'll give it a try now. Oh man, thank you. It sort of worked. It went from 220kbps to like 2mbps. That's a win at this stage. Thank you.
SleepWalker · 3w ago
The "EU-CZ-1" is likewise insanely slow. I made an inquiry, but since yesterday's reply, "we did indeed find evidence of high packet loss in the EU-CZ-1 datacenter.", I have not heard of any progress.
Aidude · 3w ago
I opened a ticket as well. No reply so far. I'm in US-NC-1. It's really frustrating.
Kayline_ai · 3w ago
It is indeed slow...slowest it has ever been
Meister Sean · 3w ago
Mate, that's great! I'm not getting amazing speeds either, they are 'decent' but the main thing is it's usable now. Couldn't even download an image before.
That Lamer · 3w ago
Bad time to delete and remake my pod; it's been installing collected packages for half an hour. I don't remember it taking this long the first time around.
Olbanets · 3w ago
Guys, this is the 3rd time I've faced this issue 🙁
Aidude · 3w ago
Yes bro. That's true. Another way I just figured out is uploading the images to a dataset on Hugging Face and then downloading them from there. It gets uploaded within like 5 sec, 3 GB. Then the download is superfast.
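[Editor's note] A minimal sketch of that Hugging Face workaround, assuming the `huggingface_hub` package is installed on the pod and you are already authenticated (for example via a token). The function names and the repo id in the usage comments are placeholders, not anything from this thread. The `api` parameter exists only so the upload logic can be exercised without network access.

```python
from pathlib import Path


def push_folder(folder, repo_id, api=None):
    """Upload a local folder to a private Hugging Face dataset repo."""
    if api is None:
        # Imported lazily so the module loads even without the package.
        from huggingface_hub import HfApi
        api = HfApi()
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    # exist_ok=True makes repeated runs reuse the same repo.
    api.create_repo(repo_id=repo_id, repo_type="dataset",
                    private=True, exist_ok=True)
    api.upload_folder(folder_path=str(folder), repo_id=repo_id,
                      repo_type="dataset")


def pull_folder(repo_id, dest):
    """Download the whole dataset repo into dest; returns the local path."""
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, repo_type="dataset",
                             local_dir=dest)


# Usage (placeholder names):
#   on the pod:         push_folder("/workspace/outputs", "your-user/outputs")
#   on your own machine: pull_folder("your-user/outputs", "./outputs")
```

This sidesteps the HTTP proxy entirely because both transfers go straight to Hugging Face rather than through the pod's web port.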
Dj · 3w ago
There's a lot to unpack here, going to work backwards.
Dj · 3w ago
These types of network events are rare and during the incident period there's not much we can do about it realistically. If it's sustained there's a lot we can do, but we really have no mode of recourse for random spikes. If we're responding that we see the issue, we do see the issue and if it can be remedied I guarantee support will trigger an on-call alert (which I receive as well!) (This is not the spike that you saw, but showing what support sees in incidents like this)
[Attached image: no description]
Dj · 3w ago
Connectivity, particularly to Runpod, is a rather complicated beast. Some providers are just... worse than others, and while naturally we would like that not to be the case, it can be hard to tell you "sorry it's their fault", because even if you complained to that service provider (e.g. HF, GitHub's Container Registry) it's usually some issue outside of their hands or a more fundamental issue than they may want to invest in. Most of the complaints I see here have to do with the Runpod Proxy. The Proxy is a network of servers we have deployed in France, Amsterdam, New York, and San Francisco, and they... do have their own issues, but while I do see a small uptick in usage, it doesn't correlate with the reports here.
ProGamerGov · 2w ago
I found the issue was present in multiple EU and North American regions
papagaio do statusquo
Thanks for your response. It seems it's still random. Here's an image and network volume that I managed to make work yesterday after trying to start multiple pods and terminating them. Here's the experience. JupyterLab loaded quicker this time, but ComfyUI hasn't. I can confirm through the logs that ComfyUI is running perfectly.
papagaio do statusquo
It's weird because it's random. Depending on the pod, it'll load extremely slowly: it'll be on that blank page for a few minutes, then the Comfy background appears, it still loads for a few more minutes, then it's usable. Just a few days ago everything was working smoothly, and this is not a ComfyUI or Runpod setup error. Datacenter: EU-RO-1.
SleepWalker · 2w ago
Thanks for the reply. Am I correct in assuming that EU-CZ-1 is still impaired? Is there any prospect at all that would help me decide whether to continue using RunPod or not? Also, is the temporary VPN workaround mentioned above effective as a countermeasure?
Dj · 2w ago
To my knowledge there is nothing wrong with CZ-1. But, in a pinch yes the VPN method does work if you have access to one. Otherwise just let support know and they can help get the proxy service tended to.
Aidude · 2w ago
It's not a one-to-one solution but it doubles the speed. Still hard to download big files, but you can download through Hugging Face.
kubachris (OP) · 2w ago
I'm on EU-RO-1. I've not had any problems using this datacenter for at least 6 months now. Now in the past week, when I connect via proxy, I get download speeds of about 30kb/s, and it's unusable for anything. Are you saying this is not a Runpod issue, so there is nothing to be done?
Madiator2011 · 2w ago
why not switch region then?
papagaio do statusquo
Network volumes. If you guys let us switch a network volume's location, that'd be great.
Madiator2011 · 2w ago
You can create new storage, transfer the data, then delete the old storage.
Olbanets · 2w ago
in some cases it takes years 🙁
Madiator2011 · 2w ago
how do you transfer files?
Olbanets · 2w ago
I tried rsync, as in your docs.
Madiator2011 · 2w ago
Are you transferring directly from pod to pod?
Olbanets · 2w ago
yes, I tried and I stopped when I saw the estimated time
Madiator2011 · 2w ago
How much data? How many files?
Olbanets · 2w ago
The volume itself is about 0.5 TB. The dataset is about 120k files, and its cache is about 3x as many files but of smaller sizes.
Madiator2011 · 2w ago
If it's a lot of small files, I would probably tar them.
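[Editor's note] The idea behind that suggestion is that per-file overhead tends to dominate when moving very many small files, so packing them into one archive first can help. Below is a minimal sketch using Python's standard `tarfile` module; the function name and paths are placeholders, and the archive is left uncompressed on the assumption that the data (images, caches) does not compress well.

```python
import tarfile
from pathlib import Path


def bundle(src_dir, archive_path):
    """Pack every regular file under src_dir into one uncompressed tar.

    Returns the number of files added. The archive should live outside
    src_dir so it is not picked up while being written. Empty directories
    are not recorded.
    """
    src = Path(src_dir)
    count = 0
    with tarfile.open(archive_path, "w") as tar:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so the archive is relocatable.
                tar.add(path, arcname=path.relative_to(src).as_posix())
                count += 1
    return count


# Usage (placeholder paths):
#   bundle("/workspace/dataset", "/workspace/dataset.tar")
# then transfer the single .tar file and unpack it on the destination.
```

Whether this actually beats a direct rsync depends on where the bottleneck is; building the archive still has to read every small file once from the source volume.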
Olbanets · 2w ago
https://contact.runpod.io/hc/en-us/requests/24625 Now it looks resolved again. In future I've decided to terminate the pod, file a new ticket, and wait for the magic.
Quantum132 · 2w ago
Hey, so does anyone have a solution for the incredibly slow upload/download speeds to Runpod servers? Or is it all on their side, some big cyber attack or datacenter fire?
WorldObserver · 2w ago
I believe the issue is linked to this, since runpod uses cloudflare tunnels with custom domains: https://www.cloudflarestatus.com/
Quantum132 · 2w ago
interesting
kubachris (OP) · 2w ago
Every region has a different lineup and number of GPUs. This is how I am deciding to use a region in the first place.
kubachris (OP) · 2w ago
"Just change region" doesn't seem to me to be the right answer. There are obviously a lot of people here experiencing issues that started recently, but all we are getting so far is "not our problem" kind of responses. Even if the issue is external, it would be nice to know that it has been or is being investigated and that there is a proposed workaround.
SleepWalker · 2w ago
Runpod is proposing a solution to this issue by changing the storage, but this requires transferring files between storages. This process is costing users time and money. I don't think this is a healthy situation. I'm stuck watching my storage bill go down.
moss · 2w ago
Where is that issue being discussed? I doubt that storage is really the problem; it feels more like a network issue to me.
Quantum132 · 2w ago
This is not a storage issue. It can't be, lol. You don't go from like 50/100mbs down to 300kbs because of a full disk. Plus disks are cheap. Hello, this is 2025, not 1985; just spool up a new set of drives if the server is low on space. It has to be some network issue.
Olbanets · 2w ago
I guess their storage is also network-mounted in some way. Network issues cause a bad storage experience.
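[Editor's note] One way to check whether a volume itself is slow, independent of the web proxy, is a crude write/read timing run from inside the pod. This is only a sketch: the function name is made up, the paths in the usage comments are assumptions about where the network volume and local disk are mounted, and the read figure is usually optimistic because the file was just written and may be served from the OS page cache.

```python
import os
import tempfile
import time


def write_read_mb_per_s(directory, size_mb=64, chunk_mb=4):
    """Write then read a temporary file in `directory`.

    Returns (write_MB_per_s, read_MB_per_s). size_mb should be at least
    chunk_mb. The temporary file is removed afterwards.
    """
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    n_chunks = size_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # make sure data actually reaches storage
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(len(chunk)):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    total_mb = n_chunks * chunk_mb
    return total_mb / write_s, total_mb / read_s


# Usage (mount points are assumptions, adjust to your pod):
#   print(write_read_mb_per_s("/workspace"))  # network volume
#   print(write_read_mb_per_s("/tmp"))        # container-local disk
```

A large gap between the two directories would point at the volume; similar numbers on both would suggest the slowdown is elsewhere, such as the proxy.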
Quantum132 · 2w ago
Hmmm I guess. But that's like bad bad. Like 1992 level speeds lol
LebaneseNinja · 2w ago
Are they going to communicate anything about this? Or have they somewhere? Comfy is slow af and is actually not usable, meanwhile I'm still getting charged for storage.
alluring_fox_47881
Wtf is happening?? I can't deploy in any GPU region, with or without a VPN.
Rguedes · 2w ago
RunPod down bro, since yesterday
papagaio do statusquo
Yeah I unfortunately lost quite a few bucks trying to debug it before realizing the error was not on my end
LebaneseNinja · 2w ago
@Dj any update on this? Is the team still investigating?
trillagodmode · 2w ago
I went TCP for all my ports and everything's working better and faster than ever before. Something's up with runpod http proxy and it's been that way since I started using it in June but has gotten extra bad the past few days
max4c · 2w ago
Hey we see these messages! @Dj can you help me understand if there is anything in particular we can do to help them out?
Dj · 2w ago
Me again, again a lot to unpack but I'll try my best and just ping everyone. We've confirmed there was an issue with the Runpod Proxy, but I don't know the details/particulars of this issue and how deep rooted the problem is. That can be found here. You'll notice TCP is faster as it skips the HTTP proxy, but may not work for every service or may not be the solution you want. I think for now I can recommend it over the Proxy way more confidently though. We also recognize the Network Volumes can feel slow and especially terrible during peak times and we aim to solve this as well with our High Performance storage feature. While not yet available to users, I can show you this. It can be hard to respond to frustrations like this, my concern is the only thing you hear from me being "we're working on it" again and again. It's my job to tell you the things we're working on, where they stand and most importantly relate to you and your issues.
[Attached image: no description]
Dj · 2w ago
If you have any more technical questions about the design or infrastructure related to things like Network Storage or the Proxy, please feel free to ask here or in DMs. I'll share everything I know and am allowed to disclose (I can give you general numbers but I can't speak to exact datacenters, etc.).
@kubachris, @SleepWalker, @moss, @LebaneseNinja, @Quantum132: The least amount of available space we show in one datacenter is 50TB, so it's not a storage capacity problem.
@papagaio do statusquo: I've messaged you a $25 credit code for your time lost. I offer this to anyone else who can articulate their recent problems as well, while I work to get it prioritized and understood.
- Dj
Olbanets · 2w ago
"We also recognize the Network Volumes can feel slow" oh, finally!
papagaio do statusquo
Thanks for your message and finding out about the issue. Where have you sent the code? I haven't received anything on the usual suspects. You can send it to me on a discord DM
Olbanets · 2w ago
Hope I won't have to file more tickets about that.
Dj · 2w ago
Discord has "Message Requests" so you wouldn't have gotten a notification for my message. You can click this: https://discord.com/channels/@me/1426290401005797418/1426290675196104705
papagaio do statusquo
Oh yeah, just found it. Thx.
kubachris (OP) · 2w ago
For me this issue appears to have been fixed. Thanks.
Kayline_ai · 2w ago
Runpod has been incredibly slow lately, for a couple of days now. That would take lots of time to do... Everything was fine before; not sure why all of a sudden there are these issues.
Kayline_ai · 2w ago
Running ComfyUI... Jupyter is incredibly slow, sometimes just a white page. Tried firing up ComfyUI and it just got stuck. Things are working again... ups and downs, riding the wave 😆
Harmeet Gabha · 2w ago
clear cache or try different browser? The TCP Port trick also works very well because you basically bypass Cloudflare which provides the reverse proxy.
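[Editor's note] To compare the proxy path with a direct TCP port, you can time the same download over both. The sketch below uses only the standard library; the function name is made up, and both URLs in the usage comments are placeholders (the proxy hostname pattern is an assumption, so copy the real addresses from your pod's connection settings). The `opener` parameter is there so the logic can be tested without a network.

```python
import time
import urllib.request


def measure_throughput(url, opener=urllib.request.urlopen,
                       chunk_size=64 * 1024, timeout=30):
    """Download url fully and return (bytes_received, bytes_per_second)."""
    start = time.perf_counter()
    total = 0
    with opener(url, timeout=timeout) as resp:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, rate


# Usage (placeholder URLs, pointing at the same file on the same pod):
#   via_proxy = measure_throughput("https://<pod-id>-<port>.proxy.runpod.net/<file>")
#   via_tcp   = measure_throughput("http://<public-ip>:<mapped-port>/<file>")
#   print(via_proxy[1] / 1e6, "MB/s vs", via_tcp[1] / 1e6, "MB/s")
```

Running both a few times and at different hours gives a fairer picture, since several people in this thread describe the slowdown as intermittent.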
miguelito · 3d ago
I'm still suffering from a similar situation: in my case, reading a list of audio files from disk and getting their lengths.
