EUR-IS-1 extremely slow
Since today, Aug 13th, the EUR-IS-1 datacenter has been extremely slow. It was working fine yesterday.
Today, using ComfyUI with my usual template, generation times are 10x slower, and I keep getting "Disconnected" messages. Anyone else facing the same trouble?
31 Replies
Yes.
This is painful, can't even get the template running.
same here.
Runpod status
Welcome to Runpod status page for real-time and historical data on system performance.
Answer to my ticket: Thank you for the detailed report and for sharing the logs.
We’re aware of an ongoing issue affecting network volumes in the EUR-IS-1 datacenter, which is causing slow read speeds and, in some cases, long startup times or unresponsive behavior in applications like ComfyUI.
I’ll keep this ticket updated as soon as we have progress or a resolution to share. In the meantime, if you notice any change in performance, positive or negative, please let us know so we can include it in our investigation.
I'm glad support was made aware. I wasn't, so I wasn't able to update the uptime page, sorry :(
We're reporting this is fine though, I just found their conversation.
Ohh, this uptime needs to be updated manually on the website? No automated health checks as of now?🐧
Not for the storage clusters; they are sort of unpingable.
Is the EUR-IS-1 data center shutting down?

This alert pops up under the L40 pods.
No, the owner of that machine intends on shutting it down. The rest of the DC is still available :)
Yet again, issues with IS - 1
These are being replaced by PRO6000 cards
thanks for answering
the loading time is so long it just times me out eventually
The same problem
Same problem here. Such a waste of money, that's not fair.
If the EUR-IS-1 owner is a service provider that has signed a contract with Runpod, I believe Runpod should take action against EUR-IS-1 for its disappointing performance. It's like a supplier providing rotten meat to a restaurant and making customers sick: the restaurant should take action before it becomes the restaurant's fault.
I reported the issue in the feedback section. Feel free to support my statement to push Runpod to act.
I did
I'm experiencing severe stalls on the network volumes on EUR-IS-1, which is probably connected to why you got long loading times. Processes get stuck in D-state (request_wait_answer / fuse_direct_IO). Started seeing this yesterday, but it was also a problem last month.
Any I/O on /workspace hangs; shells become unresponsive (Ctrl-C/Z doesn't work). Local disk I/O is fine.
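For anyone who wants to check whether their pod shows the same symptom, here is a minimal sketch (not an official Runpod tool) that lists processes in uninterruptible sleep (state D) and the kernel function they are waiting in. It assumes a Linux pod with a readable /proc; the helper names are made up for illustration, and the wchan file may be empty or unreadable depending on kernel settings.

```python
import os

def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = stat_text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    state = fields[0] if fields else "?"
    return int(pid_str), comm, state

def read_text(path):
    """Read a small text file, returning '' if it vanished or is unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""

def d_state_processes(proc_root="/proc"):
    """List (pid, comm, wchan) for processes in uninterruptible sleep."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        stat_text = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat_text:
            continue  # process exited between listdir and read
        pid, comm, state = parse_stat(stat_text)
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            found.append((pid, comm, wchan))
    return found

if __name__ == "__main__":
    for pid, comm, wchan in d_state_processes():
        print(pid, comm, wchan or "?")
```

If the wait channel shows something like request_wait_answer for processes touching /workspace, that matches the stall described above. Run it from a shell whose working directory is on local disk, since a shell sitting on the network volume may hang too.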
Is something wrong with the MooseFS setup?
Most likely EU-SE-1 is getting shut down as well.
It hasn't worked for 2 days.
On EU-RO I had the same stalls and hangs.
What’s the workaround? Not use workspace?
yeah same for me
I have not seen any post from the team about this server, and I'm not the only one complaining about it...
We should at least get our money back for this day...
I don't understand why they don't communicate about this. That way I wouldn't waste my time, and therefore my money, waiting for Comfy to start or trying to work out the cause of the problem when it's just the server.
It seems like our infrastructure team is aware, but at this time we have no action items. We'll continue to monitor.
We implemented a solution at 2025-09-03 11:47 UTC.
If you have a support ticket open for this, please let me know.
If you don't have a support ticket open, message me your account email.
The solution doesn't seem to be working. It is either stuck on "the port is not up yet" for an entire hour (still ongoing), or it doesn't run any workflow at all and can't even load the checkpoint.
We are taking losses from all the delayed tasks waiting to be done on Runpod, or we simply sit there paying Runpod and waiting for a miracle.
i hate to say it but I don't see how this is not a fraud.
looking forward to it all returning to normal
@Michael Chang To provide some feedback on the EUR-IS-1 cluster used for network storage: over the last week, none of the included nodes surpassed even 50% utilization relative to their available capacity. Let me know if we can be of any assistance, although support tickets are probably the best option to get swift results.
It's happening again. The pod is up, but I cannot connect to it; it just gives me a blank screen. I thought it was solved because the past 1 to 2 days were working fine.
And again, I think. Everything was fine, and then a second later everything started to take FOREVER out of nowhere.
Yes, confirmed. I tested RTX 5090 and RTX PRO 6000; both are extremely slow and get stuck after a few minutes.
Can confirm as well..
Same issue. I tested both 5090 and 4090 nodes.
Here's the most forgiving read pattern imaginable; sequential with no other I/O going on in the pod, and massive 1 MiB read blocks. Can't be any kinder to storage infrastructure than that. And still, performance not great. With standard 4k reads, it's without doubt unusable.
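A minimal sketch of how such a sequential read test could be reproduced, assuming you point it at a large file on the network volume (the script name and command-line argument are only for illustration). It reads the file front to back with a given block size and reports throughput. Note that a second pass over the same file may be served from cache, so use a file larger than RAM or a different file per run if you want to measure the storage itself.

```python
import sys
import time

def read_throughput(path, block_size, max_bytes=None):
    """Read `path` sequentially in `block_size` chunks; return (bytes, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffer so each read() is one syscall
    # of the requested size (the OS or FUSE layer may still cache).
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file
            total += len(chunk)
    return total, time.perf_counter() - start

if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. a checkpoint file under /workspace
    for block in (1 << 20, 4096):  # 1 MiB blocks, then 4 KiB blocks
        # Cap each pass at 1 GiB so the 4 KiB run finishes in reasonable time.
        nbytes, secs = read_throughput(target, block, max_bytes=1 << 30)
        mib_s = (nbytes / (1 << 20)) / secs if secs > 0 else float("inf")
        print(f"block={block:>8} bytes  read={nbytes} bytes  {mib_s:.1f} MiB/s")
```

Running the same script against a file on the pod's local disk gives a baseline to compare against.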
I had a tiny SQLite DB on a network volume: the Comfy update-manager DB. This file gets read and written to a couple of hundred times whenever one opens the install view. It's just a few bytes per I/O operation, so barely any data. It took several minutes for that page to open up as a result. So the network issue is latency, not throughput. If someone at Runpod is "monitoring", they ought to be looking at router packet loss and misconfiguration. It shouldn't be hard to find, as it's the path between the SAN devices and the servers. Just trace the paths node by node. If you need specific pod IDs to trace to and from, just ask. I can offer some.
Same on EUR-RO-1, 5090 and 4090. ComfyUI used to start in ~15 seconds. Now it takes 5 minutes, if it starts at all. Can't even get to making videos. This is the third day like this, on and off.
BTW, is anyone having the slowdown with serverless, or is it just a pod thing? I'm considering switching to serverless workers if it is better there.
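To put a number on the latency point, here is a rough sketch that times a few hundred tiny write, fsync, and read round trips on a scratch file, loosely imitating the small-I/O pattern of that SQLite file. It is an assumption-laden probe, not how ComfyUI or SQLite actually access the file; the function names and the probe file name are invented for the example.

```python
import os
import statistics
import sys
import time

def small_io_latencies(directory, ops=200, size=64):
    """Time `ops` tiny write+fsync+read round trips on a scratch file."""
    path = os.path.join(directory, f"latency_probe_{os.getpid()}.bin")
    payload = b"x" * size
    samples = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(ops):
            start = time.perf_counter()
            os.pwrite(fd, payload, 0)  # overwrite the same few bytes
            os.fsync(fd)               # force the write out to storage
            os.pread(fd, size, 0)      # read them back
            samples.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.remove(path)  # clean up the scratch file
    return samples

def summarize(samples):
    """Return op count, median and worst latency in ms, and total seconds."""
    return {
        "ops": len(samples),
        "median_ms": statistics.median(samples) * 1000,
        "max_ms": max(samples) * 1000,
        "total_s": sum(samples),
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass a directory on the network volume, then one on local disk.
    print(summarize(small_io_latencies(sys.argv[1])))
```

If the median is near one second per operation, 200 operations alone add up to over three minutes, which would be consistent with a page taking several minutes to open even though almost no data moves.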