Runpod3mo ago
Tenofas

EUR-IS-1 extremely slow

Since today, Aug 13th, the EUR-IS-1 datacenter seems extremely slow. It was working fine yesterday. Today, using ComfyUI with my usual template, generation times are 10x slower, and I keep getting "Disconnected" messages... is anyone else facing the same trouble?
31 Replies
gufisha
gufisha3mo ago
Yes. This is painful, can't even get the template running.
Aron
Aron3mo ago
same here.
gufisha
gufisha3mo ago
Yeah, I hope someone sees this, as nothing is showing on https://uptime.runpod.io/.
Aron
Aron3mo ago
Answer to my ticket: Thank you for the detailed report and for sharing the logs. We're aware of an ongoing issue affecting network volumes in the EUR-IS-1 datacenter, which is causing slow read speeds and, in some cases, long startup times or unresponsive behavior in applications like ComfyUI. I'll keep this ticket updated as soon as we have progress or a resolution to share. In the meantime, if you notice any change in performance, positive or negative, please let us know so we can include it in our investigation.
Dj
Dj3mo ago
I'm glad support was made aware; I wasn't, so I wasn't able to update the uptime page, sorry :( We're still reporting this as fine though, I just found their conversation.
CodingNinja
CodingNinja3mo ago
Ohh, so this uptime page needs to be updated manually on the website? No automated health checks as of now? 🐧
Dj
Dj3mo ago
Not for the storage clusters, they are sort of unpingable.
Michael Chang
Michael Chang3mo ago
Is the EUR-IS-1 data center shutting down?
Michael Chang
Michael Chang3mo ago
[image attachment]
Michael Chang
Michael Chang3mo ago
This alert popped up under the L40 pods.
Dj
Dj3mo ago
No, the owner of that machine intends to shut it down. The rest of the DC is still available :)
gufisha
gufisha3mo ago
Yet again, issues with IS-1.
mitchken
mitchken2mo ago
These are being replaced by PRO6000 cards
Michael Chang
Michael Chang2mo ago
Thanks for answering. The loading time is so long it just times me out eventually.
SUUUUIIIIIII
SUUUUIIIIIII2mo ago
The same problem
extrems69
extrems692mo ago
Same problem here. Such a waste of money, that's not fair.
Michael Chang
Michael Chang2mo ago
If the EUR-IS-1 owner is a service provider that has signed a contract with Runpod, I believe Runpod should take action against EUR-IS-1 for its disappointing performance. It's like a supplier providing rotten meat to a restaurant: if it makes customers sick, the restaurant should act before it becomes the restaurant's fault. I reported the issue in the feedback section, feel free to support my statement in order to prompt action from Runpod.
extrems69
extrems692mo ago
I did
Garðar
Garðar2mo ago
I'm experiencing severe stalls on the network volumes on EUR-IS-1, which is probably connected to why you got long loading times. Processes get stuck in D-state (request_wait_answer / fuse_direct_IO). I started seeing this yesterday, but it was also a problem last month. Any I/O on /workspace hangs; shells become unresponsive (Ctrl-C/Z doesn't work). Local disk I/O is fine. Is something wrong with the MooseFS setup?
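(For anyone wanting to check for the same symptom from inside a pod, here is a minimal sketch; the wait-channel names and the /workspace mount point are assumptions taken from the report above, not something Runpod documents.)

# List processes stuck in uninterruptible sleep (D state) and the kernel
# function they are blocked in. On a pod with a hung FUSE/MooseFS mount you
# would expect wait channels like request_wait_answer or fuse_direct_IO.
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
# Show which filesystem backs /workspace (a fuse.* type indicates a network volume)
grep /workspace /proc/mounts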
AmirKerr
AmirKerr2mo ago
Most likely EU-SE-1 is getting shut down as well. It hasn't worked for 2 days.
rasmus
rasmus2mo ago
On EU-RO I had the same stalls and hangs. What's the workaround? Not using /workspace?
tacle2
tacle22mo ago
Yeah, same for me. I haven't seen any post from the team about this server, and I'm not the only one complaining about it... We should at least get our money back for this day. I don't understand why they don't communicate about this; that way I wouldn't waste my time, and therefore my money, waiting for Comfy to start, or waste time figuring out the cause of the problem when it's just the server.
Dj
Dj2mo ago
It seems like our infrastructure team is aware, but at this time we have no action items. We'll continue to monitor. We implemented a solution on September 3rd. If you have a support ticket open for this, please let me know. If you don't have a support ticket open, message me your account email.
Michael Chang
Michael Chang2mo ago
The solution doesn't seem to be working. It's either stuck on "the port is not up yet" for an entire hour (still ongoing), or completely unable to run any workflow, not even able to load the checkpoint. We are suffering losses from all the delayed tasks waiting to be done with the Runpod service, or simply sitting there, paying Runpod and waiting for a miracle. I hate to say it, but I don't see how this is not a fraud. Looking forward to it all returning to normal.
mitchken
mitchken2mo ago
@Michael Chang to provide some feedback on the EUR-IS-1 cluster used for network storage: none of the included nodes surpassed even 50% utilization compared to their available capacity over the last week. Let me know if we can be of any assistance; however, support tickets are the best option to get swift results, I guess.
Michael Chang
Michael Chang2mo ago
It's happening again. The pod is up, but I cannot connect to it, it just gives me a blank screen. I thought it was solved because the past 1 to 2 days were working fine.
nailonge
nailonge2mo ago
And again, I think. Everything was fine, and a second later everything started to take FOREVER out of nowhere.
Tenofas
TenofasOP2mo ago
Yes, confirmed. I tested the RTX 5090 and RTX PRO 6000... both are extremely slow and get stuck after a few minutes.
gufisha
gufisha2mo ago
Can confirm as well..
jojje
jojje2mo ago
Same issue. I tested both 5090 and 4090 nodes. Here's the most forgiving read pattern imaginable: sequential, with no other I/O going on in the pod, and massive 1 MiB read blocks. You can't be any kinder to storage infrastructure than that. And still, performance is not great. With standard 4k reads, it's without doubt unusable.
root@79b7da44cdc6:~# for f in $(find /opt/comfyui/models/ -name "*.safetensors");do echo $f; dd if=$f bs=1M of=/dev/null;done
/opt/comfyui/models/clip_vision/clip_vision_h.safetensors
1205+1 records in
1205+1 records out
1264219396 bytes (1.3 GB, 1.2 GiB) copied, 24.3915 s, 51.8 MB/s
/opt/comfyui/models/vae/wan_2.1_vae.safetensors
242+1 records in
242+1 records out
613561776 bytes (614 MB, 585 MiB) copied, 9.68587 s, 63.3 MB/s
/opt/comfyui/models/loras/Wan2.2-Lightning_T2V-v1.1-A14B-4steps-lora_HIGH_fp16.safetensors
585+1 records in
585+1 records out
613561776 bytes (614 MB, 585 MiB) copied, 14.8218 s, 41.4 MB/s
/opt/comfyui/models/loras/Wan2.2-Lightning_I2V-A14B-4steps-lora_LOW_fp16.safetensors
^C50+0 records in
49+0 records out
I had a tiny sqlite DB on a network volume: the comfy update-manager DB. This file gets read and written a couple of hundred times whenever one opens the install view. It's just a few bytes per I/O operation, so barely any data. It took several minutes for that page to open up as a result. So the network issue is latency, not throughput. If someone at runpod is "monitoring", they ought to be looking at router packet loss and misconfiguration. It shouldn't be hard to find, as it's the path between the SAN devices and the servers. Just trace the paths node by node. If you need specific pod IDs to trace to and from, just ask. I can offer some.
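(A minimal sketch for separating the two failure modes; the file path is a placeholder, and iflag=direct may be refused on some FUSE mounts, so treat this as an illustration rather than a definitive benchmark.)

F=/workspace/models/some_model.safetensors   # hypothetical file on the network volume
# Throughput: one large sequential read, like the dd loop above
dd if="$F" of=/dev/null bs=1M count=512 iflag=direct 2>&1 | tail -1
# Latency: 200 tiny reads at random offsets within the first ~128 MiB; the wall
# time is dominated by per-request round trips rather than data volume
time for i in $(seq 1 200); do
  dd if="$F" of=/dev/null bs=4k count=1 skip=$RANDOM iflag=direct status=none
done
# If the big read is fast but the 4k loop is slow, the bottleneck is latency
# (e.g. packet loss and retransmits), which matches the sqlite symptom above.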
BlackWhiteAsian
BlackWhiteAsian2mo ago
Same on EUR-RO-1, 5090 and 4090. ComfyUI used to start in ~15 seconds. Now 5 minutes, if it actually starts. Can't even get to the point of making videos. It's been like this for three days now, on and off. BTW, is anyone having the slowness issue with serverless, or is it just a pods thing? Considering switching to serverless workers if it is better there.
