Is US-KS-2 dead?
For about 3 days in a row, pods on US-KS-2 have been working terribly. It doesn't matter whether I run PyTorch 2.4 or 2.8. It takes about 10 minutes to load Jupyter through port 8888.
This datacenter looks fine. Can you tell me more about the issue you're having, or share a Pod ID so I can take a look?
I'm having the same issue, several days in a row. The Pod works, but it's extremely slow.
I can't help without a Pod or worker ID in this datacenter. Otherwise, all I can say is that this datacenter has no known incidents and looks fine based on the combined metrics.
8hceyd0mo3l78z, for example, or bwwak89cnrnba5. Just try to start both.
That helps, one sec
This template works for me in this datacenter. Jupyter will only start if an environment variable called JUPYTER_PASSWORD is defined when the Pod starts. I'll also note that I had to turn off my ad blocker after going to the Jupyter URL.
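To confirm that the variable is actually visible inside the running Pod (for example, from the web terminal), a check along these lines can help. This is a minimal sketch: the helper name `jupyter_password_is_set` is hypothetical, and treating an empty value as "not defined" is an assumption, not documented behavior of the template.

```python
import os


def jupyter_password_is_set(env=None) -> bool:
    """Return True if JUPYTER_PASSWORD is present and non-empty.

    `env` defaults to the current process environment; passing a plain
    dict makes the check easy to test. Treating an empty string as
    missing is an assumption.
    """
    env = os.environ if env is None else env
    return bool(env.get("JUPYTER_PASSWORD"))


# Report the result for the current Pod environment.
print("JUPYTER_PASSWORD set:", jupyter_password_is_set())
```

If this prints `False` inside the Pod, the variable was not passed through when the Pod started.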
@Hleb J I can't see whether you have environment variables defined, for privacy reasons, but that's the only thing I can think of.
The variable is defined. The problem isn't just Jupyter; it's how slowly data loads in this datacenter. Starting 8hce… took 13 minutes just to load the template. A Pod in EUR-IS-1 with the same template works well: quick start, all folders accessible. Meanwhile the US-KS-2 Pod in another tab says “the loading screen taking a long time…”. Through the web terminal the folders are accessible, but everything is slow. For example, starting the ComfyUI server on EUR-IS-1 takes less than 10 minutes; 5 minutes later all models are loaded and everything works perfectly. On US-KS-2, ComfyUI with the same nodes takes nearly 40 minutes just to start, and after an hour of waiting for the models to load I terminated the Pod. Sometimes it works normally, but then the problem returns. This is definitely not caused by an ad blocker or my internet connection; it's an internal problem with US-KS-2.
Yes, same here... it looks like it's only a US-KS-2 problem. I'm connecting from Italy...
Spinning one up right now just to test: qwhsqs5v5whcda
And according to the logs, it's just hanging.
We've identified the problem. It's not something I would've been able to discover on my own, so thank you all for reporting issues.
I'll be back when I have details from our Site Reliability Team.
Thanks. I'm going to shut down qwhsqs5v5whcda, then.
Same here, and also on EU-SE-1 this morning. It locks up solid.
US-KS-2 should be good to go. @redparis, @Tenofas, @Hleb J
The datacenter made a change to its network configuration that caused the problems you saw; we've rolled it back.
I'm testing it right now, and it doesn't look fixed. It's still slow, very slow. Do we have to do something on our side, like clearing a cache or something else?
I don't have the exact timeline, but I am aware that we are currently working with the datacenter on a repair.
Same here. US-KS-2 now has frequent spikes (every 10-15 minutes) where the network stalls and is unusable.
Tested on A100s and H100s.
I'm going to have to delete my network storage there. It's kind of ridiculous that this can go on for so long without a fix. Wasted money.
Same
This is actually the second time for me; I already moved once from another region because the network was so shit.
In the end, I moved to a different datacenter a few days ago, and it was fine there until yesterday. Today even EUR-IS-1 is slowing down... 😕
Don't move datacenters. It happens sometimes, but it will recover.