annah_do · 4mo ago

Pod is stuck in a loop and does not finish creating

Hi, I'm trying to start a 1 x V100 SXM2 32GB with additional disk space (40 GB). It worked fine until yesterday, but now when I try to create it, it gets stuck in this loop:
2024-02-23T11:34:34Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp 54.227.20.253:443: i/o timeout
2024-02-23T11:34:47Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:55Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:35:02Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It did work with a larger GPU yesterday... Can anyone help me? Thanks!
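The `i/o timeout` on `registry-1.docker.io` in the log points to the host losing outbound connectivity to Docker Hub rather than anything being wrong with the image itself. A minimal check like the sketch below, run from a shell on the affected host (or any machine you want to compare against), can confirm whether the registry is reachable. The registry hostname comes from the error log above; the 10-second timeout and the script itself are illustrative assumptions, not a RunPod tool.
```python
import socket
import urllib.error
import urllib.request

REGISTRY_HOST = "registry-1.docker.io"
REGISTRY_URL = f"https://{REGISTRY_HOST}/v2/"

def check_registry(timeout: float = 10.0) -> None:
    # 1. Can we open a TCP connection to the registry on port 443 at all?
    try:
        with socket.create_connection((REGISTRY_HOST, 443), timeout=timeout):
            print(f"TCP connection to {REGISTRY_HOST}:443 OK")
    except OSError as exc:
        print(f"TCP connection failed: {exc}")
        return

    # 2. Does the registry answer an HTTPS request? A 401 here is normal
    #    (the /v2/ endpoint requires auth) and still proves connectivity.
    try:
        urllib.request.urlopen(REGISTRY_URL, timeout=timeout)
        print(f"HTTPS request to {REGISTRY_URL} OK")
    except urllib.error.HTTPError as exc:
        print(f"HTTPS reachable, server answered HTTP {exc.code}")
    except urllib.error.URLError as exc:
        print(f"HTTPS request failed: {exc.reason}")

if __name__ == "__main__":
    check_registry()
```
If the TCP step already fails with a timeout (as in the log), the problem is the host's network, not the image or the pod configuration.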
9 Replies
ashleyk · 4mo ago
I assume this is the BG region in Community Cloud? I'm having the same issues with A5000s in BG. @JM can someone contact/unlist this host please? It's wasting our money when the host's internet is broken and it can't even pull the Docker image. Worst of all, I leave it to pull the Docker image, it goes into an infinite loop, and it wastes my credits. I think we shouldn't be charged for Docker image pulls for pods, only for the time the container is actually running, like with serverless.
annah_do · 4mo ago
It's in the previous generation of Community Cloud.
ashleyk · 4mo ago
What's the "previous generation" of Community Cloud? There's no such thing as a previous generation of Community Cloud.
annah_do · 4mo ago
I'm not sure how to express it... I select Community Cloud and then I see this. In the lower part it says "previous generation".
[Screenshot attached: Community Cloud GPU selection screen]
ashleyk · 4mo ago
Oh, that's a heading for GPU type. Which specific GPU type are you using?
annah_do · 4mo ago
I was using 1 x V100 SXM2 32GB. If that's not the GPU type, then how would I find it?
ashleyk · 4mo ago
Yes, that's it, but in which region? I think it's the BG region; they offer that GPU type.
annah_do · 4mo ago
What do you mean by region? And what does BG stand for? Sorry, I'm new to this...
JM · 4mo ago
Good intel, both. Thanks! We are actually working on a big quality control initiative. It includes:
- Putting very hard enforcement on minimum specs, and decommissioning machines that do not meet them, even if they were onboarded a long time ago.
- Much stricter and more automated verification.
- Automatic benchmarking of all servers, and multi-GPU usage testing. All servers, no exception. Any low-performing ones that are out of the ordinary are going to get removed or upgraded.
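For context on what "automatic benchmarking of all servers" could look like in practice, here is a purely illustrative sketch, not RunPod's actual tooling: a minimal PyTorch fp16 matmul throughput test that could flag an underperforming GPU. The matrix size and iteration count are arbitrary assumptions.
```python
import time

import torch

def benchmark_matmul(size: int = 8192, iters: int = 20) -> float:
    """Rough fp16 matmul throughput in TFLOPS on the first CUDA device."""
    assert torch.cuda.is_available(), "CUDA GPU required"
    a = torch.randn(size, size, device="cuda", dtype=torch.float16)
    b = torch.randn(size, size, device="cuda", dtype=torch.float16)

    # Warm-up so kernel launch / allocator overhead is not measured.
    for _ in range(3):
        torch.matmul(a, b)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    # A square matmul costs roughly 2 * size**3 FLOPs.
    return (2 * size**3 * iters) / elapsed / 1e12

if __name__ == "__main__":
    print(f"{torch.cuda.get_device_name(0)}: {benchmark_matmul():.1f} TFLOPS (fp16)")
```
Comparing the reported number against the same GPU model on a known-good host would be one simple way to spot hosts that are out of the ordinary.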