GraphQL API Issue: Launching Pod Fails (GRAPHQL_VALIDATION_FAILED)
I'm unable to launch a specific pod configuration via the GraphQL API (podFindAndDeployOnDemand at https://api.runpod.io/graphql), consistently getting an HTTP 400 GRAPHQL_VALIDATION_FAILED error.
My myself query works fine with the same API key.
Target Configuration:...
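When debugging a GRAPHQL_VALIDATION_FAILED response like the one described above, it can help to print every entry in the response's errors array, since a validation failure usually names the field or argument the server rejected. The following is only a minimal sketch, assuming the HTTP 400 body follows the common GraphQL error shape (an "errors" list whose entries carry "message" and "extensions.code"). The helper name and the sample payload are hypothetical and are not taken from the RunPod API.

```python
import json


def summarize_graphql_errors(body: str) -> list[str]:
    """Return one 'CODE: message' line per entry in a GraphQL error response."""
    payload = json.loads(body)
    lines = []
    for err in payload.get("errors", []):
        # 'extensions.code' is optional in GraphQL responses, so fall back safely.
        code = err.get("extensions", {}).get("code", "UNKNOWN")
        lines.append(f"{code}: {err.get('message', '')}")
    return lines


# Hypothetical response body, for illustration only.
sample = json.dumps({
    "errors": [{
        "message": "Example validation message",
        "extensions": {"code": "GRAPHQL_VALIDATION_FAILED"},
    }]
})

for line in summarize_graphql_errors(sample):
    print(line)
```

Feeding the raw body of the failing podFindAndDeployOnDemand request into the helper should surface the server's own explanation, which is usually more specific than the status code.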
Post Startup Download Scripts
I have a Docker image built on the base image runpod/base:0.6.3-cuda11.8.0. Once startup has finished, I want to download a few models as an optional background step. Is there a hook for this?
Creation of new pods in EU-CZ-1 results in "ssh: connection refused"
I have an existing pod in that same region that works fine, but new pods refuse the connection even with the same SSH key and setup. I'm using direct connect because the proxy is broken for SSH keys. This is on a 3090 GPU.
Pod networking issues?
I have 8x L40S and 8x RTX 6000 pods that seem to have no internet connectivity. I've been trying to install python packages hosted on github (via pip) and load models from torch hub but I get the following errors.
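One way to narrow down a report like this is to check, from inside the pod, whether DNS resolution and an outbound TCP connection work at all. The sketch below uses only Python's standard library. The function name and the idea of probing port 443 are assumptions for illustration, not a RunPod-provided diagnostic.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if the host resolves and a TCP connection succeeds."""
    try:
        # create_connection resolves the name and opens the socket in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


def check_hosts(hosts: list[str], port: int = 443) -> dict[str, bool]:
    """Map each host name to whether it was reachable on the given port."""
    return {host: can_connect(host, port) for host in hosts}
```

Calling check_hosts(["github.com", "pypi.org"]) inside the affected pod and inside a working pod gives a quick comparison to include in a support ticket.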

ComfyUI never opens on port 3000
In the last 24 hours, ComfyUI on port 3000 always fails to load; the browser just shows a constant 'transferring data' message. All logs show everything running and ready as usual.
Unexpected Pod Billing After Failed Deployment
Yesterday, I attempted to deploy a new pod, but after clicking "Deploy," I received an error message along the lines of:
"This GPU is no longer available, we couldn't deploy your pod."
This happened when the GPUs went down yesterday.
Once everything was back up, I checked my account. The pod had not appeared in my "My Pods" list, and I hadn’t been charged — so I assumed the deployment had failed....
Pods are terribly slow
Hey.
I usually deploy pods in US-TX3, and everything has become very slow. I have to wait minutes for Jupyter and/or ComfyUI to launch. I also have problems with images not loading. I tried changing the machine, then the server. Still unusable. Any clue?
Thanks...
Slow Image previews regardless of pod
Hey, I am experiencing issues with image previews showing up really slowly, no matter what community pod I use.
This kind of loading happens (I am using A1111). There is no problem with my internet, and I have not experienced this kind of slowdown before.
Pod download speed via API to CivitAI for example seems completely fine also....

Issues with SSH in Axolotl Pod
I can't do an SSH connection with SCP / SFTP.
I tried generating new SSH key pairs, I specifically adjusted their permissions, connected with the recommended ssh command for the pod (I also tried starting a separate pod and manually building the command), and double checked that I copied the public ssh string.
I am running an arch-linux based host OS, and connected via a terminal SSH command....
Solution:
Check your SSH server (and its configs) again using the web terminal.
CUDA device uncorrectable ECC error
I'm using a 5xH100 pod and got uncorrectable ECC errors on devices 1, 2, and 3. Devices 0 and 4 can be used without a problem. It seems the devices or the system need a reboot. Any help on this? I've already submitted a ticket on the website with the pod ID.
Python 3.12.5 | packaged by Anaconda, Inc. | (main, Sep 12 2024, 18:27:27) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
import torch...
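While waiting for support to reset the faulty devices, one possible stopgap is to expose only the GPUs reported as healthy through the CUDA_VISIBLE_DEVICES environment variable. This is a minimal sketch, not a fix for the ECC errors themselves. The helper name is hypothetical, and it assumes the device indices in the post match CUDA's own enumeration order.

```python
import os


def env_for_healthy_gpus(healthy: list[int]) -> dict[str, str]:
    """Copy the current environment, exposing only the given GPU indices."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in healthy)
    return env


# Devices 0 and 4 are the ones reported as usable in the post above.
env = env_for_healthy_gpus([0, 4])
print(env["CUDA_VISIBLE_DEVICES"])  # prints "0,4"
```

The resulting env can be passed to a child process, for example via subprocess.run(..., env=env). The variable has to be set before CUDA is initialized, and inside that process the two remaining devices are renumbered as cuda:0 and cuda:1.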
Volume full and deleting files doesn't free up space
My storage volume is full, and deleting big files doesn't free up space. What can I do?
Created a pod, and it is not appearing in my list of Pods so I can't view it to turn it off
I created a pod and it does not appear in my list of pods, but I'm being charged for it. Cannot view it to connect or turn it off.
Solution:
Pod console UI is back, seems like no downtime for a GPU Pod
Problem with hanging pod
Hi team, I have an issue with a hanging pod. Somehow the GPU crashed, and now all processes are hanging.
I tried to restart the pod, but it didn't work. I tried to stop and start it again, and now I can't get the pod up.
Please help me with this. This is the pod ID:...
h100 servers having issues?
Hey RunPod folks, is something going on with the H100 secure cloud machines? I first got a number of weird issues on an 8xH100 (SXM) server (cross-GPU links going down randomly? Hard to say what exactly is going on - I get random timeouts in multi-GPU comms after days of work).
I tried spinning a new machine (ID: nyotnwudbsq0mu, ID: 23xahufe1yk33g) but they are stuck loading the docker images from our private Docker (that works great and I can access from other RunPod machines).
Can someone please have a look?...
vLLM Inconsistently Hangs at NCCL Initialization
Hi, I am trying to run vLLM on 2x A40 GPUs, and it will sometimes hang at NCCL initialization. This occurs inconsistently, and sometimes it works fine. But for a pod that it hangs on, repeated attempts will always hang...
CUDA 12.4.1
python 3.10
vllm 0.7.3...

Issues when restarting stopped pod
For a few days, I've had multiple issues when restarting a stopped pod.
It will just hang saying "Container is not running" -- once I briefly caught an error in the system console about 'failed to start networking' and 'driver failed to program' -- is that an issue on the RunPod infra?
I should note that I'm running the exact same container image over and over again, and if I terminate the one that failed and re-create it from scratch it works every time, but I thought you were supposed to be able to restart a stopped pod? Oh, and I can confirm that the API says the pod was restarted with the GPU attached and is in the 'RUNNING' state, but it has the issue described above....
Need PyTorch 2.5 and 2.6 official Docker Image
We need official PyTorch 2.5 and 2.6 Docker images that are safe and can be used for new features, such as the mit-han-lab/nunchaku project.
500 Response when creating pod using API
I always get an error when trying to create a pod using the API. It's always the same response: "create pod: There are no instances currently available"
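If the "no instances currently available" response reflects a temporary capacity shortage (an assumption, since the post does not say), one common pattern is to retry the create call with exponential backoff rather than failing on the first attempt. The sketch below is generic: the `call` argument stands in for whatever function issues the create-pod request and raises RuntimeError on failure, which is a hypothetical convention and not part of the RunPod API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], attempts: int = 5,
                       base_delay: float = 2.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call` until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except RuntimeError:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the defaults, the waits between attempts are 2, 4, 8, and 16 seconds. If the error persists across all attempts, trying a different GPU type or region is likely more productive than retrying further.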
