Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

RunPod Pod Configuration: Domain Mapping, Persistence, and Docker

Hey RunPod team 👋 We’re currently running a FastAPI-based service on RunPod and wanted to clarify a few things: 1. Custom domain: Is there a supported way to map our own domain name to a running pod instead of using the default URL? ...

🔴 Urgent: Deleted Pods & Data Loss – Immediate Assistance Needed!

Hi team, We at Foyer (getmerlin.in) use RunPod extensively for critical research workloads. After a payment issue on our card (card declined), we updated the card successfully. However, today we woke up to find all our pods deleted, and noticed that autopay had been turned off without any manual action on our side! This was likely the main culprit, as our card is fine. ...

Can anybody help me find out how long I used my pods and how much it cost me?

Can anybody help me find out how long I used my pods and how much it cost me? The console balance changes drastically and the dashboard is not very detailed. How can I see more detailed usage information?

Lost GPUs mid-run

I was running on a 5090 pod with 3 GPUs (that's what was available to it). Mid-run my software complained that there are no CUDA GPUs. After stopping the app I tried nvidia-smi and got "Failed to initialize NVML: Unknown Error". That's never happened before.
Solution:
Restarting helped. So just FYI. I also opened a ticket about this with the pod ID.
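
For anyone who wants a long run to notice this kind of failure early, here is a minimal sketch of a GPU visibility probe. It assumes that `nvidia-smi` exits with a non-zero status when NVML cannot be initialized; the function name `gpus_visible` and the polling idea are illustrative, not something from the thread.

```python
import subprocess
import sys


def gpus_visible(cmd=("nvidia-smi", "-L"), timeout=10):
    """Return True if the probe command runs and exits with status 0.

    Assumption: a failed NVML initialization makes nvidia-smi exit non-zero.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or hung: treat as "no GPUs visible".
        return False
    return result.returncode == 0


def report(cmd=("nvidia-smi", "-L")):
    """Print a one-line status to stderr and return the probe result."""
    ok = gpus_visible(cmd)
    print("GPUs visible" if ok else "GPUs NOT visible", file=sys.stderr)
    return ok
```

Calling `report()` between training epochs (or from a small watchdog loop) would let a job checkpoint and stop cleanly instead of failing on the next CUDA call.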

Lack of GPUs (A100, H100) on all locations with network volume

Hi guys, there seems to be a lack of A100 and H100 GPUs in all network volume locations. Is there an ongoing shortage? My dataset takes a long time to download, so I prefer to keep it separated from compute, but most network volume locations show low availability for A100 and H100. Is there a better way to do this?...

Blank ComfyUI

[ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/extension-node-map.json [ComfyUI-Manager] default cache updated: https://raw.githubusercontent.com/ltdrdata/ComfyUI-Manager/main/custom-node-list.json Context impl SQLiteImpl. Will assume non-transactional DDL. No target revision found....

Shell Initialization Harms VSCode Remote SSH Integration

Hi, I noticed that many of the conveniences offered by VSCode Remote SSH (e.g. git credentials forwarding, the code command, etc.) aren't available when connected to a RunPod instance. After some investigation, it seems that the shell initialization scripts are the culprit. The provided ~/.bashrc runs source /etc/rp_environment at the very end. Inside this file, among other things, is a hard-coded instruction that sets the PATH:...

Enable access for NVIDIA GPU profiling

Hey! TL;DR: Can we enable access to GPU performance counters by default as described here? ...

Pods disappeared

Hi RunPod team, We're aware of the ongoing AWS issues, and it seems to have impacted our pods. They were returning errors earlier and have now completely disappeared from our console. Is there a way to check whether they were deleted or are just temporarily not visible due to the outage?...

Turn off Pod without using https://console.runpod.io (which is currently down)

I don't have an API. Is there any other way to turn off my pod? Or am I doomed to either run out of money and risk losing my pod or wait and also lose money? I already sent an email to support.
Solution:
Seems to be fixed, phew.
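
For readers who do have an API key created ahead of time, a pod can in principle be stopped without the web console. The sketch below only builds the HTTP request; it assumes a `POST /pods/{podId}/stop` endpoint under the `https://rest.runpod.io/v1` base URL and Bearer-token authentication, so check both against the current API reference before relying on it. The helper name `build_stop_request` is hypothetical.

```python
import urllib.request

# Assumption: REST base URL; the stop path below is also an assumption.
API_BASE = "https://rest.runpod.io/v1"


def build_stop_request(pod_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks the API to stop one pod."""
    return urllib.request.Request(
        f"{API_BASE}/pods/{pod_id}/stop",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )
```

Sending it is one more line, `urllib.request.urlopen(build_stop_request(pod_id, api_key), timeout=30)`. If the API itself is affected by the same outage as the console, this will fail too.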

https://rest.runpod.io/v1/pods returns 500 INTERNAL ERROR

It started at 7:45 PDT (14:45 UTC). I don't have any other details.

ComfyUI

I have two pods launched (3060 and 5090) and can't stop them. May I have a refund or help?...

I was training a LoRA and I can't close my pod.

I'm paying $10 per hour for the training and I can't close my pod. What is going on with this application? Issues left and right. Am I going to get refunded for the time this keeps wasting my money? It has also locked me out of the account, so I can't even close it.

Using image AI models with FastAPI

I’m planning to run an image generation app using Python and Hugging Face diffusers (like SDXL + ControlNet). I don’t want to use ComfyUI; I just want to run my own Python scripts or a FastAPI server. Which template or setup should I choose for that? ...

error creating container: Error response from daemon: Head "https://registry-1.docker.io/v2/nextdiff

error creating container: Error response from daemon: Head "https://registry-1.docker.io/v2/nextdiffusionai/comfyui-sageattention/manifests/cuda12.8": received unexpected HTTP status: 503 Service Unavailable. Can you help me?...

Pods are not connecting, terminals are not working

Pod terminals are not connecting. I tried everything, including a restart. Please fix it ASAP.

How to automate redeployment of pods with network storage?

Hi, I'm using a network volume for my pod to serve an LLM. Since I only need the model to run for a few hours each day, I want to be able to stop and start it easily. However, whenever I stop the pod, all the configuration seems to disappear from the pod page, as if it was never created. The AI chatbot in Discord told me to use automation (scripts, etc.). Is there any documentation on how to achieve that?
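
One way to approach the scripted route is to keep the pod configuration in a file or dict and recreate the pod from it each day. The sketch below only builds the create request. It assumes a `POST /pods` endpoint under `https://rest.runpod.io/v1`, and the field names (`name`, `imageName`, `gpuTypeIds`, `networkVolumeId`, `ports`) and all values are placeholders to verify against the API reference, not settings taken from the thread.

```python
import json
import urllib.request

API_BASE = "https://rest.runpod.io/v1"  # assumption: REST base URL

# Hypothetical configuration; every value here is a placeholder.
POD_CONFIG = {
    "name": "daily-llm",
    "imageName": "your-registry/your-llm-image:tag",
    "gpuTypeIds": ["YOUR_GPU_TYPE_ID"],
    "networkVolumeId": "YOUR_NETWORK_VOLUME_ID",
    "ports": ["8000/http"],
}


def build_create_request(config: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request that recreates a pod from config."""
    return urllib.request.Request(
        f"{API_BASE}/pods",
        data=json.dumps(config).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

A scheduler (cron, a CI job, or similar) could send this request at the start of the daily window and remove the pod at the end, with the model weights staying on the network volume in between.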

Pod shut down on its own

Pod ilmcztyo6x9wuk: I didn't want it to be shut down. It wasn't interruptible and had no maintenance alert...

CUDA error

I keep getting this message: torch.AcceleratorError: CUDA error: uncorrectable ECC error encountered Search for `cudaErrorECCUncorrectable' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect....

Jupyter

🐛 Bug Report: JupyterLab port 8888 disappears after pod restart Template: Official ComfyUI (latest version) Pod ID: wgenifr3985iv5 Region: EU-RO-1...