Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚑|serverless

⛅｜pods

🔧｜api-opensource

📡｜instant-clusters

🗂｜hub

Pod data loss after disk resize

I deployed an RTX 3090 pod on RunPod and was working on a project. I then edited the pod to increase the disk size. Afterward, I could no longer connect to the server, and all my files and folders seem to have disappeared. What can I do?

Can't connect to terminal or JupyterLab on Runpod PyTorch 2.1 or 2.4 template

This happens in the EU-RO region. I tried a different region and it worked fine, but I need to use EU-RO because of my network volume.

NVIDIA Driver Selection

Is there a way to select which NVIDIA GPU driver my pod is using?

How to extend a pod with a savings plan

I purchased a pod with a 1-week savings plan. How can I extend the contract with a 1-month savings plan before the term ends? I want to keep the files and scripts on the current pod's disk volume.

securing channel...room not ready

In the web terminal I'm trying to import LoRAs with ComfyUI with Flux.1 dev one-click. When I paste in the code from my PC's terminal, I get the error "securing channel...room not ready". The pod says everything is running and the CPU is at 0. I have no tech skills, so I need help.

Pod unusable, extremely slow

░░░░░░░░░░░░░░░░░░░░ [0/7] Installing wheels... warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. If the cache and target directories are on different filesystems, hardlinking may not be supported. If this is intentional, set export UV_LINK_MODE=copy or use --link-mode=copy to suppress this warning. ██████████████░░░░░░ [5/7] torch==2.6.0
...
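
For anyone hitting the same warning: the log text itself names two ways to silence it. A minimal sketch, assuming the installer printing the message is uv and that copying instead of hardlinking is acceptable on your pod. The requirements file name in the commented line is hypothetical. Note that this only suppresses the warning; it may not fix the slow install by itself.

```shell
# Assumption: uv printed the hardlink warning shown in the log above.
# Ask uv to copy files instead of hardlinking them. This is the setting
# the warning suggests when the cache and the target directory are on
# different filesystems.
export UV_LINK_MODE=copy

# Per-command alternative named in the same warning.
# "requirements.txt" is a hypothetical file name, adjust to your project:
# uv pip install --link-mode=copy -r requirements.txt
```

The environment variable applies to every later uv command in the same shell session, while the flag applies only to the command it is passed to.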

Jupyter does not open when I activate the pod

I tried to open pods with the runpod/pytorch:2.0.1-py3.10-cuda11.8.0-devel-ubuntu22.04 image, but the JupyterLab server doesn't connect to the pods and shows 502 Bad Gateway even after I wait more than 5 minutes. However, the Web Terminal works fine, which confuses me even more: the server itself still works, only Jupyter doesn't. My pod ID is 0jct97bqf5ijws...
Solution:
However, even though the container and system logs from the affected pod look completely normal from the outside (cross-checked through the tickets I sent via the webpage), it still doesn't work, so I think the problem is related to a version mismatch in something needed to run JupyterLab.

Pytorch 2.4.0 ROCm 6.1 pod broken

When I create a pod with a PyTorch image, the following error is displayed in the log. The same happens with other PyTorch images. I have selected the MI300X GPU. create container runpod/pytorch:2.4.0-py3.10-rocm6.1.0-ubuntu22.04...

Web terminal not working

When I create a pod with two A5000 GPUs and then open the web console, the window stays white, even after several minutes. I use the ollama template...

Add GPUs to an existing pod?

Hello, Is it possible to add more GPUs to an existing pod without having to create a whole new pod? I don't want to lose my stored data on there....

Looking for a GPU cloud service platform that supports bare metal RTX A6000

Hi everyone! Can anyone recommend a GPU cloud platform that provides the RTX A6000? It should support Docker and be bare metal. I know Runpod provides the RTX A6000, but it doesn't support Docker and bare metal, so I'm asking about other cloud services. ...

Huggingface changed to unauthorized?

I've been running a hugging face space on runpod following these instructions: https://blog.runpod.io/run-huggingface-spaces-on-runpod/ No changes to my hugging face space or the runpod template, but all of a sudden I'm getting the following error in runpod logs: error pulling image: Error response from daemon: unauthorized: As I said, I'm using the same huggingface token and nothing else has changed. Any suggestions to get my runpod container working?...

Cannot Stop pod from RunPod Web UI

I created a new pod using a PyTorch 2.4 template with network storage access. The UI did not give the option to stop the pod. It only showed the following actions: Lock Pod, Edit Pod, Restart Pod, Reset Pod, Terminate Pod. The only way I was able to stop the pod was by connecting to it via an SSH terminal and running: runpodctl stop pod {RunPodID} ...

Provisioning script for Runpod ComfyUI template issue

I have a provisioning script for a Runpod template with the goal to give my clients a complete ComfyUI setup with all my workflows in one click deploy, but unfortunately it's not correctly downloading all the image model checkpoints nor workflows. Here is the provisioning script for AI-dock: https://raw.githubusercontent.com/kingaigfcash/aigfcash-runpod-template/refs/heads/main/default.sh And here is a screenshot of my template settings & link:...

Cannot find any model weights with `/models/huggingface-cache/hub/models...`

Hi, I built a Docker image following "STEP-2" in the README file. I created a template from the Docker image with the environment variables below: MODEL_NAME="migtissera/Tess-3-Mistral-Large-2-123B" MAX_MODEL_LEN=65536...

nvidia docker

Hi, could you help me launch nvidia docker in the pod? It works neither via docker pull inside the template pods' containers nor by running nvidia docker itself (it does not have SSH). Could you share a link I can use to get it running?...

No device found for buffer type CPU for async uploads

Trying to deploy a pod using KoboldCpp template. The model downloads and all the layers go onto the GPU (A6000, 70B q4ks, 12288 context), but then it just sits there with this and I can't connect to it

URGENT: Multiple H100 instances critical error - ICML deadline tomorrow

🚨 Critical Issue: - 3x H100 SXM pods simultaneously received critical error messages and experiments terminated - IDs: Iv8utoj2mozzp6 (1x H100), afinjwp2ryg3ub (2x H100) - Time: ~02:27 KST, Jan 30 - Image: runpod/pytorch:2.2.0-py3.10-cuda12.1-devel-ubuntu22.04...

Custom external port number

Is it possible to set a custom external port number when creating a new instance in Runpod?