RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Can we turn secure cloud instances on/off through some type of trigger function?

Hello everyone! I was wondering whether I need to pay for the pod 24/7 even if I will only be using the LLM a couple of times per day, or whether it can be turned on at certain times.
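One way to approach the on/off question is to run a small scheduler outside the pod that decides, from the current time, whether the pod should be up. The sketch below shows only that decision logic. The names `should_be_running` and `reconcile` are made up for illustration, and `start_pod` / `stop_pod` are placeholders for whatever API or CLI call you actually use to resume and stop the pod; nothing here is an official RunPod interface.

```python
from datetime import datetime, time
from typing import Callable


def should_be_running(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside the daily [start, end) window."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # The window wraps past midnight, e.g. 22:00 to 02:00.
    return t >= start or t < end


def reconcile(
    now: datetime,
    start: time,
    end: time,
    is_running: bool,
    start_pod: Callable[[], None],  # hypothetical: resumes the pod
    stop_pod: Callable[[], None],   # hypothetical: stops the pod
) -> str:
    """Start or stop the pod so that its state matches the schedule."""
    want = should_be_running(now, start, end)
    if want and not is_running:
        start_pod()
        return "started"
    if not want and is_running:
        stop_pod()
        return "stopped"
    return "unchanged"
```

Calling `reconcile` every few minutes from a cron job or a serverless timer on another machine would keep the pod up only during the chosen window.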

How can I do scheduled backups with Azure using API?

I know about Cloud Sync, but how do I call it from my app?

Failed to Import Libraries on RunPod SD ComfyUI [RTX A4000]

Hey guys, every time I boot up my ComfyUI pod on RunPod it fails to load a few libraries, and trying to update or fix them from the Comfy manager doesn't seem to resolve the issue. I repeatedly install the individual dependencies, but every time the same modules come back as "module not found". I've looked at a few other solutions/threads but have been struggling to get this to work. Anyone else face the same issue?...

How do I select a different template to the default in the new RunPod UI?

I might be missing something obvious, but: in the new RunPod Pods > Deploy UI, after selecting a GPU config, how do I pick a template other than the default? RunPod Pytorch 2.2.10 runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 ...

Can't open models/checkpoint folder in Jupyter for Comfy UI.

All the other folders open, but not the checkpoint folder. I want to install models from CivitAI. Is this normal on RunPod, or is it a template issue?

Hello guys! I want to buy an RTX 4090 pod, but 46 GB of RAM is not enough. Is there any way to upgrade the RAM?

I hope to buy a pod with 64 GB of RAM and an RTX 4090. Need help.
Solution:
@rondos1701 wait, if you are still on this, try the filter thing

Am I able to host an app through reverse proxy with a custom domain name?

I have a domain name that I own and want to run my app with SSL through port 443. Is this possible to do on a pod? I am trying to run my Gradio-based app with Nginx and I can't seem to get it to work with a custom domain name.

Is it possible to change region of a network volume?

I would like to access high-VRAM GPUs, which are not available in EU-RO-1.

How do I add a cron job in a pod?

I am using a PyTorch image for my pod. I have cloned my repository and created an environment, as well as an app that is exposed on a specific port. I only need to use this app for two hours per day, so when I want to use it, I manually start the pod, and afterwards I manually stop it. I want to add something like a cron job, so that whenever I restart this pod, it will automatically run the specific commands and start my app.

Can't connect to CivitAI lately when running wget commands, what am I doing wrong?

Username/Password Authentication Failed. root@8350c17f8def:/workspace/ComfyUI/models/checkpoints/sdxl#...

TensorRT-LLM setup

Has anyone been able to successfully install tensorrt_llm? I'm trying with pip, but I'm running into MPI-related errors: Cannot open configuration file /build-result/hpcx-v2.16-gcc-inbox-ubuntu22.04-cuda12-gdrcopy2-nccl2.18-x86_64/ompi/share/openmpi/mpicc-wrapper-data.txt Error parsing data file mpicc: Not found...

Stable Diffusion Extension Installation Issues:

Hi! I'm new to this whole Discord and RunPod, so sorry if I've posted this in the wrong place or made any other mistakes. I've run into a problem when installing some extensions in RunPod. I've been trying to get [traintrain] (lets you create LoRAs, not Kohya) and [tagger] (which pulls tags from images) to work, but for some reason, RunPod just won't recognize them, no matter what I try. I even found a post on Reddit where someone was having a similar problem with the SD Dynamic Prompts extension not appearing in the list or working at all. They said they tried turning off all the other extensions, but that didn't do the trick either....

Is it possible to make port 443 externally accessible?

Is it possible to make port 443 externally accessible? I want to remove the port number from the DNS name (https://example.com:34567). I have a solution in Cloudflare, but I need to access Cloudflare every time the pod is rebooted. Thank you
Solution:
Nope, it's not possible; TCP ports are always random. You should be able to use Cloudflare Tunnels, though.

Comfy launcher issue

Comfy launcher isn't downloading models or assets anymore. I wrote to the dev on Banodoco, but it isn't working for me anymore.

Pods shutting down

Is it normal behaviour for a GPU cloud pod that is paid to be on 24/7 to require a cold boot every time it hasn't been used for a while? We have been paying for a GPU to be on all the time so it is quick to respond when we do demos of our software, but it's always slow because the pod has to boot up.

Connections unexpectedly aborted

We are running a gRPC server inside RunPod, and 1-2% of requests abort unexpectedly. Our API's logs complain that the downstream disconnected, and I suspect RunPod's NAT aborts connections in certain situations. Is there a connection timeout or other policy for TCP connections, or is it just instability in RunPod's infrastructure?
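If the suspicion about idle connections being dropped along the path is right (that is an assumption, not something confirmed in this thread), one common mitigation is to turn on gRPC keepalive pings on the client so the connection never sits silent for long. The sketch below only builds the channel options; `keepalive_options` is a made-up helper name and the default intervals are arbitrary starting points.

```python
from typing import List, Tuple


def keepalive_options(
    idle_seconds: int = 30, timeout_seconds: int = 10
) -> List[Tuple[str, int]]:
    """Build gRPC channel arguments that send periodic keepalive pings."""
    return [
        # Send a ping after this much time without activity on the transport.
        ("grpc.keepalive_time_ms", idle_seconds * 1000),
        # Consider the connection dead if the ping is not acknowledged in time.
        ("grpc.keepalive_timeout_ms", timeout_seconds * 1000),
        # Keep pinging even when there are no active calls.
        ("grpc.keepalive_permit_without_calls", 1),
        # Do not cap the number of pings sent without data frames.
        ("grpc.http2.max_pings_without_data", 0),
    ]


# Usage with the grpcio package (not imported here, address is a placeholder):
#   channel = grpc.insecure_channel("host:port", options=keepalive_options())
```

The server may need matching settings to accept frequent pings; otherwise it can close the connection for pinging too often.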

Downloading file/directory from remote to local using SCP

Hi, when trying to download from the remote I get a password request. Is there a workaround?

POD's ERRORS :((((((

This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime. MY IDs: g0htfaz7oe0lht brr2em0266otas...

Nvidia driver version

Where can I see what driver versions pods use? Is it the same for all GPU types? I get this error even when selecting CUDA 12.3: ERROR: This container was built for NVIDIA Driver Release 545.23 or later, but version 535.154.05 was detected and compatibility mode is UNAVAILABLE....
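The error above boils down to a version comparison between the driver the host reports and the minimum the container image expects. A minimal sketch of that check follows; `parse_version` and `driver_satisfies` are illustrative names, and the version strings are the ones quoted in the error message.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '535.154.05' into (535, 154, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_satisfies(detected: str, required: str) -> bool:
    """Return True if the detected driver is at least the required release."""
    return parse_version(detected) >= parse_version(required)


# The detected string can be read inside a pod with:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
print(driver_satisfies("535.154.05", "545.23"))
```

Running this kind of check right after a pod starts makes it easy to spot a host whose driver is older than the image needs before launching the real workload.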

Profiling CUDA kernels in RunPod

Hi! I'm trying to profile my kernel with nsight-compute and I'm getting this error: "==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0." It is explained on this page: https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters and has to be fixed on the host side. Has anybody found a workaround for this issue or a way to solve it? Thanks!...