HF Cache
Popular Hugging Face models have super fast cold-start times now
We know lots of our developers love working with Hugging Face models. So we decided to cache them on our GPU servers and network volumes.
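For example, on a host where a popular model is already cached, a standard transformers load should find the weights locally instead of downloading them on cold start. A minimal sketch; the model ID is illustrative, and the assumption that the cache is exposed through the standard Hugging Face cache directories is ours:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # illustrative choice

# If the host already has this model in its Hugging Face cache,
# from_pretrained() resolves to the local snapshot and skips the
# network download entirely, which is where the cold-start win comes from.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)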
GPU Availability Issue on RunPod – Need Assistance

job timed out after 1 retries
Unable to fetch Docker images
error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": context deadline exceeded
2024-11-18T18:10:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...Failed to get job. - 404 Not Found
vLLM: override the OpenAI served model name
I set OPENAI_SERVED_MODEL_NAME_OVERRIDE, but the model name exposed by the OpenAI endpoint is still the hf_repo/model name.
The logs show: engine.py: AsyncEngineArgs(model='hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4', served_model_name=None...
and the endpoint returns: object='error' message='The model 'model_name' does not exist.' type='NotFoundError' param=None code=404
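For comparison, here is what a working override looks like from the client side. This is a minimal sketch, assuming RunPod's OpenAI-compatible route at /openai/v1; the endpoint ID, API key, and the alias my-llama are placeholders, and it assumes the override was applied so served_model_name carries the alias instead of None:

from openai import OpenAI

# Placeholders: substitute your own endpoint ID and API key.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",
    api_key="<RUNPOD_API_KEY>",
)

# With the override applied, the alias should be listed here instead of
# the hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 repo name.
print([m.id for m in client.models.list()])

# Requests then address the model by its overridden name.
resp = client.chat.completions.create(
    model="my-llama",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)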
...Not using cached worker

What TTFT (time to first token) should we be able to reach?
80GB GPUs totally unavailable
Not able to connect to the local test API server
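One way to narrow this down is to run the handler with the SDK's local test flag and confirm the server actually comes up before pointing a client at it. A minimal sketch, assuming the runpod Python SDK's --rp_serve_api option and its default port of 8000:

# handler.py -- start the local test API with:
#   python handler.py --rp_serve_api
# then send requests to http://localhost:8000 (default port).
import runpod

def handler(job):
    # Echo the payload back so connectivity is easy to verify.
    return {"echo": job["input"]}

runpod.serverless.start({"handler": handler})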
What methods can I use to reduce cold-start times and decrease latency for serverless functions?
Network volume vs. baking the model into the Docker image
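Whichever of the two you pick, a pattern that helps with both questions is loading the model once at module import, so only true cold starts pay the load cost and warm workers reuse the object. A minimal sketch; load_model is a hypothetical stand-in for your framework's real load call:

import runpod

def load_model(path):
    # Hypothetical stand-in for e.g. AutoModel.from_pretrained(path)
    # or torch.load(path).
    class Model:
        def generate(self, prompt):
            return f"output for {prompt!r}"
    return Model()

# Runs once per worker process at import time. Point the path at
# /runpod-volume/... for a network volume, or at a location baked
# into the Docker image; the warm-worker benefit is the same.
MODEL = load_model("/runpod-volume/models/my-model")

def handler(job):
    return {"output": MODEL.generate(job["input"]["prompt"])}

runpod.serverless.start({"handler": handler})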
Jobs stay In-Progress forever
How to get the progress of a processing job in Serverless?
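The Python SDK has a progress helper for exactly this: the handler calls runpod.serverless.progress_update while working, and the message is visible when the job's status is polled. A minimal sketch; the step loop is illustrative:

import runpod

def handler(job):
    total_steps = 5
    for step in range(total_steps):
        # Surface intermediate progress; clients polling /status see
        # this message while the job is still IN_PROGRESS.
        runpod.serverless.progress_update(job, f"step {step + 1}/{total_steps}")
        # ... do the real work for this step here ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})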

Why does runsync return a status response instead of just waiting for the image response?
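runsync only holds the connection for a limited time; if the job outlives that window, the call returns the job's current status and ID rather than the output, and the client is expected to poll /status until the job completes. A minimal polling sketch with placeholder endpoint ID and API key:

import time
import requests

BASE = "https://api.runpod.ai/v2/<ENDPOINT_ID>"         # placeholder
HEADERS = {"Authorization": "Bearer <RUNPOD_API_KEY>"}  # placeholder

# runsync waits for the result, but a long-running job comes back
# as IN_QUEUE/IN_PROGRESS with an id to poll instead of an output.
job = requests.post(
    f"{BASE}/runsync", headers=HEADERS, json={"input": {"prompt": "a cat"}}
).json()

while job.get("status") in ("IN_QUEUE", "IN_PROGRESS"):
    time.sleep(2)
    job = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()

print(job.get("output"))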
Worker keeps running after idle timeout
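If the goal is the opposite, forcing a worker to shut down after a job instead of lingering, the handler can request a recycle in its return value. A minimal sketch, assuming the SDK's refresh_worker return flag:

import runpod

def handler(job):
    # Asking the platform to stop and replace this worker once the job
    # finishes, rather than keeping it warm through the idle timeout.
    return {"output": "done", "refresh_worker": True}

runpod.serverless.start({"handler": handler})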
Can I deploy the ComfyUI with Flux.1 dev one-click template to Serverless?

What is the real Serverless price?