Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


New build stuck in endless "Pending" state

Usually the worker picks up a new build after a few minutes. This time it has been stuck at "pending" for hours and counting... Also, builds that worked earlier don't start building anymore. Is anybody else having this issue?...

Skip build on Github Commit

Hi there, is there a way to skip kicking off a build on commit to GitHub? I have a serverless endpoint set up with direct GitHub integration with Runpod (not GitHub Actions), and I am wondering if there is a way to skip kicking off the build using the commit message, like "[skip ci]" or something similar.

Model initialization failed: CUDA driver initialization failed, you might not have a CUDA gpu.

I'm getting this error a few times while loading a model that runs on GPU/torch; the model then proceeds to get loaded on CPU, even though most of the time the model loads and runs fine on GPU....
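
A pattern that might help here, assuming a torch-based loader: probe CUDA with a short retry and fail loudly instead of silently falling back to CPU. A minimal sketch; the helper name, retry count, and wait are illustrative:

```python
import time

import torch

def get_device(max_checks: int = 3, wait_s: float = 2.0) -> torch.device:
    """Return a CUDA device, retrying briefly; raise instead of silently using CPU."""
    for _ in range(max_checks):
        if torch.cuda.is_available():
            return torch.device("cuda")
        time.sleep(wait_s)  # the driver can be briefly unready right after a cold start
    raise RuntimeError("CUDA driver never became available; refusing CPU fallback")

# usage inside your loader (the model object is your own):
# model = model.to(get_device())
```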

getting occasional OOM errors in serverless

I'm running a small service using Runpod serverless + ComfyUI, and once in a while I get this error. `"error": "Traceback (most recent call last):\n  File \"/handler.py\", line 708, in handler\n    raise RuntimeError(f'{node_type}: {exception_message}')\nRuntimeError: WanVideoSampler: Allocation on device\nThis error means you ran ...`
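
One mitigation for intermittent OOMs like this is to catch the allocation failure in the handler and ask the platform to recycle the worker, so the next job starts on a clean GPU. A rough sketch, assuming the Runpod Python SDK's `refresh_worker` return flag; `run_workflow` is a placeholder for the actual ComfyUI invocation:

```python
import runpod
import torch

def run_workflow(job_input):
    raise NotImplementedError  # placeholder for the actual ComfyUI call

def handler(job):
    try:
        return run_workflow(job["input"])
    except torch.cuda.OutOfMemoryError as exc:
        # Report the failure and flag this worker for replacement so any
        # leaked VRAM from this job cannot poison the next one.
        return {"error": f"CUDA OOM: {exc}", "refresh_worker": True}

runpod.serverless.start({"handler": handler})
```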

ComfyUI + custom models & nodes

I've read this here and tried it: https://github.com/runpod-workers/worker-comfyui But I'm still not sure if I did it correctly. So I made a Dockerfile based on one of the versions and added the things I need: ```Dockerfile...
Solution:
You'll stop seeing the error you had, where a worker was spawned to try to handle that job but it was throwing: requirement error: unsatisfied condition: cuda>=12.6, please update your driver to a newer version, or use an earlier cuda container: unknown...

bug in creating endpoints

I'm trying to create a ComfyUI 5.4.0 endpoint from a new Gmail account. From the Serverless page I go through New Endpoint under Serverless, but when I press Deploy, a Pod is created instead of a serverless endpoint...

16 GB GPU availability almost always low

Hence workers are very frequently throttled and the Docker image gets pulled again and again.

Endpoint specific API Key for Runpod serverless endpoints

I am looking for a way to create a Runpod API Key that is specific to a Serverless endpoint. Is this possible?

generation-config vllm

Hey! I need help with a vLLM Quick Deploy setup. I'm getting this warning and can't override sampling parameters in API requests: WARNING 08-18 15:40:11 [config.py:1528] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm. How do I add the --generation-config vllm parameter when using Quick Deploy? I want to be able to set custom top_k, top_p, and temperature in my requests instead of being stuck with the model defaults. Thanks!...
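
On the request side, per-request sampling parameters can usually be sent alongside the prompt; whether they beat the Hugging Face generation config depends on that server-side flag. A hedged sketch, with the endpoint ID, API key, and the `sampling_params` field shape assumed from the runpod/worker-vllm input format:

```python
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync",  # placeholder endpoint ID
    headers={"Authorization": "Bearer <RUNPOD_API_KEY>"},  # placeholder key
    json={
        "input": {
            "prompt": "Hello!",
            # Assumed field name; these overrides may still lose to the model's
            # generation config unless the server runs with --generation-config vllm.
            "sampling_params": {"temperature": 0.7, "top_p": 0.9, "top_k": 40},
        }
    },
    timeout=120,
)
print(resp.json())
```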

New UI New Issue again lol

I'm the admin + owner on the GitHub side, but I get this in the new version of the UI... a bit frustrating

ComfyUI looks for checkpoint files in /workspace instead of /runpod-volume

I had a ComfyUI on-demand GPU Pod and now need to switch to a serverless endpoint. After setting up the endpoint, I can run some requests, but my ComfyUI workflow says there are missing checkpoints and LoRAs. My serverless workers are correctly connected to my 100 GB volume, so it seems the path is different in the two instances. How can I either: - move the files from /workspace/comfyUi/checkpoints to /runpod-volume/comfyUI/checkpoints? or...
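
One workaround, based on the fact that the same network volume mounts at /workspace on Pods but at /runpod-volume on serverless workers: create a symlink at container start so Pod-era absolute paths keep resolving. A sketch; adjust if /workspace already exists in your image:

```python
import os

# The volume's contents are identical; only the mount point differs between
# Pods (/workspace) and serverless workers (/runpod-volume).
if not os.path.exists("/workspace"):
    os.symlink("/runpod-volume", "/workspace")  # run before ComfyUI starts
```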

Unhealthy worker state in serverless endpoint: remote error: tls: bad record MAC

I'm using a Runpod serverless endpoint with a worker limit of 6. The endpoint performs well, except for one error: sometimes a worker becomes "unhealthy" and HTTP requests fail with: request failed: Post "https://api.runpod.ai/v2/s3bxj20mra4dvp/runsync": remote error: tls: bad record MAC OR "request failed: Post "https://api.runpod.ai/v2/s3bxj20mra4dvp/runsync\": write tcp [2001:1c02:2c09:9100:7bab:2fba:21cc:6df1]:53732->[2606:4700::6812:9dd]:443: use of closed network connection" ...
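
Client-side, transient connection drops like these can be retried. This doesn't cure the unhealthy worker, but it smooths over one-off failures. A sketch using requests with urllib3 retries; the counts and backoff are arbitrary:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=1.0,                # waits ~1 s, 2 s, 4 s between attempts
    status_forcelist=[502, 503, 504],
    allowed_methods=["POST"],          # opt in: POST is not retried by default
)
session.mount("https://", HTTPAdapter(max_retries=retries))

# resp = session.post("https://api.runpod.ai/v2/<ENDPOINT_ID>/runsync", json={...})
```

Note that retrying a POST to /runsync can double-submit a job if the first request actually reached the server, so idempotency on the handler side is worth considering.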

Job Dispatching Issue - Jobs Not Sent to Running Workers

A CPU HeadPod runs a Gradio frontend for a ComfyUI serverless backend. Serverless nodes start with a custom image; they run ComfyUI directly from the NetworkDrive, using the NetworkDrive venv (Python 3.12.3). My settings are configured for one worker per job. When I send two jobs in parallel from different PCs, the platform correctly scales to two running workers, but the job queue assigns both jobs to the same worker sequentially. The second worker remains running without receiving a job. The log on the second worker says that Comfy and the handler are ready. Configuration:...

Stuck at initializing

I changed to a bigger one because of the usual VRAM error, but then this happened (I used an L40S before and it wasn't like this).

So serverless death

Not sure what you guys did tonight, but the endpoint stopped passing jobs to my vLLM workers at about 3 pm my time. The backup was fine. I trashed all the workers, and still they would sit there ready, with jobs in the queue, and the jobs would just run until timeout. I had to trash the endpoint, redeploy, and add the new endpoint into rotation. So I figure you owe me at least $30 in credit ... not to mention my time ... (2 hrs to deploy and qual check)...

How to bake model checkpoints into Docker images for ComfyUI

I've seen in an earlier discussion that it is faster to bake models into the image thanks to Runpod FlashBoot: https://discordapp.com/channels/912829806415085598/1364592893867724910. Thanks to @gokuvonlange for explaining it! Does that mean I have to bake the ComfyUI git repo and all the other custom nodes and requirements too? And how does that make it faster than just using a network volume?...
Solution:
Yes, either bake those two into the image, or copy them to /workspace and then run from there.
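
If weights are baked into the image, the download can happen at build time so FlashBoot caches them with the container. A minimal sketch assuming huggingface_hub is installed during the build; the repo ID and target directory are placeholders:

```python
# download_models.py — invoked during the image build, e.g.:
#   RUN pip install huggingface_hub && python download_models.py
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="some-org/some-model",             # placeholder model repo
    local_dir="/comfyui/models/checkpoints",   # placeholder path inside the image
)
```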

Is it possible to set serverless endpoints to run for more than 24 hours?

I’m trying to configure my serverless endpoints so they can run for more than 24 hours. I set a policy for the job as described here: https://docs.runpod.io/serverless/endpoints/send-requests#execution-policies and also set the executionTimeout to a value higher than 24 hours when creating the endpoints. However, the jobs still exit exactly at the 24-hour mark. Is it possible to increase this limit, and if so, how?
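
For reference, the execution policy from the linked docs travels in the request body, with executionTimeout in milliseconds. A sketch of a request asking for 36 hours; whether the platform honors values past the observed 24-hour ceiling is exactly the open question:

```python
import requests

payload = {
    "input": {"prompt": "long-running job"},  # placeholder job input
    # executionTimeout is given in milliseconds per the execution-policies docs.
    "policy": {"executionTimeout": 36 * 60 * 60 * 1000},  # request 36 h
}
resp = requests.post(
    "https://api.runpod.ai/v2/<ENDPOINT_ID>/run",  # placeholder endpoint ID
    headers={"Authorization": "Bearer <RUNPOD_API_KEY>"},
    json=payload,
    timeout=30,
)
print(resp.json())
```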

New load balancer serverless endpoint type questions

Hey team! In the past, I tried to use Runpod's queue-based serverless for my voice AI project, but the added job-queue latency made it impossible. Voice AI requires sub-200 ms inference latency, and the overhead was huge and unpredictable. That's fine for long-running jobs, but not for high-frequency / low-latency ones. This new load-balancer serverless endpoint type looks amazing and seems to solve a real feature gap in the GPU provider game. ...

Mounting network storage on a ComfyUI serverless endpoint

I have a network storage volume where I have downloaded all the models I need to generate images using the ComfyUI interface. All the models and custom models have been verified by running some workflows in a Pod instance, and images are generated as I intended. To avoid manual setup, I used the ComfyUI image on a serverless endpoint and generated an image with the default flux1-dev-fp8 model. Images were generated perfectly. Then I tried to generate images using my own workflow and, as expected, got a missing-custom-node issue. So I edited the endpoint and added the network storage from the advanced settings, but I'm still getting the same error about missing custom nodes. Can anyone guide me to solve this issue?...

Testing default "hello world" post with no response after 10 minutes

Attached a few pics of what I tried to do. I eventually cancelled it after a little under 10 minutes and never got a reply; it just stayed in the queue. I assume I'm doing something wrong. I left all endpoint settings at their defaults and set the Hugging Face URL to openai/gpt-oss-20b.
Solution:
oh change your image tag to
runpod/worker-v1-vllm:v2.8.0gptoss-cuda12.8.1
...