Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Recipe for Llama 4 Scout on vLLM

I am trying to follow this Llama 4 recipe from vLLM and deploy it on Runpod Serverless. Even using 2 x H100 or a B200, I could not deploy the LLM. Has anyone been able to deploy it?...

Multi-nodes Serverless Endpoint

Can I create a serverless deployment that spans two or more nodes? For example, DeepSeek R1 671b needs at least two nodes. Thanks....

Serverless

Hi team 👋, I ran into an issue with unexpected billing (around $400) on my serverless vLLM endpoint while it was idle. Support explained it was caused by a CUDA 12.9 misconfiguration in my endpoint settings. They kindly applied a $100 credit 🙏, but I’d like to make sure I configure things correctly moving forward. ...

Network connectivity issues on EUR–IS nodes (works fine on EUR–RO)

Hi Runpod team, I’m running the same workload across different regions, and I’ve observed consistent connectivity problems when using the EUR–IS nodes, while everything works perfectly fine on EUR–RO nodes. Symptoms: My service makes HTTPS calls to https://freesound.org/apiv2/.......
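
A quick way to compare the two regions is to run the same minimal reachability probe from a worker in each; a sketch in Python, where the URL is the one from the report and everything else is generic:

```python
import time

import requests  # assumes the worker image has requests installed

URL = "https://freesound.org/apiv2/"  # endpoint from the report above

# Probe the API a few times and print latency/failures, so EUR-IS and
# EUR-RO behaviour can be compared with identical code.
for attempt in range(5):
    start = time.monotonic()
    try:
        resp = requests.get(URL, timeout=10)
        print(f"attempt {attempt}: HTTP {resp.status_code} in {time.monotonic() - start:.2f}s")
    except requests.RequestException as exc:
        print(f"attempt {attempt}: failed after {time.monotonic() - start:.2f}s: {exc}")
```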

serverless request is killed, logs are gone

Hello, I am running a simple serverless handler, which I've run manually in a pod without issues. I am not able to understand what is going wrong with my serverless deployment, and the logs are not available after it fails. Is there anything else I can do to check such logs and understand what is killing my requests via serverless but not via a manual pod run?...
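
One way to keep a trace even when the platform drops logs is to catch everything inside the handler and print the traceback to stdout before returning; a minimal sketch using the runpod SDK entrypoint (the handler body is a placeholder):

```python
import traceback

import runpod  # Runpod serverless SDK

def handler(job):
    try:
        # ... real work on job["input"] goes here (placeholder) ...
        return {"ok": True}
    except Exception:
        # Flush the full traceback to stdout so it lands in the worker
        # logs, and return it so it also shows up in the job result.
        tb = traceback.format_exc()
        print(tb, flush=True)
        return {"error": tb}

runpod.serverless.start({"handler": handler})
```

Note that if the process is killed from outside (e.g. by the OOM killer), nothing in-process will run, so container memory limits are the other thing to check.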

New build endless "Pending" state

Usually the worker picks up the new build after a few minutes. This time it has been "pending" for hours and counting... Also, builds that worked earlier don't start building. Is anybody else having this issue?...

Skip build on Github Commit

Hi there, is there a way to skip kicking off a build on a commit to GitHub? I have a Serverless endpoint set up with the direct GitHub integration with Runpod (not GitHub Actions), and I am wondering if there is a way to skip kicking off the build using the commit message, like "[skip ci]" or something similar.

Model initialization failed: CUDA driver initialization failed, you might not have a CUDA gpu.

I'm getting this error a few times while loading a model that runs on GPU/torch, and then the model proceeds to get loaded on the CPU. Even though most of the time the model loads and runs fine on the GPU....
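
A defensive pattern for intermittent driver-init failures like this is to probe CUDA with a short retry and fail loudly rather than letting the model silently fall back to CPU; a sketch assuming PyTorch:

```python
import time

import torch

def pick_device(retries: int = 3, delay: float = 2.0) -> torch.device:
    """Retry CUDA detection briefly, then fail instead of using CPU."""
    for attempt in range(retries):
        if torch.cuda.is_available():
            return torch.device("cuda")
        print(f"CUDA not available (attempt {attempt + 1}/{retries}), retrying...")
        time.sleep(delay)
    # Failing loudly marks the worker unhealthy instead of serving
    # slow CPU inference without anyone noticing.
    raise RuntimeError("CUDA driver initialization failed after retries")

device = pick_device()
```

One caveat: PyTorch can cache a failed CUDA initialization within a process, so if the retry never succeeds, restarting the worker process is the more reliable recovery.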

getting occasional OOM errors in serverless

I'm running a small service using Runpod serverless + ComfyUI, and once in a while I get this error: `"error": "Traceback (most recent call last):\n  File \"/handler.py\", line 708, in handler\n    raise RuntimeError(f'{node_type}: {exception_message}')\nRuntimeError: WanVideoSampler: Allocation on device\nThis error means you ran ...`
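
For intermittent allocation failures like this, one mitigation sketch is to catch the CUDA OOM in the handler, free the cache, and return a structured error so the job fails cleanly instead of taking the worker down; assumes a recent PyTorch, where `torch.cuda.OutOfMemoryError` exists (older versions raise a plain RuntimeError):

```python
import torch

def run_with_oom_guard(fn, *args, **kwargs):
    """Run a GPU workload; on CUDA OOM, free cached memory and report it."""
    try:
        return {"output": fn(*args, **kwargs)}
    except torch.cuda.OutOfMemoryError as exc:
        # Release cached blocks so the next job on this worker has a chance.
        torch.cuda.empty_cache()
        return {"error": f"CUDA out of memory: {exc}"}
```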

ComfyUI + custom models & nodes

I've read this here, and tried it: https://github.com/runpod-workers/worker-comfyui But I'm still not sure if I did it correctly. So I made a Dockerfile based on one of the versions and added the things I need: ```Dockerfile...
Solution:
You'll stop seeing the error you had, where a worker was spawned to try to handle that job but it was throwing: `requirement error: unsatisfied condition: cuda>=12.6, please update your driver to a newer version, or use an earlier cuda container: unknown`...

bug in creating endpoints

I'm trying to create a ComfyUI 5.4.0 endpoint from a new Gmail account. From the Serverless page, I go through New Endpoint under Serverless, but when I press deploy, a Pod is created instead of a Serverless endpoint...

16 GB GPU availability almost always low

Hence workers are very frequently throttled and the Docker image gets pulled again and again.

Endpoint specific API Key for Runpod serverless endpoints

I am looking for a way to create a Runpod API Key that is specific to a Serverless endpoint. Is this possible?

generation-config vllm

Hey! Need help with vLLM Quick Deploy setup. I'm getting this warning and can't override sampling parameters in API requests: `WARNING 08-18 15:40:11 [config.py:1528] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with --generation-config vllm.` How do I add the `--generation-config vllm` parameter when using Quick Deploy? I want to be able to set custom top_k, top_p, and temperature in my requests instead of being stuck with the model defaults. Thanks!...
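
Not sure whether Quick Deploy exposes the raw `--generation-config` flag, but per-request sampling parameters can usually be passed in the job input; a sketch against the `runsync` route, assuming the worker-vllm input schema with a `sampling_params` object (endpoint ID and API key are placeholders):

```python
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Hello!",
            # Assumed worker-vllm schema: per-request sampling parameters
            # that should take precedence over the server defaults.
            "sampling_params": {"temperature": 0.7, "top_p": 0.9, "top_k": 40},
        }
    },
    timeout=120,
)
print(resp.json())
```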

New UI New Issue again lol

I'm the admin + owner on GitHub, but I get this in the new version of the UI... a bit frustrating

ComfyUI looks for checkpoint files in /workspace instead of /runpod-volume

I had a ComfyUI on-demand GPU pod, and now need to switch to a serverless endpoint. After setting up the endpoint, I can run some requests, but my ComfyUI workflow says there are missing checkpoints and a LoRA. My serverless workers are correctly connected to my 100 GB volume. So it seems the path is different between the two instance types. How can I either: - move the files from /workspace/comfyUi/checkpoints to /runpod-volume/comfyUI/checkpoints? or...
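
For context: on Pods the network volume mounts at `/workspace`, while on Serverless it mounts at `/runpod-volume`, so the files don't need moving. One common workaround is a symlink created at container start so paths baked into the ComfyUI config resolve in both environments; a sketch:

```python
import os

# On Pods the volume is mounted at /workspace; on Serverless it is
# /runpod-volume. Link the pod path to the serverless mount so
# existing ComfyUI model paths keep resolving.
if os.path.isdir("/runpod-volume") and not os.path.exists("/workspace"):
    os.symlink("/runpod-volume", "/workspace")
```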

Unhealthy worker state in serverless endpoint: remote error: tls: bad record MAC

I'm using a runpod serverless endpoint with worker limit 6. The endpoint performs well, except for one error: sometimes a worker gets "unhealthy" and HTTP requests fail with: request failed: Post "https://api.runpod.ai/v2/s3bxj20mra4dvp/runsync": remote error: tls: bad record MAC OR "request failed: Post "https://api.runpod.ai/v2/s3bxj20mra4dvp/runsync\": write tcp [2001:1c02:2c09:9100:7bab:2fba:21cc:6df1]:53732->[2606:4700::6812:9dd]:443: use of closed network connection" ...
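
On the client side, transient TLS and connection errors like these can be smoothed over with retries; a sketch using requests with urllib3's Retry (the URL is the one from the report, the API key is a placeholder):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry transient connection/TLS failures and gateway errors with
# exponential backoff; POST must be allowed explicitly.
retry = Retry(total=4, backoff_factor=1.0,
              status_forcelist=[502, 503, 504],
              allowed_methods=["POST"])
session.mount("https://", HTTPAdapter(max_retries=retry))

resp = session.post(
    "https://api.runpod.ai/v2/s3bxj20mra4dvp/runsync",   # URL from the report
    headers={"Authorization": "Bearer your-runpod-api-key"},  # placeholder
    json={"input": {}},
    timeout=120,
)
print(resp.status_code)
```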

Job Dispatching Issue - Jobs Not Sent to Running Workers

A CPU HeadPod runs a Gradio frontend for a ComfyUI serverless backend. Serverless nodes start with a custom image; they run ComfyUI directly from the NetworkDrive using the NetworkDrive venv (3.12.3). My settings are configured to have one worker per job. When I send two jobs in parallel from different PCs, the platform correctly scales to two running workers, but the job queue assigns both jobs to the same worker sequentially. The second worker remains running without receiving a job. The log on the second worker says that Comfy and the handler are ready. Configuration:

Stuck at initializing

I changed to a bigger GPU because of the usual VRAM error, but then this happened (I used an L40S before and it wasn't like this).

So serverless death

Not sure what you guys did tonight, but the endpoint stopped passing jobs to my vLLM workers at about 3pm my time. The backup was fine. I trashed all the workers and still they would sit there ready, with jobs in the queue, and the jobs would just run until timeout. I had to trash the endpoint, redeploy, and add the new endpoint into rotation. So I figure you owe me at least $30 in credit ... not to mention my time ... (2 hrs to deploy and qual check)...