RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Disk size when building a GitHub repository as an image on Serverless

I have a question about the disk size when building a GitHub repository as an image on Serverless. Does the option to set the disk size in the serverless settings affect the machine that builds it? For example, if the image being built is about 17GB in size and the machine needs 65GB of storage to build it, should I set the disk size to >17GB or >65GB?

How to get progress updates from RunPod?

Hi all - my goal is to get progress updates from a job request. Presently I'm polling the job status every two seconds, and I would like to get feedback on the % completed. Looking through the documentation, I've updated the handler function in rp_handler.py by adding the following code: ```...
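
A minimal sketch of the handler pattern being described, assuming the SDK's `runpod.serverless.progress_update` helper (the step count, work function, and messages are placeholders):

```
# rp_handler.py -- sketch only; do_one_step stands in for the real work
import time

import runpod

def do_one_step(step):
    # placeholder for one unit of actual work
    time.sleep(1)

def handler(job):
    total_steps = 10  # hypothetical number of work units
    for step in range(1, total_steps + 1):
        do_one_step(step)
        # the reported string is attached to the job's status while it runs,
        # so a poller can read it instead of only seeing the bare state
        runpod.serverless.progress_update(job, f"{int(step / total_steps * 100)}% complete")
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```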

How can I use multiprocessing in Serverless?

Hi, I am trying to do something somewhat simple:
```
def run(self):
    print("TRAINER: Starting training")
    train = Train()
    trainer = self.ctx.Process(target=train.train, args=(self.config.config_path,))
...
```
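
A minimal, self-contained sketch of that pattern inside a serverless handler (the `Train` class, config path, and the choice of a spawn context are stand-ins and assumptions, not the poster's actual code):

```
import multiprocessing as mp

import runpod

class Train:
    def train(self, config_path):
        # placeholder for the real training loop
        print(f"TRAINER: training with {config_path}")

def handler(job):
    ctx = mp.get_context("spawn")  # spawn is usually safer than fork when CUDA is involved
    train = Train()
    proc = ctx.Process(target=train.train, args=(job["input"].get("config_path", "config.yaml"),))
    proc.start()
    proc.join()  # wait for the child so the worker doesn't finish the job early
    return {"exit_code": proc.exitcode}

runpod.serverless.start({"handler": handler})
```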

Can't make serverless endpoints from GHCR container with new RunPod website update

I noticed a new UI update released tonight(?). When I go to create a serverless endpoint, I no longer have the option to use images from my private GHCR. Is this intentional?...

Serverless vLLM running but still downloading?

Title says it all and this shouldn't happen

Can anyone help me deploy a qwen/qwq-32B-Preview model from Hugging Face with vLLM serverless?

I'm having issues with the configuration. I used 1 GPU with 80GB, with container image runpod/worker-v1-vllm:stable-cuda12.1.0, and set the dtype to bfloat16, but the model is giving rubbish outputs....

New vLLM Serverless interface issue

Hi guys, I logged in early to run my vllm-worker, which had been working perfectly before, but I noticed that the interface for serverless has changed. I noticed there's no OpenAI-compatible URL anymore. My code was also experiencing internal server errors. Would appreciate it if you could share fixes for this issue. I'm not sure if this page has been updated for the new interface: https://docs.runpod.io/serverless/workers/vllm/openai-compatibility

With the new pre-built serverless images, how do we learn the API schema?

I see we can now select from some pre-built images for serverless. How can we learn the API schema for the input for these pre-built images? Thanks! 🙂

drained of my funds somehow. HELP??

Hey guys, I don't know who would be able to help me out here, but I had set up a serverless endpoint with a custom template. All it does is generate a custom image when the user clicks to generate one. It runs me less than $0.20 a day, usually less. But on one particular day, I was charged my entire account funds ($24) and I truly don't know why that happened. How could the worker be running all day? How didn't it time out? And also, I'm pretty sure it wasn't on my end because I have an idle timeout set to 5 minutes maximum, so I truly don't know what's going on. Can someone help me? Attached is a screenshot of average usage plus the time I was charged everything. It's funny because the day before, I reloaded funds (Nov 22, $25), and then the next day I was essentially drained of all my funds (Nov 23, a little more than $24)....

vLLM + OpenWebUI

Hi guys, has anyone used vLLM as an endpoint in OpenWebUI? I have created a serverless pod, but it does not let me connect from OpenWebUI (running locally). Does anyone know if I have to configure the external port, and how to do it?

Has anyone experienced issues with serverless /run callbacks since December?

We've noticed that response bodies are empty when using /run endpoints with callbacks in the RunPod serverless environment (occurring sometime after December 2nd). Additional context:
- /runsync endpoints are working normally
- Response JSON format appears correct in the "Requests" tab of the RunPod console under Status...
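
For reference, a minimal sketch of the /run-plus-webhook call being discussed (the endpoint ID, API key, callback URL, and input payload are placeholders); the body POSTed to the webhook is what is reportedly arriving empty:

```
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},                      # whatever the worker expects
        "webhook": "https://example.com/runpod-callback",  # RunPod POSTs the job result here when done
    },
    timeout=30,
)
print(resp.json())  # usually contains the job id and initial status
```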

You do not have permission to perform this action.

client = OpenAI( api_key = RUNPOD_TOKEN, base_url = OPENAI_BASE_URL, ) ...
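
For context, a fuller sketch of that client setup against a serverless vLLM endpoint, assuming the `/openai/v1` base-URL form from the vLLM worker docs (the token, endpoint ID, and model name are placeholders):

```
from openai import OpenAI

RUNPOD_TOKEN = "your-runpod-api-key"   # placeholder
ENDPOINT_ID = "your-endpoint-id"       # placeholder
OPENAI_BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1"

client = OpenAI(api_key=RUNPOD_TOKEN, base_url=OPENAI_BASE_URL)

completion = client.chat.completions.create(
    model="your-org/your-model",       # whichever model the worker serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```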

Not getting 100s of req/sec serving for Llama 3 70B models with default vLLM serverless template

I'm deploying Llama-70B models without quantization using 2x80GB workers, but after 10 parallel requests the execution and delay times increase to 10-50 sec. I'm not sure if I'm doing something wrong with my setup. I pretty much use the default setup with the vLLM template, just setting MAX_MODEL_LEN to 4096 and ENFORCE_EAGER to true.
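
A quick way to reproduce the latency numbers being described is a small parallel-request script like this sketch (the endpoint ID, API key, and input payload are placeholders; the exact payload depends on the worker's schema):

```
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

def one_request(i):
    start = time.time()
    r = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"prompt": f"request {i}"}},  # adjust to the worker's input schema
        timeout=300,
    )
    return time.time() - start, r.status_code

# fire 10 requests at once and print per-request wall-clock latency
with ThreadPoolExecutor(max_workers=10) as pool:
    for latency, status in pool.map(one_request, range(10)):
        print(f"{status}: {latency:.1f}s")
```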

CPU Availability in North America?

I spent all day trying to create a new CPU serverless endpoint. It kept getting stuck on "Initializing" for many minutes at a time. After spending a few hours digging through my Docker pipeline, I realized that the actual reason no workers were available is because I was attempting to stand up the servers in North America. When I picked the entire world, I saw that I could only get CPU servers in Romania and Iceland. Specifically EU-RO-1 and EUR-IS-1. That's understandable, I guess, but the Serverless » New Endpoint UI shows "High" availability of CPU3 and CPU5 workers across the board, even when narrowing it down to a single datacenter in the US. I learned to rely on that label when picking GPU workers for a different endpoint. Can you please confirm if my intuition is correct? And if so, perhaps you could improve the labeling in the UI to reflect the true availability of those workers?...

Serverless run time (CPU 100%)

So, I have a ComfyUI workflow with a couple of custom nodes running. Most of the time my workflow takes about 6-8 minutes. The weird thing is that 24GB vs. 80GB makes only a 1-2 minute difference. ...

Custom vLLM OpenAI compatible API

Hello, I'm running an OpenAI-compatible server using vLLM. In RunPod, for the SERVERLESS service, you cannot choose the endpoint the POST requests go to; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI endpoint), or how do I run the vLLM Docker image so that it is compatible with RunPod?...
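
One possible pattern (not the official worker, and the port, model path, and input shape are assumptions) is to start vLLM's OpenAI server inside the container and have the serverless handler forward each /run payload to it:

```
import subprocess
import time

import requests
import runpod

# start vLLM's OpenAI-compatible server in the background on a local port (older-style entrypoint)
vllm_proc = subprocess.Popen(
    ["python", "-m", "vllm.entrypoints.openai.api_server",
     "--model", "/models/my-model", "--port", "8000"]
)

def wait_for_server(url="http://127.0.0.1:8000/v1/models", retries=60):
    # poll until the local server answers
    for _ in range(retries):
        try:
            if requests.get(url, timeout=2).ok:
                return
        except requests.RequestException:
            pass
        time.sleep(2)

def handler(job):
    # treat the job input as an OpenAI-style chat payload and proxy it locally
    r = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=job["input"], timeout=600)
    return r.json()

wait_for_server()
runpod.serverless.start({"handler": handler})
```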

How to cache model download from HuggingFace - Tips?

Using Serverless (48GB Pro) with FlashBoot. I want to optimize for fast cold starts - is there a guide somewhere? It does not seem to be caching the download - it's always re-downloading the model entirely (and slowly)...
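
One commonly suggested approach (a sketch, not RunPod-specific guidance; the model ID and target directory are placeholders) is to bake the weights into the image at build time so cold starts never hit Hugging Face:

```
# download_model.py -- run from the Dockerfile, e.g. `RUN python download_model.py`
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/your-model",     # hypothetical model ID
    local_dir="/models/your-model",    # baked into an image layer; point the handler here
)
```

Alternatively, pointing the Hugging Face cache (HF_HOME) at a network volume keeps downloads across workers, at the cost of volume read speed.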

ComfyUI stops working when using always active workers

Hi. I know it's strange, but here it is. I have a workflow that works flawlessly when using serverless workers that are NOT always active. That is, if I set "always active" to 0 and max workers to 1 or 2, it all works fine. For deployment, I put 1 worker as always active and 3 max workers. With this setup (and exactly the same code as before), things stop working. The ComfyUI server starts, but it looks like the endpoint never receives a request. If I set it back to 0 always active workers, it works again. ...

Is it possible to send a request to a specific workerId in a serverless endpoint?

I need to have custom logic to distribute requests to available workers in the serverless endpoint. Is there a way to send a request to a specific worker using its workerId?

Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount

Here are the request ids: e5307e07-7f0e-4b82-b668-7560a9b7ad4b-u1 9a65646e-1b26-4177-8262-59080c9d8e24-u1...