Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Custom domains for the serverless endpoints?

Is it possible to use a custom domain for the serverless endpoints? If not, are there plans to implement this as a feature?

vLLM Endpoint - Gemma3 27b quantized

Hello, I’ve been trying to run the quantized Gemma3 model from Hugging Face via a vLLM endpoint, but the repository only provides a GGUF model file without the configuration files required by vLLM. I’m aware that vllm serve has an option to pass a custom configuration, but that doesn’t seem to be available when using the vLLM endpoint....
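For reference, if you build a custom worker instead of using the managed vLLM endpoint, vLLM's Python API can load a local GGUF file directly, with the tokenizer and config taken from the original (non-GGUF) repository. A minimal sketch, assuming the GGUF file has already been downloaded to a local path and that google/gemma-3-27b-it is the matching base repo (both the path and the repo name are assumptions for illustration):

    # Minimal sketch: serving a local GGUF file with vLLM's Python API.
    # The file path and the tokenizer repo below are assumptions, not values the
    # managed vLLM endpoint exposes.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="/models/gemma-3-27b-it-Q4_K_M.gguf",  # local GGUF file, not an HF repo id
        tokenizer="google/gemma-3-27b-it",           # tokenizer/config come from the base repo
    )
    out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)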

disable testing during github deploy

Hello. Is there a way to disable testing during the GitHub Docker image deployment? Your Docker image building machine does not support CUDA 12.8 for testing yet, and I want to use CUDA 12.8 on a 5090. The testing is also somewhat buggy and redundant: the test logs only flash in the log field for a few seconds and then disappear, with no other way to find them afterwards. This makes it very hard to debug what is going on in the testing process.

Serverless Pod Disk Space Issue with Large Model (FLUX.1-schnell)

I'm trying to deploy a Hugging Face diffusion model (black-forest-labs/FLUX.1-schnell) on a serverless GPU endpoint, but I'm running into a "No space left on device" error during the model download. Even though I selected a 100GB storage volume, the logs show that only ~5GB of disk space is being used, and downloads fail due to lack of space. 1. Why isn't the full 100GB volume being used?...
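For what it's worth, serverless network volumes are typically mounted at /runpod-volume while the container disk itself stays small, so downloads usually need to be pointed at the volume explicitly. A rough sketch, assuming the volume is mounted at that default path (check your own worker's logs to confirm):

    # Sketch: direct Hugging Face downloads onto the attached network volume so the
    # large FLUX.1-schnell weights don't land on the small container disk.
    # The mount point /runpod-volume is the usual default but is an assumption here.
    import os
    os.environ["HF_HOME"] = "/runpod-volume/huggingface"  # set before importing HF libraries

    from huggingface_hub import snapshot_download

    snapshot_download(
        "black-forest-labs/FLUX.1-schnell",
        cache_dir="/runpod-volume/huggingface/hub",
    )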

Queue Delay Time

What is currently a normal delay time? I remember that previously it was normal to have delay times in milliseconds, and container startups were near instant even on the coldest of cold starts. But lately I have been observing queue delays I don't recognize, to the point where even my vLLM image can fully initialize the engine on a cold start and compute a full response in almost the same time RunPod takes just to start the container. The same goes for SDXL, although it is a bit better there.

Does container image size affect it? But then why would it also happen on warm requests? This makes it unthinkable to use. Creating a new endpoint or downgrading the RunPod SDK version didn't help, and overall there's nothing I could find that would let me influence this further. Plus, as I said, it's not happening only on cold starts: the delay on a warm worker is even more extreme, as it's sometimes longer than the execution time itself. (See screenshots)

*Please note that the delay time in this case is truly only the queue delay, as I had to move the initialization (loading models etc.) into the handler, where it's counted as execution, because I wanted to allow users to change the vLLM configuration per cold start via the request payload....
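For context, the init-in-handler pattern described above looks roughly like this; build_engine is a hypothetical helper standing in for the actual vLLM engine setup:

    # Rough sketch of the pattern described above: engine setup happens inside the
    # handler (counted as execution time), so the reported delay is pure queue time.
    # build_engine() is a hypothetical placeholder for the real vLLM initialization.
    import runpod

    engine = None

    def handler(job):
        global engine
        cfg = job["input"].get("vllm_config", {})  # per-request engine configuration
        if engine is None:
            engine = build_engine(cfg)             # runs on the first request a worker sees
        return engine.generate(job["input"]["prompt"])

    runpod.serverless.start({"handler": handler})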

Credit is deducted while worker is still starting

I was under the impression that I would be charged only when the worker is actually running. See the screenshot below: billing started even though the worker had not yet finished downloading the image.

Request count with idle timeout?

I swear I used to be able to set idle timeout with the request count option, but I noticed somewhat recently (within the past couple of months) that this option is disabled unless I use queue delay. I would like to be able to send a request, and if another request arrives within 300 seconds, keep serverless running to await it; otherwise start up another worker (request count set to 1). The preference for request count over queue delay is purely functional: with request count everything works as expected, while with queue delay I have all sorts of problems across all my images, where things stay queued for hours and never actually start, so I cannot use queue delay....

Which file of the git worker-template handles vLLM?

I need to add LangChain to vLLM. Which file in the Git repo contains the core LLM initialization, so that I can modify it?

[SOLVED] [Errno 28] No space left on device

Hi there! Recently Runpod Serverless started to fail with this error message. How can I increase storage in serverless endpoints?...

Missing workerId in webhook

Is there any reason why my webhooks from serverless do not contain the "workerId"?

Serverless logs are littered with useless log messages

This seems to be happening with the newer Python library; it makes the logs almost unusable.
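If the noisy messages turn out to come from a standard Python logger rather than the SDK's own printer, raising that logger's level in the handler module may quiet them; the logger name "runpod" below is only a guess, so check the prefix on the messages first:

    # Speculative sketch: silence a chatty standard-library logger.
    # "runpod" is an assumed logger name; replace it with whatever actually appears
    # in the noisy log lines, if they go through the logging module at all.
    import logging

    logging.getLogger("runpod").setLevel(logging.WARNING)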

Worker is running but requests are all in queue and there are no running logs

The last running log is from yesterday (May 22), but the worker is running and costing $0.00034/s, and all requests are sitting in the queue. Can anyone help with this issue?...

Requests Stuck in Queue After Docker Image Update

When I updated my GitHub repository (which is used in my serverless setup to build/deploy my Docker image), users should still have been able to generate images using the existing Docker image during the update process. However, while the Docker image was being updated, image generation through my RunPod serverless setup became completely unavailable. Requests were sent to RunPod but remained in the queue for an extended period, and I eventually had to cancel them to avoid incurring unnecessary costs.

To resolve this, I terminated the current workers so that new ones would launch with the updated Docker image. This part worked as expected, and new workers were created using the latest Docker image. However, the issue unfortunately persisted: image generation requests continued to get stuck in the queue, and I was still unable to generate any images.

What made this especially frustrating was that I could not find a single log about it, so I had no way to troubleshoot the issue myself. I would love to see more logs on serverless to troubleshoot such issues if they happen again....

Image Generation Stuck Until New Requests Are Sent

It typically takes 5–10 seconds to generate an image. However, sometimes a request doesn’t enter the processing queue until another request is sent. This issue occurred multiple times today, and I recorded it. In the example I captured, my friend's first request stayed in the queue for an unusually long time. I asked him to send a second request to try and trigger the first one to start processing, but that didn’t work. When he submitted a third request, it finally caused the first request to begin processing, followed by the second, and then the third. This issue occurs frequently and significantly impacts the user experience, as requests that should complete in 5–10 seconds end up taking several minutes....

Hugging Face model download failed

OSError: Can't load the model for 'facebook/wav2vec2-base-960h'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'facebook/wav2vec2-base-960h' is the correct path to a directory containing a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
The above exception was the direct cause of the following exception:
RuntimeError: Data processing error: I/O error: Permission denied (os error 13)
...
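The trailing "Permission denied (os error 13)" suggests the download location isn't writable from the worker. One thing to try, sketched under the assumption that a writable path such as /runpod-volume/hf-cache exists, is to pass an explicit cache_dir:

    # Sketch: point the transformers download at a directory the worker can write to.
    # /runpod-volume/hf-cache is an assumed path; any writable directory should do.
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    cache = "/runpod-volume/hf-cache"
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h", cache_dir=cache)
    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h", cache_dir=cache)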

Selecting a hf quant

Hi, I'm using vLLM serverless. Is there a way to specify which quant to use from an HF GGUF model repository URL?

Serverless today is just not working at all; even after the incident announcement it is still slow

Serverless today is just not working at all. Even after the incident announcement it is still slow and not working, and our production servers are failing.

Cold start issue

I'm stuck with a cold start issue that makes the response very slow when a new request comes in after a long time. Are there any ways to solve this issue?
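The usual mitigation is to load the model once at module import time, outside the handler, so only true cold starts pay the load cost and warm workers reuse the weights; keeping an active (always-on) worker avoids cold starts entirely, at the cost of idle billing. A minimal sketch, where load_model is a hypothetical stand-in for your actual model loading:

    # Sketch: load weights once per worker (during the cold start), not per request.
    # load_model() is a hypothetical placeholder for your real model setup.
    import runpod

    model = load_model()  # executed once when the worker process starts

    def handler(job):
        # warm requests skip straight to inference
        return model.run(job["input"])

    runpod.serverless.start({"handler": handler})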

Strange results in Serverless mode

What am I doing wrong? Why is the response so strange? I've attached the input params and one of the results....

About building container with Git repo

I'm not sure whether I can use the buildx command in 'Container Start Command'. ChatGPT said the image needs to be pushed to Docker Hub before it can be used. This is my command; is it valid? docker buildx create --name mybuilder --use...