Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Serverless or Regular Pod? (How good is Flashboot?)

Hello, I'm a new user of RunPod, and I'm using it for AI image generation. I'm planning to create an API based on the ComfyUI workflow I've developed, so people can enter prompts on my website and receive the generated images. However, I'm not sure whether I should use Serverless or just keep a regular Pod running 24/7 and manually create the API there. ...
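For context on what the Serverless route involves: a worker is just a Python function registered with the runpod SDK, which your website would call through the endpoint's /run or /runsync API. A minimal sketch, where generate_image is a hypothetical stand-in for invoking the ComfyUI workflow:

```python
# Minimal RunPod serverless handler sketch.
# `generate_image` is a hypothetical stand-in for the ComfyUI workflow.
import runpod

def generate_image(prompt: str) -> str:
    # ... run the ComfyUI workflow and return e.g. a base64 string or URL
    raise NotImplementedError

def handler(job):
    prompt = job["input"]["prompt"]
    image = generate_image(prompt)
    return {"image": image}

runpod.serverless.start({"handler": handler})
```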

Errors in container

I am receiving some weird container errors on some of my serverless endpoints; an example is below. They seemed to fix themselves once I terminated the worker, but I wanted to check whether these are known issues. Error response from daemon: Container XYZ124453 is not paused

🚨 HELP! Struggling with Super Slow Docker Pulls from Azure ACR on Runpod 🚨

Hey everyone! 😩 Is anyone else experiencing snail-paced download speeds when pulling Docker images from Azure ACR on Runpod? 🐢 It's taking ages, and it's seriously slowing down my workflow. Any tips, tricks, or solutions would be massively appreciated! 🙏 Details: Docker Repo: Hosted on Azure ACR...

Reporting/blacklisting poorly performing workers

I've noticed that every now and then a bad worker is spawned for my endpoint and takes forever to complete the job compared to other workers running the same job. Typically my job takes ~40s, but occasionally workers with the same GPU take 70s instead. I want to blacklist these pods from running my endpoint so performance isn't impacted.

Flashboot not working

I have Flashboot enabled on my workers, but for some reason all of them appear to be cold-booting every time.

How can I make a single worker handle multiple requests concurrently before starting the next worker?

Hi everyone, I’ve deployed an image generation model using a 24GB GPU with 2 workers (1 active) on RunPod. Each image generation uses around 6-7GB of memory. My goal is to have a single worker handle multiple requests concurrently until it can’t handle the load anymore, and only then should the second worker start. Right now, when I send two requests, the second worker starts immediately to process the second request, even though my first worker should have enough resources left to handle both requests at once....
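If the endpoint uses the runpod Python SDK, the concurrency_modifier option is the knob for this: it lets one worker accept several jobs at once (the handler must be async). A minimal sketch, where the limit of 3 is an assumed value sized to the ~6-7GB-per-image figure above:

```python
import asyncio
import runpod

MAX_CONCURRENCY = 3  # assumed: ~24GB GPU / ~7GB per image generation

async def handler(job):
    prompt = job["input"]["prompt"]
    # ... await the actual image generation here
    await asyncio.sleep(0)  # placeholder for real async work
    return {"status": "ok", "prompt": prompt}

def concurrency_modifier(current_concurrency: int) -> int:
    # Tell the SDK how many jobs this worker may run at once.
    return MAX_CONCURRENCY

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```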

Flux.1 Schnell Serverless Speeds

What sort of speeds are people getting with their Flux.1 Schnell models using Serverless on RunPod? I'm currently hitting 30 seconds for 4 images, with a significant amount of time spent moving the model to CUDA (~15 seconds). Is there any way to speed this up? (48GB GPU Pro)
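One common trick is loading the pipeline once at module import, outside the handler, so warm workers reuse the CUDA-resident model across jobs instead of paying the ~15s transfer each time. A sketch assuming a diffusers-style FluxPipeline; the model ID and dtype are illustrative:

```python
import torch
import runpod
from diffusers import FluxPipeline

# Load once at import time so warm (FlashBooted) workers skip the load/transfer.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

def handler(job):
    prompt = job["input"]["prompt"]
    images = pipe(prompt, num_inference_steps=4, num_images_per_prompt=4).images
    # ... encode/upload images and return references
    return {"count": len(images)}

runpod.serverless.start({"handler": handler})
```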

Job timeout constantly (bug?)

I'm getting the job timeout error constantly on every worker, after a random amount of time. I've checked the logs and there is no error; the pod is just killed for no reason, even though no timeout is set on the serverless endpoint (I watched it happen live). It seems completely bugged. The software is the same, nothing has changed, and I'm getting this issue all the time, whether I use 16GB or 48GB....
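For anyone debugging the same thing, it may be worth setting an explicit execution timeout on the request itself rather than relying on the endpoint default. A sketch, with the endpoint ID, API key, and 10-minute value as placeholder assumptions:

```python
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-api-key"          # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "..."},
        # Execution policy: cap the job runtime explicitly (milliseconds).
        "policy": {"executionTimeout": 600_000},
    },
    timeout=30,
)
print(resp.json())
```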

Can we use any container registry other than Docker Hub to deploy on serverless?

I already have my images in the AWS container registry and in Azure as well; can we use them?

Where should I put my 30GB of models?

I'm trying to use https://github.com/blib-la/runpod-worker-comfy to make a serverless endpoint with a customized Docker image. In my case, I have a dozen custom nodes, which were easy to install using the Dockerfile (RUN clone, RUN python install requirements). But I also have 30GB of additional models that my ComfyUI install needs.

The README suggests 2 different methods for deploying your own models: (1) copying/downloading them directly into the image during build, or (2) creating a network volume that gets mounted at runtime. But what are the pros/cons of each approach? If I use a network volume, what are the speed implications? I'm imagining trying to load 30GB on the fly over a home network -- it would take ages. On the other hand, if I design my workflows well and ComfyUI keeps the models in memory, perhaps it's not that big of a deal?

Also, how would I go about testing this locally? I'm assuming this is a well-documented task, but I'm not even sure what to Google for. I'm running Docker locally through WSL/Ubuntu.

So far, I have been COPYing the 30GB of models into the Docker image during the build process and pushing it to Docker Hub. Surprisingly, my 78GB image pushed to Docker Hub with no complaints, and it's currently deploying to RunPod Serverless. But it is taking AGES to deploy. This will significantly slow down my dev process, but presumably the actual runtime performance will be faster? ...
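For what it's worth, serverless network volumes are mounted at /runpod-volume inside the worker and live on storage in the same datacenter as the GPUs, so the home-network worry doesn't apply; loads go over the datacenter's network, slower than container-local disk but far from home-broadband speeds. A sketch of a startup check that uses the volume when one is attached and falls back to the models baked into the image (both paths are assumed layouts, not part of the worker):

```python
import os

# Serverless network volumes are mounted at /runpod-volume.
VOLUME_MODELS = "/runpod-volume/models"  # assumed layout on the volume
IMAGE_MODELS = "/comfyui/models"         # assumed path baked into the image

MODEL_DIR = VOLUME_MODELS if os.path.isdir(VOLUME_MODELS) else IMAGE_MODELS
print(f"Loading ComfyUI models from {MODEL_DIR}")
```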

Serverless vLLM - LoRA

Is there a way to set the LoRA modules (for the vLLM docker container: --lora-modules lora_adapter1=abc/efg) in the Template, or do I need to use the "standard" vLLM container for it?

Job suddenly restarts and fails after one retry.

I am desperately trying to get our custom LoRA training with kohya_ss running on your serverless workers. After training a few epochs it suddenly stops/restarts. I have already tried adjusting the timeout value via the UI and in the request. Here is some basic info about the request and response. I can provide further details and logs via DM if you need more insight. Request:...

How to set max retries to 0 in the serverless handler or endpoint UI

My requests are being processed twice even when the first response is correct.
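I'm not aware of a documented zero-retries setting, but one workaround sketch is making the handler idempotent by de-duplicating on the job ID that RunPod passes to the handler. The on-disk marker below is purely illustrative and only guards against re-runs on the same worker:

```python
import os
import runpod

DONE_DIR = "/tmp/done-jobs"  # illustrative marker store; same-worker only
os.makedirs(DONE_DIR, exist_ok=True)

def handler(job):
    marker = os.path.join(DONE_DIR, job["id"])
    if os.path.exists(marker):
        # Job was already processed on this worker; skip duplicate work.
        return {"status": "duplicate", "id": job["id"]}
    # ... do the real work here ...
    open(marker, "w").close()
    return {"status": "ok", "id": job["id"]}

runpod.serverless.start({"handler": handler})
```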

Serverless instances die when concurrent

I have a limit of 5 workers, but when I run 3 or so in parallel, often 1-2 of them die randomly. It doesn't always happen, though, and it isn't easily reproducible. Is this due to resource constraints? Has anyone else seen it? What's the workaround?...

Trying to attach a network volume to a CPU serverless worker

Hi, it says there's currently an issue when attaching a network volume to a CPU worker. Where can I track the status of this, and is there any update? It says network volumes have been temporarily disabled for CPU endpoints...

My serverless worker is stuck in initializing.

My worker was working fine 20 minutes ago. I created a new release with a minor change and it started getting stuck in Initializing. Any advice? Our users are unable to use a critical part of the service. The endpoint is https://www.runpod.io/console/serverless/user/endpoint/3d8ketluy4q0pc...

Callback Function

Is there a way to make a serverless endpoint call back an endpoint when the job ends?
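Run requests accept a top-level webhook field; when the job finishes, RunPod POSTs the output to that URL. A minimal sketch with placeholder endpoint ID, API key, and callback URL:

```python
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/ENDPOINT_ID/run",  # ENDPOINT_ID is a placeholder
    headers={"Authorization": "Bearer RUNPOD_API_KEY"},  # placeholder key
    json={
        "input": {"prompt": "..."},
        # RunPod POSTs the job output to this URL when the job completes.
        "webhook": "https://example.com/runpod-callback",
    },
    timeout=30,
)
print(resp.json())
```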

Avoid automatically terminating tasks for CPU pods

I'm currently using CPU pods to host a simple website. However, every time I close the web terminal, all tasks get automatically terminated, including the website. The CPU pod still appears active in the RunPod console and keeps billing me, but it is running 0 tasks and the website is down.
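One likely cause is that closing the web terminal sends SIGHUP to everything started from it. A sketch of launching the site detached from the terminal session (the server command here is a stand-in for the real one):

```python
import subprocess

# Launch the web server in its own session so closing the web terminal
# doesn't send it SIGHUP and kill it.
with open("/var/log/site.log", "ab") as log:
    subprocess.Popen(
        ["python", "-m", "http.server", "8000"],  # stand-in for the real server
        stdout=log,
        stderr=log,
        start_new_session=True,  # detach from the terminal's process group
    )
```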

Serverless error

I'm facing an error while using RunPod, and I could use some help figuring out what might be going wrong. Below is the error I received: { "endpointId": "98pu6z4wg5srca", "workerId": "mxwusxazi52e45", "level": "error",...

OpenAI client retries the request

I am running Llama 3.1 8B on serverless vLLM using the 48GB Pro config. Whenever my local API sends a request to the server, the dashboard shows the request is in progress, but during this time the OpenAI client automatically retries the same request even though the existing one is still running, and this loop continues. When the first request completes, the response is visible on the RunPod dashboard, but the local API is still stuck in the retry loop.
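For reference, the OpenAI Python client retries timed-out requests by default. Raising the client timeout and disabling retries should break the loop; a sketch, assuming the v1 client pointed at the serverless vLLM OpenAI-compatible URL (endpoint ID and key are placeholders):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",  # placeholder ID
    api_key="RUNPOD_API_KEY",  # placeholder
    timeout=600.0,   # allow long generations instead of timing out
    max_retries=0,   # don't re-send while the first request is in flight
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```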