RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Reporting/blacklisting poorly performing workers

I've noticed that every now and then a bad worker is spawned for my endpoint that takes far longer to complete a job than other workers running the same workload. Typically my job takes ~40s, but occasionally a worker with the same GPU takes 70s instead. I want to blacklist these pods from running my endpoint so performance isn't impacted.
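
There doesn't seem to be a direct blacklist setting, but one workaround is to log per-worker execution time from inside the handler so slow workers can be identified (and then terminated from the console). A minimal sketch: `RUNPOD_POD_ID` as the worker identifier is an assumption to verify against the docs, and `run_my_job` is a hypothetical stand-in for your existing inference code.

```python
import os
import time

import runpod


def run_my_job(job_input):
    # Hypothetical placeholder for the existing inference code.
    return {"result": "ok"}


def handler(job):
    # Assumption: RUNPOD_POD_ID identifies the worker this job landed on.
    worker_id = os.environ.get("RUNPOD_POD_ID", "unknown")
    start = time.time()
    result = run_my_job(job["input"])
    elapsed = time.time() - start

    # Jobs normally take ~40s; flag anything well outside that range.
    if elapsed > 60:
        print(f"SLOW worker={worker_id} took {elapsed:.1f}s")
    else:
        print(f"worker={worker_id} took {elapsed:.1f}s")
    return result


runpod.serverless.start({"handler": handler})
```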

FlashBoot not working

I have FlashBoot enabled on my workers, but it appears all of them are doing a cold boot every time, for some reason.

How can I make a single worker handle multiple requests concurrently before starting the next worker

Hi everyone, I’ve deployed an image generation model using a 24GB GPU with 2 workers (1 active) on RunPod. Each image generation uses around 6-7GB of memory. My goal is to have a single worker handle multiple requests concurrently until it can’t handle the load anymore, and only then should the second worker start. Right now, when I send two requests, the second worker starts immediately to process the second request, even though my first worker should have enough resources left to handle both requests at once....
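
The runpod-python SDK supports this through an async handler combined with a `concurrency_modifier` function, which tells the SDK how many jobs one worker may accept at once; only once that limit is saturated does the queue spill over to the next worker. A minimal sketch; the limit of 3 is just an assumption based on ~6-7GB per job on a 24GB card.

```python
import asyncio

import runpod

MAX_CONCURRENCY = 3  # assumption: ~6-7GB per job leaves room for ~3 jobs on 24GB


async def handler(job):
    # Placeholder for the actual image-generation call; it must be
    # non-blocking (or offloaded to a thread/executor) for concurrency to help.
    await asyncio.sleep(1)
    return {"status": "done"}


def concurrency_modifier(current_concurrency):
    # Called by the SDK to decide how many jobs this worker may run at once.
    return MAX_CONCURRENCY


runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```

The endpoint's scaling settings (e.g. queue delay) also affect how eagerly a second worker spins up, so it may be worth relaxing those as well.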

Flux.1 Schnell Serverless Speeds

What sort of speeds are people getting with their Flux.1 Schnell models using Serverless on RunPod? I'm currently hitting 30 seconds for 4 images, with a significant amount of time spent moving the model to CUDA (~15 seconds). Is there any way to speed this up? (48GB GPU Pro)
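
If those ~15 seconds come from loading/moving the pipeline to CUDA inside the handler on every request, loading it once at module import time keeps it resident for warm (FlashBoot) requests. A minimal sketch assuming the diffusers `FluxPipeline`; the handler body and output serialization are simplified placeholders.

```python
import runpod
import torch
from diffusers import FluxPipeline

# Loaded once at import time, so warm workers skip the transfer to CUDA
# on every request.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")


def handler(job):
    prompt = job["input"]["prompt"]
    images = pipe(
        prompt,
        guidance_scale=0.0,        # Schnell is distilled; no CFG needed
        num_inference_steps=4,     # Schnell is tuned for ~4 steps
        num_images_per_prompt=4,
    ).images
    # Encoding the images to base64/URLs is omitted for brevity.
    return {"image_count": len(images)}


runpod.serverless.start({"handler": handler})
```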

Job timeout constantly (bug?)

I'm constantly getting the job timeout error on every worker, after a random amount of time. I've checked the logs and there is no error; the pod is simply killed for no reason, even though no timeout is set on the serverless endpoint (I watched it happen live), so it seems completely bugged. The software is the same, nothing has changed, and I'm getting this issue all the time, whether I use 16GB or 48GB....

Can we use a container registry other than Docker Hub to deploy on serverless?

I already have my images in the AWS container registry, and in Azure as well; can we use them?

Where should I put my 30GB of models?

I'm trying to use https://github.com/blib-la/runpod-worker-comfy to make a serverless endpoint with a customized Docker image. In my case, I have a dozen custom nodes, which were easy to install using the Dockerfile (RUN clone, RUN python install requirements). But I also have 30GB of additional models that my ComfyUI install needs. The README suggests two different methods for deploying your own models: (1) copying/downloading them directly into the image during the build, or (2) creating a network volume that gets mounted at runtime.

What are the pros/cons of each approach? If I use a network volume, what are the speed implications? I'm just imagining trying to load 30GB on the fly over a home network -- it would take ages. On the other hand, if I design my workflows well and ComfyUI keeps the models in memory, perhaps it's not that big of a deal? Also, how would I go about testing this locally? I'm assuming this is a well-documented task, but I'm not even sure what to Google for. I'm running Docker locally through WSL/Ubuntu.

So far, I have been COPYing the 30GB of models into the Docker image during the build process and pushing it to Docker Hub. Surprisingly, my 78GB image pushed to Docker Hub with no complaints, and it's currently deploying to RunPod Serverless. But it is taking AGES to deploy. This will significantly slow down my dev process, but presumably the actual runtime performance will be faster? ...
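
For what it's worth, a serverless network volume is mounted inside the datacenter (not over your home connection), typically at /runpod-volume, so model loads come from datacenter-local storage; the trade-off is a slower first load versus the huge image push/pull you're seeing now. A small sketch of how a handler might prefer volume-hosted models and fall back to models baked into the image (both paths are assumptions to adjust to your setup):

```python
import os

# Assumption: attached network volumes are mounted at /runpod-volume on
# serverless workers; the baked-in path is whatever your Dockerfile used.
NETWORK_MODELS = "/runpod-volume/models"
BAKED_MODELS = "/comfyui/models"  # hypothetical path from the Dockerfile


def resolve_models_dir() -> str:
    """Prefer models on the network volume, else fall back to the image."""
    if os.path.isdir(NETWORK_MODELS):
        return NETWORK_MODELS
    return BAKED_MODELS


print("Using models from:", resolve_models_dir())
```

This keeps the image small (fast builds and deploys) while still working unchanged if you later bake the models in.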

Serverless vLLM - LoRA

Is there a way to set the LoRA modules (for the vLLM Docker container, --lora-modules lora_adapter1=abc/efg) in the template, or do I need to use the "standard" vLLM container for it?

Job suddenly restarts and fails after one retry.

I am desperately trying to get our custom LoRA training using kohya_ss running on your serverless workers. After training a few epochs it suddenly stops/restarts. I already tried adjusting the timeout value via the UI and in the request. Here is some basic info about the request and response. I can provide further details and logs via DM if you need more insight. Request:...

How to set max retries to 0 in the serverless handler or endpoint UI

My requests are processed twice, even when the first response is correct.

Serverless instances die when concurrent

I have a limit of 5 workers. But when I run 3 or so in parallel, often 1-2 of them will die randomly. Doesn't always happen though, not easily reproducible. Is this due to resource constraints? Anyone else see it? What's the workaround?...

Trying to attach a network volume to a CPU serverless worker

Hi, it says there's an issue at the moment when attaching a network volume to a CPU worker. Where can I track the status of this, and is there any update? It says network volumes have been temporarily disabled for CPU endpoints...

My serverless worker is stuck in initializing.

My worker was working fine 20 minutes ago. I created a new release with a minor change and it started getting stuck in Initializing. Any advice? Our users are unable to use a critical part of the service. The endpoint is https://www.runpod.io/console/serverless/user/endpoint/3d8ketluy4q0pc...

Callback Function

Is there a way to make a serverless endpoint call back another endpoint when the job ends?
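
Yes: as far as I know, the /run request accepts a top-level webhook field, and RunPod POSTs the job's final status/output to that URL when the job ends. A minimal sketch; the endpoint ID, API key, and callback URL are placeholders.

```python
import requests

ENDPOINT_ID = "your_endpoint_id"  # placeholder
API_KEY = "your_api_key"          # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},
        # RunPod POSTs the finished job's status and output to this URL.
        "webhook": "https://example.com/runpod-callback",
    },
    timeout=30,
)
print(resp.json())  # contains the job id and initial status
```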

Avoid automatically terminating tasks for CPU pods

I'm currently using CPU pods to host a simple website. However, every time I close the web terminal, all tasks get automatically terminated, including the website. The CPU pod still appears active in the RunPod console and bills me, but it is running 0 tasks and the website is down.

Serverless error

I'm facing an error while using RunPod, and I could use some help figuring out what might be going wrong. Below is the error I received: { "endpointId": "98pu6z4wg5srca", "workerId": "mxwusxazi52e45", "level": "error",...

OpenAI client retries the request

I am running Llama 3.1 8B on serverless vLLM using the 48GB Pro config. Whenever my local API sends a request to the server, the dashboard shows the request as in progress, but during this time the OpenAI client automatically retries the same request even though the existing one is still in progress, and this loop continues. When the first request finally completes, the response is visible on the RunPod dashboard, but the local API is still stuck in the retry loop.
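
The retry loop usually comes from the OpenAI client itself, which retries on timeouts by default; raising the client timeout and disabling retries keeps it waiting on the original request instead of re-sending duplicates. A sketch assuming the OpenAI-compatible route exposed by the vLLM worker (the URL shape and model name are assumptions to adjust):

```python
from openai import OpenAI

# Point the OpenAI SDK at the RunPod OpenAI-compatible endpoint
# (substitute your endpoint ID and API key).
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",
    api_key="<runpod_api_key>",
    timeout=600,    # allow long cold starts / generations before giving up
    max_retries=0,  # stop the SDK from silently re-sending the same request
)

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```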

Stuck IN_PROGRESS but job completed and worker exited

```
{
  "delayTime": 11461,
  "executionTime": 35548,
  "id": "8612f7b4-df33-4be9-8ce6-1a82b7283b24-e1",
  ...
```

Issue while running a FastAPI TTS (Text-to-Speech) Docker image on Serverless

We have made a TTS model that converts text input into an audio file. To serve this model, we created a REST API using the FastAPI framework with multiple endpoints, each performing specific tasks based on the text input. After building the API, we created a Dockerfile, pushed the image to Docker Hub, and attempted to run it on the serverless platform. However, while the container works perfectly on a local setup, it fails to run in the serverless environment. Could you assist with how to successfully run the REST API container on the serverless platform, and, once the container is running, how to access the FastAPI endpoints to send text input and get audio responses? Dockerfile: ...
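
One common cause: serverless workers are invoked through a handler function rather than by routing HTTP traffic to a port inside the container, so a FastAPI server listening on its own port never receives requests. The usual pattern is to wrap the same TTS logic in a runpod handler and call it via the endpoint's /run or /runsync API. A minimal sketch; `synthesize` is a hypothetical stand-in for the existing model code.

```python
import base64

import runpod


def synthesize(text: str) -> bytes:
    # Hypothetical placeholder for the TTS inference code behind the
    # existing FastAPI routes; returns raw audio bytes.
    return b"RIFF...fake-wav-bytes"


def handler(job):
    params = job["input"]            # e.g. {"text": "...", "task": "tts"}
    audio = synthesize(params["text"])
    # Binary output is usually returned base64-encoded (or uploaded to
    # object storage and returned as a URL).
    return {"audio_base64": base64.b64encode(audio).decode()}


runpod.serverless.start({"handler": handler})
```

Clients would then POST a body like {"input": {"text": "..."}} to the endpoint's /runsync route and decode audio_base64 from the response.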

My output is restricted to a limited number of tokens

I have deployed Llama 3.1 8B on serverless vLLM. When I send a request, the response is always limited to a small number of tokens. Please help me with this.
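
If this is the RunPod vLLM worker, the generation length is capped by max_tokens in sampling_params, which defaults to a fairly small value; passing a larger value in the request should lift the limit. A sketch using the native input format (the endpoint ID and API key are placeholders):

```python
import requests

ENDPOINT_ID = "<endpoint_id>"   # placeholder
API_KEY = "<runpod_api_key>"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Explain LoRA fine-tuning in one paragraph.",
            "sampling_params": {
                "max_tokens": 1024,   # raise the cap on generated tokens
                "temperature": 0.7,
            },
        }
    },
    timeout=300,
)
print(resp.json())
```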