Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Custom Handler Error Logging

I'm trying to add error logging to my custom rp_handler.py and use the runpod library for that:
```
from runpod.serverless.modules.rp_logger import RunPodLogger
logger = RunPodLogger ...
```
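
As a reference point, here is a minimal sketch (not from the original post) of wiring RunPodLogger into a handler with a try/except; the handler shape and logger methods follow the runpod serverless SDK, but verify them against the library version you run.

```python
# Minimal sketch, assuming the standard runpod serverless handler contract.
import runpod
from runpod.serverless.modules.rp_logger import RunPodLogger

logger = RunPodLogger()

def handler(event):
    try:
        job_input = event["input"]
        logger.info(f"Received input keys: {list(job_input.keys())}")
        # ... run the actual work here ...
        return {"status": "ok"}
    except Exception as exc:
        # Log the failure and return it to the caller instead of crashing the worker.
        logger.error(f"Handler failed: {exc}")
        return {"error": str(exc)}

runpod.serverless.start({"handler": handler})
```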

Runpod Custom API request and rp_handler.py

I'm trying to deploy a RunPod worker with a network volume for ComfyUI. The handler should be able to take a minimal request and expand it into a full ComfyUI API request. Example request:
```
prompt_text = {"img": "someBase64", "positive_prompt": "pos", "negative_prompt": "neg", "flow_id": 1}
```
This should be posted to the serverless endpoint; from there it should be passed to the handler function as the event and processed further. Unfortunately it gets caught as an error by the get_job() function from the rp_job.py module of the runpod Python library, which throws:
```
{"requestId": null, "message": "Job has missing field(s): input.", "level": "ERROR"}
```
Is there a way to implement my idea like this, or should I try another way, or skip the error handling in the get_job() function?...
Solution:
The payload needs to have everything within the "input" key:
```
{ "input": {} ...
```
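
To make the fix concrete, here is a hedged sketch of POSTing the request with the required top-level "input" key; the field names come from the example request above, while the endpoint ID, API key, and the choice of the /runsync route are placeholders/assumptions.

```python
# Hedged sketch: wrap the minimal request in "input" before posting it.
import requests

RUNPOD_API_KEY = "YOUR_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder

payload = {
    "input": {
        "img": "someBase64",
        "positive_prompt": "pos",
        "negative_prompt": "neg",
        "flow_id": 1,
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json=payload,
)
print(resp.json())
```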

Slow model loading

Hi all. I have a serverless endpoint designed to run Stable Diffusion inference. It's taking about 12 seconds to load the model (Realistic Vision) into the pipeline (using "StableDiffusionPipeline.from_pretrained") from a RunPod network drive. Is this normal? Is the load time mostly a function of (possibly slow) communications speed between the serverless instance and the network volume? The problem is that I'm loading other models as well, so even if I keep the endpoint active there is still a big delay before inference for a job can even begin, and then of course there's the time for inference itself. The total time is too long to provide a good customer experience. I love the idea of easy scaling using the serverless approach, and cost control, but if I can't improve the speed I may have to use a different approach. ...
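
For reference, a minimal sketch of the load path described above: timing diffusers' StableDiffusionPipeline.from_pretrained against a network-volume path. The mount point and model folder name are assumptions, not from the post.

```python
# Minimal sketch, assuming the model folder lives on the attached network volume.
import time
import torch
from diffusers import StableDiffusionPipeline

MODEL_DIR = "/runpod-volume/models/realistic-vision"  # hypothetical path

start = time.time()
pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR,
    torch_dtype=torch.float16,  # half precision halves the bytes read from disk
)
pipe.to("cuda")
print(f"Model loaded in {time.time() - start:.1f}s")
```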

Network Volume and GPU availability.

I am deploying automatic1111 as an endpoint. Hosting the models on a network volume and then accessing them from the endpoint seems a good choice to avoid having an endpoint per model. However, it looks like whenever I use a network volume, the GPU availability displays either "low availability" or "not available". Is there a specific region with highly available 3090, A5000, or 4090 GPUs?
Solution:
Best bet for the 24 GB tier is RO, but it's mostly L4.

Number of workers limit

I recently updated my number of workers in serverless to 10, and I see I can upgrade more depending on balance. My question is, is there any limit to this? I plan to deploy many models as endpoints (might reach 30-40 models in the future) and would like to know if that would be supported on RunPod.
Solution:
You can increase it up to 30 yourself, after that you need to contact RunPod.

How do I estimate completion time (ETA) of a job request?

I want to show users when their images are going to be ready (e.g. "Your images will be ready in about 20 seconds. You are number 5 in the queue."), which means I need to know the queue position of a job and the average duration of recently completed jobs.
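
Neither number is exposed directly in the thread; one hedged approach is to track the execution times of your own recently completed jobs and combine that with the endpoint's queue depth. The /health fields used below (jobs.inQueue, workers.running) are assumptions about the health payload, so check them against your own responses.

```python
# Hedged sketch: ETA = (queue depth / running workers + 1) * rolling average duration.
from collections import deque

import requests

RUNPOD_API_KEY = "YOUR_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {RUNPOD_API_KEY}"}

recent_durations = deque(maxlen=50)  # seconds, from your own finished jobs

def record_completed(execution_time_ms):
    """Call this when a job finishes, e.g. with executionTime from /status."""
    recent_durations.append(execution_time_ms / 1000)

def estimate_eta_seconds():
    """Rough ETA for a newly submitted job."""
    health = requests.get(f"{BASE}/health", headers=HEADERS).json()
    queued = health.get("jobs", {}).get("inQueue", 0)
    workers = max(health.get("workers", {}).get("running", 1), 1)
    avg = sum(recent_durations) / len(recent_durations) if recent_durations else 30.0
    return (queued / workers + 1) * avg
```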

Does RunPod support setting priority for each job request?

I want to implement a free/paid tier for my users, and I want them to basically use the same RunPod workers. The only difference is that I want my paid users to be able to front-run free users. Does RunPod support a way for me to tag a job request as having higher priority than other requests? Or is there any tutorial/guide that suggests how I should implement this kind of feature with RunPod?...
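
No endpoint-side priority field is mentioned in the thread; the sketch below is just one possible client-side approach, where your own service holds a priority queue and submits jobs to the endpoint in priority order. All names and the queue shape are illustrative, and the API key and endpoint ID are placeholders.

```python
# Hedged sketch: client-side priority queue in front of a RunPod endpoint.
import heapq
import itertools

import requests

RUNPOD_API_KEY = "YOUR_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder

_order = itertools.count()
_queue = []  # heap of (priority, arrival_order, payload)

def enqueue(payload, paid_user):
    # Lower number = higher priority, so paid users are popped first.
    heapq.heappush(_queue, (0 if paid_user else 1, next(_order), payload))

def dispatch_next():
    """Pop the highest-priority job and submit it asynchronously via /run."""
    if not _queue:
        return None
    _, _, payload = heapq.heappop(_queue)
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
        headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
        json={"input": payload},
    )
    return resp.json()
```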

serverless webhook support secret?

The serverless webhook that posts job status should support setting a secret; otherwise anyone who knows the URL can change my DB.
Solution:
You can use URL query params.
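
A hedged sketch of that suggestion: put a shared secret in the webhook URL's query string when submitting the job, then reject callbacks that don't carry it. The top-level "webhook" field in the run request is an assumption about the API, the Flask receiver is purely illustrative, and the secret, URLs, and IDs are placeholders.

```python
# Hedged sketch: secret-protected webhook receiver for job status callbacks.
import hmac

import requests
from flask import Flask, abort, request

WEBHOOK_SECRET = "change-me"      # placeholder shared secret
RUNPOD_API_KEY = "YOUR_API_KEY"   # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"  # placeholder

# 1) Submit the job with the secret embedded in the webhook URL
#    (assumes the run request accepts a top-level "webhook" field).
requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json={
        "input": {"prompt": "example"},
        "webhook": f"https://example.com/runpod-webhook?secret={WEBHOOK_SECRET}",
    },
)

# 2) Validate the secret before touching the database.
app = Flask(__name__)

@app.route("/runpod-webhook", methods=["POST"])
def runpod_webhook():
    if not hmac.compare_digest(request.args.get("secret", ""), WEBHOOK_SECRET):
        abort(403)
    job = request.get_json()
    # ... update the DB using job["status"], job["output"], etc. ...
    return "", 204
```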

Queued serverless workers not running and getting charged for it?

I woke up this morning to find that all the credits in my Runpod account are gone. I don't have any active pods and only have a single 100 GB network volume. I didn't know why, but I noticed that there were 2 queued workers for one of my serverless endpoints. I was testing in Postman yesterday and sent a few requests, maybe 10 in total. I had assumed that requests that didn't get a response after some time were automatically terminated....
Solution:
@Jack They can't tell whether your workers are actually working or not. There isn't a runtime timeout because you might, for example, genuinely be processing for that long, which is common for a use case like mine doing large video or audio processing. The recommendation is to go through the process on a GPU pod in the future with your handler.py and make sure it works as expected there; then you can monitor and send a request using the built-in testing endpoint on RunPod and follow how it's going with the logs. With a GPU pod you can at least see in a Jupyter notebook whether everything in your handler.py logic is behaving as expected, and you can invoke it by just calling the method normally...

Is dynamically setting a minimum worker viable?

Wondering about: https://docs.runpod.io/docs/create-serverless-endpoint#modify-an-existing-serverless-endpoint Let's say I have 5 throttled workers and I dynamically set minimum workers to 1 or 2. Does it kick off the throttled workers and honor the minimum workers? ...

Issue with unresponsive workers

We just launched our model to production a few days ago... and this problem has happened to us twice.
Problem: unresponsive workers; most of them are "ready" but sit "idle" despite requests queuing up for minutes.
Expected behavior: idle workers should pick up a request as soon as one is waiting in the queue.
Actual behavior: workers stay idle and the queue does not get processed, delaying jobs for minutes....

Execution time much longer than delay time + actual time

Hello, I am running some tests with RunPod and I can't seem to get the total execution time under 1 second. I made a dummy handler that just returns immediately. The first time, the delay time is 2+ seconds, as expected, since the container is not hot. The delay then drops to 100 ms or so, but the round-trip execution time is still 1+ second. What is the extra overhead here? I've called the endpoint from two different machines on different networks and get the same results....
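
To reproduce the measurement described above, a hedged sketch: time a /runsync call against a do-nothing handler and compare the wall-clock round trip with the delayTime/executionTime fields in the response (verify those field names against your own payloads; the API key and endpoint ID are placeholders).

```python
# Hedged sketch: client-side round-trip timing for a trivial job.
import time

import requests

RUNPOD_API_KEY = "YOUR_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder

start = time.time()
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json={"input": {}},
)
round_trip = time.time() - start
body = resp.json()
print(f"round trip: {round_trip:.2f}s")
print(f"reported delayTime: {body.get('delayTime')} ms, "
      f"executionTime: {body.get('executionTime')} ms")
```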

Advice on Creating Custom RunPod Template

Can anyone point me to a good tutorial for creating my own RunPod templates?

accelerate launch best --num_cpu_threads_per_process value ?

Hi guys, I'm trying to do some LoRA training on a serverless endpoint and I wonder how many CPU cores are available with the different GPU types. Is there a specification on that somewhere? And/or what do you use? My first tests ran on a single thread, but I would love to maximize performance. 🙂
Solution:
You can use this environment variable:
RUNPOD_CPU_COUNT=6
...
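
As a hedged illustration of that answer, a worker could read RUNPOD_CPU_COUNT and feed it to accelerate launch; the training script name below is hypothetical.

```python
# Hedged sketch: size --num_cpu_threads_per_process from RUNPOD_CPU_COUNT.
import os
import subprocess

cpu_count = int(os.environ.get("RUNPOD_CPU_COUNT", "1"))

subprocess.run(
    [
        "accelerate", "launch",
        f"--num_cpu_threads_per_process={cpu_count}",
        "train_lora.py",  # hypothetical training script
    ],
    check=True,
)
```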

Issue with Request Count Scale Type

Request Count is set to 15 and there are more than 15 requests but an additional worker is not being added. It's an A1111 worker (https://github.com/ashleykleynhans/runpod-worker-a1111) with runpod>=0.10.0. See screenshots attached.

Do I need to keep Pod open after using it to setup serverless APIs for stable diffusion?

Hi, I'm following this tutorial on building serverless endpoints for running txt2img with ControlNet: https://www.youtube.com/watch?v=gv6F9Vnd6io My question is: after deploying a Pod to set up the Network Volume to receive serverless endpoint requests, can I terminate the Pod and the disk attached to it? Or do I have to keep the Pod running in order to receive serverless endpoint requests at any time?...
Solution:
You can terminate the pod when you are done.

how do you access the endpoint of a deployed llm on runpod webui and access it through Python?

how do you access the endpoint of a deployed llm on runpod webui and access it through Python?
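
No answer is recorded in the thread; as a hedged sketch, the generic serverless API can be called from Python like this. The exact input schema depends on the worker image you deployed (a vLLM worker, for example), so treat the "prompt" field as an assumption; the endpoint ID shown on the endpoint's page in the web UI and your API key are placeholders.

```python
# Hedged sketch: call a serverless LLM endpoint via the generic /runsync route.
import requests

RUNPOD_API_KEY = "YOUR_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder, shown in the RunPod web UI

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
    json={"input": {"prompt": "Explain what a serverless endpoint is."}},
)
print(resp.json())
```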

Best Mixtral/LLaMA2 LLM for code-writing, inference, 24 to 48 GB?

Good evening, all you experts! I'm past the pain-and-suffering stage and into the finesse-and-finishing stage: what is the best class of models for doing basic inference, and in particular formulating simple commands based on a set of simple rules, that will fit into a 24 GB (or 48 GB, if much better) RunPod?

Is runpod UI accurate when saying all workers are throttled?

To be honest, I cannot tell if what I'm seeing is correct. I have two endpoints, both with max 3 workers, and the UI says every GPU is throttled. I can't test right now, but why would it fall into this state, and is it accurate? Worker IDs: ugv9p9kcxlmu1c 5snyuonk8vkisq...

serverless: any way to figure out what gpu type a job ran on?

I'm trying to get data on speeds across GPU types for our jobs, and I'm wondering if the API exposes this anywhere, and if not, what the best way to sort it out would be.
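
One hedged workaround, in case the API does not expose it: report the device name from inside the handler and return it alongside the job output, then aggregate speeds per GPU type on your side.

```python
# Hedged sketch: record the GPU model the job actually ran on.
import runpod
import torch

def handler(event):
    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "cpu"
    # ... run the actual job here ...
    return {"gpu": gpu_name, "output": "..."}

runpod.serverless.start({"handler": handler})
```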