RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning, and GPUs!


Help Reducing Cold Start

Hi, I've been working with RunPod for a couple of months and it has been great. I know the image only downloads once, and that there are two optimization options: embedding the model in the Docker image, or using a network volume (with less flexibility, since it's tied to a single region). I'm embedding my model in the Docker image and running scripts to cache the loading, config, and downloads. I'm using the whisper-large-v3 model with my own code, since it has a lot of optimizations. Cold start without FlashBoot is between 15-45 seconds. My goal is to reduce this time as much as possible without depending on a high request volume. ...
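For reference, the usual pattern for trimming warm-request latency is to load the model at module import time rather than inside the handler, so only true cold starts pay the load cost. A minimal sketch, assuming faster-whisper and an illustrative input schema (the poster uses their own optimized code, so treat this purely as an illustration):

```python
# Minimal sketch: load whisper-large-v3 once at import time so warm workers
# reuse the loaded model. Model path, library, and input schema are assumptions.
import runpod
from faster_whisper import WhisperModel

# Runs once per container start, not once per request.
MODEL = WhisperModel("/models/whisper-large-v3", device="cuda",
                     compute_type="float16")

def handler(job):
    audio_path = job["input"]["audio_path"]  # hypothetical input field
    segments, _info = MODEL.transcribe(audio_path)
    return {"text": " ".join(seg.text for seg in segments)}

runpod.serverless.start({"handler": handler})
```

Beyond this, cold-start time is dominated by reading weights from disk, so smaller or pre-converted weights (e.g. float16 checkpoints) tend to help more than handler-level changes.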

Is privileged mode possible?

I have an application that requires a kernel module to be loaded. Loading a kernel module from inside an image requires privileged mode on the host. Is there any way to get privileged mode enabled for my images so that I can add a kernel module?

Is there an easy way to host a Python Flask application as a serverless API on RunPod?

I'm looking for a way to host a Flask app, which worked great via ngrok on my local machine, as a serverless deployment (on-demand by API call) without changing very much (unlike AWS, which requires taking things apart and rebuilding them as Lambda functions). Is there a way to do this easily on RunPod?
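There is no Flask-specific adapter in the serverless worker SDK as far as I know, but one low-effort workaround is to keep the Flask app untouched and replay each job against it with Flask's built-in test client. A sketch, where the module name and input schema are assumptions:

```python
# Hypothetical bridge: forward each serverless job to an unmodified Flask app
# in-process via its test client. "myapp" and the input fields are placeholders.
import runpod
from myapp import app  # your existing Flask app

def handler(job):
    inp = job["input"]
    with app.test_client() as client:
        resp = client.open(
            inp.get("path", "/"),
            method=inp.get("method", "GET"),
            json=inp.get("json"),
        )
        body = resp.get_json(silent=True) or resp.get_data(as_text=True)
        return {"status": resp.status_code, "body": body}

runpod.serverless.start({"handler": handler})
```

Callers then pass the route and payload in the job input instead of hitting Flask's HTTP port directly.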

Llama 3.1 via Ollama

You can now use the tutorial on running Ollama on serverless environments (https://docs.runpod.io/tutorials/serverless/cpu/run-ollama-inference) in combination with Llama 3.1. We have tested this with Llama 3.1 8B, using a network volume and a 24 GB GPU PRO. Please let us know if this setup also works with other weights and GPUs....
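For completeness, calling the resulting endpoint from Python might look like the sketch below; the endpoint ID and API key are placeholders, and the exact input fields depend on the Ollama worker from the tutorial, so verify against it:

```python
# Hedged example of invoking the endpoint with the runpod SDK.
# ENDPOINT_ID, the API key, and the input schema are placeholders.
import runpod

runpod.api_key = "YOUR_API_KEY"
endpoint = runpod.Endpoint("ENDPOINT_ID")

result = endpoint.run_sync(
    {"input": {"prompt": "Why is the sky blue?", "model": "llama3.1:8b"}},
    timeout=120,
)
print(result)
```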

Slow docker image download from GCP

Hi, I've been experimenting with RunPod recently. I tried to deploy a Whisper image to RunPod from my company's GCP Docker repo and found it pretty slow: it took almost 10 minutes to download an 11 GB image. I understand the image is huge, but is there anything I can do to speed up the process? For example, the repo location (currently in Asia, as my company is in Asia).

Guide to deploy Llama 405B on Serverless?

Hi, can any Serverless experts advise on how to deploy Llama 405B on Serverless?

How does the vLLM template provide an OAI route?

Hi, the vLLM template provides an additional OpenAI-compatible route. As I'm currently looking into building my own serverless template for exl2, I wondered how this is achieved; I don't see anything in the documentation about how to set it up, and looking into the source doesn't provide much more insight. If I check for job.get("openai_route"), is that handled automatically, or how would I go about adding it to the handler (or elsewhere)?
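For what it's worth, in the worker-vllm source the handler appears to branch on fields named openai_route and openai_input, which seem to live under job["input"] (not the top-level job) and to be injected by the platform for requests to the endpoint's /openai/v1/* URL. A sketch of that pattern for a custom exl2 worker; field names follow worker-vllm and should be verified against the current source:

```python
# Hedged sketch: branch on the OpenAI-style fields that the platform appears
# to inject into job["input"] for /openai/v1/* requests (per worker-vllm).
import runpod

def handler(job):
    inp = job["input"]
    if inp.get("openai_route") == "/v1/chat/completions":
        payload = inp.get("openai_input", {})
        messages = payload.get("messages", [])
        # ... run exl2 inference on `messages` here ...
        return {
            "object": "chat.completion",
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": "..."},
                "finish_reason": "stop",
            }],
        }
    # Fall back to the worker's plain input format.
    return {"output": "handled non-OpenAI job"}

runpod.serverless.start({"handler": handler})
```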

vllm

Any plans to update the vLLM worker image? I would like to test Phi 3 and Llama 3.1; currently both are unsupported by the image. (Serverless.)

Serverless worker failing - how do I stop it

I have a couple of questions. I use RunPod Serverless to power a ComfyUI API. It works well most of the time, but today I noticed one of my serverless workers kept failing. The errors only occurred with one of the workers; the others performed fine. Why would this be, and is there a way of terminating specific workers? Also, how can I get notified if one of them is playing up? Thanks!...

Running Auto1111 getting - error creating container: cant create container; net

But it clears and eventually does run the item in the queue. I have network storage set up. Open to paid consultants on this. DM if interested.

Why "CUDA out of memory" Today ? Same image to generate portrait, yesterday is ok , today in not.

"delayTime": 133684, "error": "CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 23.68 GiB total capacity; 18.84 GiB already allocated; 1.47 GiB free; 20.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF", "executionTime": 45263, "id": "ae1e4066-e2b7-43c1-8f37-3525bda03893-e1",...

GPU memory issue

I have a question. Is there anyone from RunPod who can DM me so we can talk about it and dive into it? Thanks!

runpod IP for whitelisting for cloud storage

I have a Cloudinary account, and from RunPod I want to download images from Cloudinary. I also want this to be secure, so which IPs should I whitelist so that my Cloudinary account only accepts requests from my RunPod serverless workers?

How can I use JavaScript in worker code?

How do I write the handler file using JavaScript? Is it possible? And if so, what is the JavaScript equivalent of this (image attached)? I saw there is a JavaScript SDK, but from what I can see it's just for calling the endpoint. Am I right? Or can I also use runpod.serverless.start()? If yes, what should I install and import?...
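As far as I know the serverless worker SDK is Python-only (the JavaScript SDK is indeed for calling endpoints), so a common workaround is a thin Python handler that shells out to Node. A sketch, where the script path and the JSON-over-stdio contract are assumptions:

```python
# Hypothetical shim: a Python handler that delegates each job to a Node script.
# /app/worker.js and its stdin/stdout JSON contract are placeholders.
import json
import subprocess
import runpod

def handler(job):
    proc = subprocess.run(
        ["node", "/app/worker.js"],
        input=json.dumps(job["input"]),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)

runpod.serverless.start({"handler": handler})
```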

Serverless Always IN_QUEUE?

Hi, I'm pretty new to using Serverless on RunPod. After setting up my API and including my Docker image with a Python script for making inferences (receiving and returning JSON), all requests go in and get stuck at IN_QUEUE. Testing my inference script locally doesn't cause any issues; is this a configuration problem?
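A quick way to isolate this: deploy a trivially small handler first. If even this stays IN_QUEUE, the problem is the endpoint or container wiring (e.g. the image's CMD never starts the worker loop, or no worker can be scheduled), not the inference code:

```python
# Minimal known-good handler for sanity-checking endpoint wiring.
import runpod

def handler(job):
    return {"echo": job["input"]}

runpod.serverless.start({"handler": handler})
```

A frequent culprit is a Dockerfile whose CMD runs something other than the script that calls runpod.serverless.start(), or a script that crashes at import before the loop starts (the worker logs will show this).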

Serverless doesn't scale

Endpoint ID: cilhdgrs7rbzya. I have some requests that require workers with 4x RTX 4090. The endpoint's "max workers" is 150 and the "Request Count" scale type is set to 1. When I sent 78 requests concurrently, only ~20% of them could start within 10 s; the P80 wait was ~600 s. ...

Unused HPC power

Hi, we have many of the following machines at a DC with unused computing power:
GPU: 1x NVIDIA A100 80GB
CPU: 14 vCores (EPYC Milan)
RAM: 120 GB ECC
...

connecting a telegram bot to a serverless pod

Hey guys, I'd love your help with the following issue. I have a serverless function and an API endpoint from RunPod, which I can access using curl when providing an authorization HTTP header. The problem is that when calling /setWebhook to create a Telegram bot, I need to provide an endpoint URL, and I can't pass parameters (or headers) in that HTTP request, ...
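Since setWebhook only accepts a bare HTTPS URL, Telegram can't attach the Authorization header the RunPod endpoint requires. One workaround is a small always-on relay that accepts the webhook and forwards updates with the header added. A sketch, with the URL, key, and input schema as placeholders:

```python
# Hypothetical relay: Telegram posts updates here; we forward them to the
# RunPod endpoint with the Authorization header Telegram cannot send.
import requests
from flask import Flask, request

app = Flask(__name__)
RUNPOD_URL = "https://api.runpod.ai/v2/ENDPOINT_ID/run"  # placeholder ID
RUNPOD_KEY = "YOUR_API_KEY"

@app.route("/telegram-webhook", methods=["POST"])
def telegram_webhook():
    update = request.get_json(force=True)
    requests.post(
        RUNPOD_URL,
        json={"input": {"telegram_update": update}},
        headers={"Authorization": f"Bearer {RUNPOD_KEY}"},
        timeout=10,
    )
    return "", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```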

How to get worker to save multiple images to S3?

Hey all - my ComfyUI workflow saves multiple images from throughout the workflow. However, in the S3 upload the worker is only saving one image. Do you know how I can have it save all the images into the same directory in S3?
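If the worker only uploads a single result file, one fix is to sweep the whole ComfyUI output directory and push every image under the same prefix. A boto3 sketch with assumed bucket, prefix, and directory names:

```python
# Hedged sketch: upload every image the workflow wrote, not just the last one.
import os
import boto3

s3 = boto3.client("s3")

def upload_outputs(output_dir, bucket, prefix):
    uploaded = []
    for name in sorted(os.listdir(output_dir)):
        if not name.lower().endswith((".png", ".jpg", ".jpeg", ".webp")):
            continue
        key = f"{prefix}/{name}"
        s3.upload_file(os.path.join(output_dir, name), bucket, key)
        uploaded.append(f"s3://{bucket}/{key}")
    return uploaded
```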

Using SSH to debug serverless endpoints

Hello! I had a quick question I was hoping someone could help with. In the RunPod documentation's Overview section for Serverless, it states: "Metrics and Debugging: Transparency is vital in debugging. RunPod provides access to GPU, CPU, Memory, and other metrics to help users understand their computational workloads. Full debugging capabilities for workers through logs and SSH are also available, with a web terminal for even easier access."...
Solution:
Oh, on serverless you just connect with the Connect button while the worker is active.