RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Serverless - LoRA from network storage

Hi, I have a FLUX generation serverless setup that's working pretty well. I bake all the models into the Docker image, so even though the image is pretty large, cold start is reasonable and generations are fast enough. Now an issue arises with a new workflow: I will be training more LoRAs and need to make them available to the serverless workflow....
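
A common pattern here is to keep the base model baked into the image and load LoRAs from the network volume per request. A minimal sketch, assuming a diffusers FLUX pipeline; the paths, input fields, and volume layout are hypothetical:

```python
import runpod
import torch
from diffusers import FluxPipeline

# Base model stays baked into the image; LoRAs live on the network
# volume, so new ones become available without a rebuild.
pipe = FluxPipeline.from_pretrained(
    "/models/flux-dev", torch_dtype=torch.bfloat16  # hypothetical path
).to("cuda")

def handler(job):
    lora = job["input"]["lora"]  # hypothetical field, e.g. "my-style.safetensors"
    pipe.load_lora_weights(f"/runpod-volume/loras/{lora}")
    image = pipe(job["input"]["prompt"]).images[0]
    pipe.unload_lora_weights()   # reset so the next request starts clean
    path = f"/tmp/{job['id']}.png"
    image.save(path)
    return {"image_path": path}

runpod.serverless.start({"handler": handler})
```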

Stuck in queue

I can't seem to get Llama 3.2 11B Vision to work on serverless. I am using the H100 SXM GPU with 2 active workers and 5 total workers, but it seems like most of them get unhealthy all the time. When I try to send a request, it just gets stuck in the queue and the job never finishes; I've had it running for 5 minutes before. What can I do?

Costs

I ran two serverless jobs. Each took about 30 seconds of compute on a 16 GB machine. There was a delay and a cold start for both. The billing was about 2 cents each, 4 cents in total. ...
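
For anyone sanity-checking their bill, the arithmetic is just billed seconds (cold start included) times the per-second rate. Everything below except the 30 s execution time from the post is an assumption, not actual RunPod pricing:

```python
# Rough billing reconstruction for one job.
rate_per_second = 0.00025   # hypothetical $/s for a 16 GB flex worker
cold_start_s = 45           # hypothetical cold start, billed as well
execution_s = 30            # from the post

cost = (cold_start_s + execution_s) * rate_per_second
print(f"~${cost:.3f} per job")  # ~$0.019, i.e. roughly 2 cents
```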

Hey, we have serverless endpoints but we've had no workers for more than 12 hours now!

Our serverless endpoints are running but 0 workers are joining. What is happening? Is this normal?

[Solved] EU-CZ Datacenter not visible in UI

I know the data center is currently down, but the news about it being updated made me realize that I haven't seen this region in either pods or serverless for many months already, which is quite unfortunate since I am based in Czechia. I thought RunPod had removed it completely, but now I see it hasn't. So why don't I see it at all?

Do RunPod serverless GPUs support NVIDIA MIG?

Hello! I was wondering if anyone has experience with setting up NVIDIA MIG (GPU partitioning) on RunPod serverless? I'm currently trying to deploy a ~370-million-parameter model to serverless inference, and we wanted to see if it would be possible to set up GPU partitioning on one worker to work around the serverless worker limitations. If anyone has experience, or even knows whether RunPod supports this, it would be much appreciated! Thank you!

My serverless worker is downloading models to `/runpod-volume/.cache/huggingface` by itself

Hello, I don't use any network volume, so I don't understand why /runpod-volume exists at all. On top of that, I have an HF_HOME env var that points somewhere else, yet huggingface seems to be targeting /runpod-volume without explanation. Did I miss something? Is this related to the new caching feature I was told about a few weeks ago?...
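
One thing worth checking: huggingface libraries resolve the cache directory when they are first imported, so HF_HOME has to be set before the import, not after. A quick way to verify which cache is actually in effect (the path is hypothetical):

```python
import os

# This must run before any huggingface import, or whatever default
# was in the environment at import time wins.
os.environ["HF_HOME"] = "/models/hf-cache"  # hypothetical path

from huggingface_hub import constants

print(constants.HF_HOME)  # confirm the override actually took effect
```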

GitHub serverless build takes too long

Hey guys, when I tried to deploy serverless using GitHub and a Dockerfile, it spent an hour in the "Waiting for Build" status with no errors in the logs. What's the problem?

Websocket Connection to Serverless Failing

Hello. I'm updating a project to use websockets (moving from batch to realtime processing); however, whenever I attempt to connect, I get a "websockets.exceptions.InvalidStatus: server rejected WebSocket connection: HTTP 404" error via wss://. It changes to an "HTTP 301" that redirects to https:// (and also fails, due to the invalid protocol) if I use ws:// instead. I construct the URL as wss://<pod_id>-<port>.proxy.runpod.net/ws and expect this to be translated to wss://localhost:<port>/ws; the websocket server is run in a thread just before the HTTP server. The latter works fine, as I am able to communicate with it via the regular https://api.runpod.ai/v2/<pod_id> URL. The expected port is exposed in the Docker config, as per https://docs.runpod.io/pods/configuration/expose-ports. Any ideas what the issue is?...
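
For reference, a bare-bones client against the proxy URL format described above; the pod id, port, and /ws route are placeholders:

```python
import asyncio
import websockets  # pip install websockets

POD_ID = "abc123"  # placeholder
PORT = 8765        # must match a port exposed in the Docker config

async def main():
    # Proxy URL format from the post; /ws is whatever route the
    # in-container websocket server actually serves.
    url = f"wss://{POD_ID}-{PORT}.proxy.runpod.net/ws"
    async with websockets.connect(url) as ws:
        await ws.send("ping")
        print(await ws.recv())

asyncio.run(main())
```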

Pulling from the wrong cache when multiple Dockerfiles are in the same GitHub repo

I seem to be having an issue where the wrong cache is pulled by a worker when I specify the Dockerfile in the GitHub integration. Any help would be appreciated!

Serverless confusion

Hi, where is the OpenAI-compatible endpoint option in the serverless config? I've just started using serverless and I'm unable to connect it to my front end, because it needs an OpenAI-compatible API.
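
If the endpoint was deployed with the vLLM quick-deploy worker, my understanding is that it exposes an OpenAI-compatible route under the endpoint URL. A sketch with a placeholder endpoint id; verify the /openai/v1 path against the current docs for your image:

```python
from openai import OpenAI

# Placeholder credentials and endpoint id.
client = OpenAI(
    api_key="<RUNPOD_API_KEY>",
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",
)

resp = client.chat.completions.create(
    model="<model name as served>",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```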

How to pass parameters to DeepSeek R1

I have deployed DeepSeek R1 on serverless, and I don't know how to pass parameters to it or what the structure of those parameters is. For example, how do I tell the model max_tokens, and what do I write for the messages parameter?
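
Assuming the endpoint runs the vLLM serverless worker, parameters usually go inside input, with sampling options under sampling_params. The field names here are an assumption based on that worker's schema, so check the README of the image you actually deployed:

```python
import requests

# Placeholder endpoint id and API key.
url = "https://api.runpod.ai/v2/<endpoint_id>/run"
headers = {"Authorization": "Bearer <RUNPOD_API_KEY>"}
payload = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize MIG in one sentence."}
        ],
        "sampling_params": {"max_tokens": 256, "temperature": 0.6},
    }
}
print(requests.post(url, json=payload, headers=headers).json())
```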

Job stuck in queue and workers are sitting idle

This has been the case very often: jobs are stuck in the queue while the workers sit idle. How can I improve this? There was nothing else going on with any other worker (or endpoint, for that matter).

Endpoint/webhook to automatically update Docker image tags?

Is there a way to tell RunPod there is a new Docker image to update the endpoint to, without doing it manually in the portal?
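
One option is to have CI call RunPod's GraphQL API after pushing the new image. The mutation and field names below are assumptions, so confirm them against the current API reference before relying on this:

```python
import requests

# CI step after `docker push`: point the endpoint's template at the
# new tag. saveTemplate and its input fields are assumptions here.
mutation = """
mutation {
  saveTemplate(input: {id: "<template_id>", imageName: "me/worker:v2"}) {
    id
    imageName
  }
}
"""
r = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": "<RUNPOD_API_KEY>"},
    json={"query": mutation},
)
print(r.json())
```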

What is the expected continuous delivery (CD) setup for serverless endpoints serving private models?

Hello, our model artifacts are stored in S3. What is the continuous delivery setup for serverless models not hosted on Docker Hub? What I have seen so far: existing RunPod workers download publicly available models and push them to Docker Hub...
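
One common pattern (an assumption, not an official RunPod CD story) is to keep the image generic and pull private artifacts from S3 when the worker boots, caching them on the volume so later cold starts skip the download. Bucket, prefix, and paths below are hypothetical:

```python
import os
import boto3

s3 = boto3.client("s3")
MODEL_DIR = "/runpod-volume/models/my-model"  # hypothetical cache path

def ensure_model() -> str:
    # Download once per volume; subsequent cold starts reuse the cache.
    if not os.path.isdir(MODEL_DIR):
        os.makedirs(MODEL_DIR, exist_ok=True)
        pages = s3.get_paginator("list_objects_v2").paginate(
            Bucket="my-bucket", Prefix="my-model/"
        )
        for page in pages:
            for obj in page.get("Contents", []):
                dest = os.path.join(MODEL_DIR, os.path.basename(obj["Key"]))
                s3.download_file("my-bucket", obj["Key"], dest)
    return MODEL_DIR
```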

InvokeAI to RunPod serverless

Is it possible to link InvokeAI up to a serverless instance? I'm hoping to have it installed locally and then use an external GPU, but I'm not sure what is required to set that up. Wondering if there is a tutorial or something on doing so.

ComfyUI: from pod to serverless

Hi, I've got my ComfyUI setup running on a pod. What is the fastest way to make it work in serverless? I used a network volume.

Is the serverless network volume's MASSIVE lag fixed? Is it now usable as a model store?

Hi, a while ago I tried to use a network volume as a model store, to avoid having to manage my now-100 GB Docker image. But the RunPod network volume took forever to load, turning 15-second requests into 1 min 30 s or more. Support said they were working on a fix; is this usable now?...

Serverless with network storage

Hi all, I am trying to set up a serverless worker for ComfyUI (currently using a customized template from https://github.com/blib-la/runpod-worker-comfy). I have several large models which I would rather not bake into the image. I see there is an option to mount network storage to a serverless worker, so I mounted it (with the required models for the workflow) to the serverless comfy worker, but when I send a request with the workflow, the worker logs show that it does not see any of the models in the mounted storage....
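
When debugging this, it can help to confirm from inside the worker what the volume actually contains and where it is mounted; a quick sanity-check sketch (serverless normally mounts network volumes at /runpod-volume, but verify for your endpoint):

```python
import os

# Run inside the worker (e.g. at the top of the handler) to list
# everything the mounted volume actually exposes.
for root, _dirs, files in os.walk("/runpod-volume"):
    for name in files:
        print(os.path.join(root, name))
```

If the files are listed but ComfyUI still can't find them, the worker is probably only scanning its own models directory; ComfyUI's extra_model_paths.yaml is the usual mechanism for pointing it at the volume's paths instead.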

Workers keep respawning and requests queue indefinitely

Hi there, I tried asking in the "ask-ai" channel, but I need some more help: "I've just deployed a serverless endpoint in 3 regions. When a worker gets to about 7 minutes running, it goes idle and then spawns a new worker, over and over. Is this normal? It's a small model, and workers have been running now for about 35 minutes. I tried a request, but it just goes into a queue and doesn't get completed."...