Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

How many serverless GPUs can we scale to at maximum?

We are an AI product company considering migrating to the Runpod platform. I would like to know how many of each graphics card type are available on the platform, and the maximum number of cards that can be started at once. We mainly use 4090 and A100 cards. The platform we rented before supported at most 190 machines running simultaneously, and now that our business is growing, we hope to be able to run more machines at the same time...

SGLang

SGLang works very well in a pod but is impossible to run in serverless: the API route keeps returning error 404. I use the exact same configuration (Docker image, command line, port) in the pod and in serverless...

Job has missing field(s): input

I get this response when calling a serverless LLM (meta-llama/Meta-Llama-3.1-8B-Instruct) with the following JSON: {...
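
For context: that error typically means the request body lacks the top-level `input` key that Runpod serverless job payloads are wrapped in. A minimal sketch of a well-formed request, with a placeholder endpoint ID and API key:

```
import requests

# Sketch only: ENDPOINT_ID and the API key are placeholders.
url = "https://api.runpod.ai/v2/ENDPOINT_ID/runsync"
headers = {"Authorization": "Bearer YOUR_RUNPOD_API_KEY"}
payload = {
    "input": {                      # omitting this wrapper triggers the error
        "prompt": "Why is the sky blue?"
    }
}
resp = requests.post(url, json=payload, headers=headers)
print(resp.json())
```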

meta-llama/Meta-Llama-3-8B-Instruct serverless

I'm a bit confused trying to get this tested using Python; the tutorial at https://docs.runpod.io/serverless/workers/vllm/get-started seems to point me to using openai. Can we still use the openai Python library, or do we need a different one to connect to the endpoint? Can anyone help me, please?...
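
For what it's worth, the vLLM worker exposes an OpenAI-compatible route, so the standard openai Python client should work when pointed at the endpoint's /openai/v1 base URL. A minimal sketch, with placeholder endpoint ID and API key:

```
from openai import OpenAI

# Sketch assuming the vLLM worker's OpenAI-compatible route;
# ENDPOINT_ID and the API key are placeholders.
client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)
resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```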

With an LLM on Runpod, is there a per-token cost like other providers, and what if it's serverless?

Hi, we want to run an LLM on Runpod, but I am concerned about running serverless since it's pretty slow and we need the LLM to respond pretty much instantly. The other thing is we don't want to run a GPU all the time, as that ends up costing a lot. Can someone out there give me some advice, please?

LLaMA 3.1 8B model cold start and delay times are very long

Hey, our cold start time always exceeds a minute, and the same goes for the delay time. For live use we need this to be quicker. We have tried a network volume as well, but it doesn't change anything.

Run task on worker creation

Is it possible to make a serverless worker run a custom script on its first startup, before it's considered "ready"?
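
One common pattern, sketched under the assumption that the worker uses the Python SDK: module-scope code in the handler file runs once at worker startup, before any job is pulled, so one-time setup can live there (load_model below is a hypothetical helper):

```
import runpod

def load_model():
    # Hypothetical helper: download weights, warm caches, etc.
    # Replace with your real initialization.
    return object()

# Module-scope code runs once when the worker boots, before any
# job is pulled, so one-time setup belongs here.
model = load_model()

def handler(job):
    prompt = job["input"]["prompt"]
    # Use the already-initialized model here.
    return {"output": f"echo: {prompt}"}

# Jobs are only pulled once start() is called, so everything above
# has finished by the time the worker takes work.
runpod.serverless.start({"handler": handler})
```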

I get time variation across serverless workers, even though every worker uses an RTX 4090

Hey, all of my workers are in idle mode, and I still see a time difference in the complete process.

Best tips for lowering SDXL text2image API startup latency?

I'm currently using https://github.com/ashleykleynhans/runpod-worker-a1111 along with a network volume. I only use a single model with the sd text2image endpoint, and I don't need the UI. Right now I'm experiencing an 80+ second delay on cold starts for the first request. Do you have any suggestions on how to optimize this (without keeping one worker constantly active)? Thanks in advance!

Serverless is showing inaccurate inProgress

My serverless endpoint is showing an inaccurate inProgress count while it is processing multiple jobs (in IN_PROGRESS status, as the tasks are long-running). This is affecting our scaling strategy.

Avoid model download during Docker build

Hi, I'm building a Docker image following blib-la's comfyui runpod-worker-comfy, and I have to specify multiple models to download in the Dockerfile. The problem is that this takes ages to download and then upload the full image from my PC. Is there a way to have the models download only when I deploy the image on Runpod? I already tried moving the model downloads to start.sh, but that causes the models to download every time a new worker is started...
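
One possible workaround, sketched assuming a network volume is attached to the endpoint (serverless volumes mount at /runpod-volume) and a hypothetical model URL: guard the download with an existence check so it runs once per volume, rather than on every worker start or at image build time:

```
import os
import urllib.request

# Assumes a network volume attached to the endpoint; serverless
# volumes mount at /runpod-volume. MODEL_URL is a placeholder.
MODEL_DIR = "/runpod-volume/models"
MODEL_PATH = os.path.join(MODEL_DIR, "model.safetensors")
MODEL_URL = "https://example.com/model.safetensors"

def ensure_model():
    # Download only if missing, so it happens once per volume,
    # not every time a new worker starts.
    if not os.path.exists(MODEL_PATH):
        os.makedirs(MODEL_DIR, exist_ok=True)
        urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)

ensure_model()  # call at container start, e.g. from start.sh or the handler module
```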

Is there any serverless template, or vLLM-compatible HF repo, for vision models?

Hi! Are there any plug-and-play LLaVA serverless templates, or LLaMA 3 (or other) vision models that work with Runpod's vLLM worker? I was using Ashleyk's awesome runpod-worker-llava, but it has since been removed...

More RAM for endpoints?

Just curious whether we can manually assign more RAM to our endpoints. I want to use the 4090 due to its high inference performance, but the RAM is just 24 GB, which could be a bit low for the video combine process.

Serverless container storage

I made a script that downloads various models and packages, and I incorporated it into the Docker image that runs on my serverless endpoint. I want it to run only once, downloading all the packages and storing them for later use. Do I have to use a network volume, or is there another way to store the packages in container storage so they don't download every time the container is removed and started again?

Is RunPod's CPU Endpoint an alternative to GCP's Cloud Run?


Using the vLLM RunPod worker image and the OpenAI endpoints, how can I get the executionTime?

The standard endpoint provides executionTime, as well as an ID pointing to an execution that I can call /status on:

```
{
  "delayTime": 598,
  "executionTime": 1276,
  ...
}
```
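
For comparison, a sketch of the standard queue-based flow the question contrasts with, where the /status response carries delayTime and executionTime once the job completes (endpoint ID and API key are placeholders):

```
import time
import requests

BASE = "https://api.runpod.ai/v2/ENDPOINT_ID"          # placeholder endpoint ID
HEADERS = {"Authorization": "Bearer YOUR_RUNPOD_API_KEY"}

# Submit asynchronously, then poll /status with the returned job ID.
job = requests.post(f"{BASE}/run",
                    json={"input": {"prompt": "hi"}},
                    headers=HEADERS).json()

status = {}
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(1)

# Completed jobs report timing fields alongside the output.
print(status.get("delayTime"), status.get("executionTime"))
```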

Any limits on execution timeout?

If I uncheck "Enable Execution Timeout" on an endpoint, can I run a serverless worker indefinitely? If not, what is the actual limit? Thanks!

prod

Hi guys, we are having a problem with our serverless environment. What is the best way to get this resolved?