Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning, and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Execution time discrepancy

I built a custom text embedding worker. When I time the request on the pod, it takes about 20 ms to process from start to finish. The request itself takes a lot longer (about 1.5 seconds), and RunPod returns executionTime: 1088ms in the response object. Do you know where this discrepancy might come from? As it is, it's really limiting the throughput of my worker, and there isn't much point in using a GPU if it's this heavily bottlenecked....
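
The usual first step with a gap like this is to split it into queueing/cold-start time versus handler time: RunPod's response object reports both delayTime and executionTime in milliseconds. A minimal client-side sketch (the endpoint ID, payload shape, and 120 s timeout are placeholder assumptions):

```
import os
import time

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]

t0 = time.perf_counter()
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    json={"input": {"text": "hello world"}},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
wall_ms = (time.perf_counter() - t0) * 1000

body = resp.json()
# delayTime     = time spent queued / waiting for a worker (incl. cold start)
# executionTime = time the handler itself ran; both in milliseconds
print(f"wall clock:    {wall_ms:.0f} ms")
print(f"delayTime:     {body.get('delayTime')} ms")
print(f"executionTime: {body.get('executionTime')} ms")
```

If delayTime dominates, the bottleneck is scheduling or cold starts rather than the model; if executionTime is far above the 20 ms measured inside the handler, the overhead is in the worker wrapper (e.g. input deserialization or per-job setup).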

Speed

Hey guys, just wondering if your RunPod serverless speeds are good. I'm running Llama 3.1 8B on 16 GB of VRAM.

Understanding RunPod Serverless Pods: Job Execution and Resource Allocation

I'm new to RunPod and need clarification on how serverless pods work. Here's my understanding:
- RunPod serverless pods allow code to run when triggered, eliminating idle costs.
- Code is executed as a job by a worker, accessed through an endpoint.
- I can specify the number of jobs a worker can run....
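
That summary matches the serverless model: a worker is a container that pulls jobs from the endpoint's queue and hands each job's input to a handler function. A minimal sketch using the runpod Python SDK (the input field name is an assumption):

```
import runpod

def handler(job):
    # Each request submitted to the endpoint arrives as a "job";
    # job["input"] is the JSON payload sent to /run or /runsync.
    text = job["input"].get("text", "")
    return {"echo": text}

# Blocks and pulls jobs from the endpoint queue until the worker is scaled down.
runpod.serverless.start({"handler": handler})
```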

How to force /runsync past 60 seconds

Need to keep /runsync alive for over 60 seconds. No webhooks, no async. I just want /runsync to work as-is, only with longer execution times.
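
If /runsync's hold window can't be stretched far enough, the standard fallback that still feels synchronous to the caller is to submit via /run and block client-side on /status. A sketch (endpoint ID and payload are placeholders; the terminal status names follow RunPod's job states):

```
import os
import time

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

# Submit asynchronously, then block in the client until the job finishes.
job = requests.post(f"{BASE}/run", json={"input": {"text": "..."}}, headers=HEADERS).json()

while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(1)

print(status.get("output") or status)
```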

CORS issues

Access to XMLHttpRequest at 'https://api.runpod.ai/v2/bsy98fzdbod86f/run' from origin 'https://**********prod.web.app/' has been blocked by CORS policy: Request header field access-control-allow-origin is not allowed by Access-Control-Allow-Headers in preflight response. Any solutions for this?...
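
Two things are going on in that error: Access-Control-Allow-Origin is a *response* header, so sending it as a request header makes the preflight fail; and calling api.runpod.ai directly from the browser exposes the API key anyway. The common fix is a small backend proxy. A minimal sketch assuming Flask and requests (the route name and endpoint ID are hypothetical, and a production version would also answer OPTIONS preflights):

```
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
RUNPOD_URL = "https://api.runpod.ai/v2/<endpoint-id>/run"  # placeholder

@app.post("/generate")
def generate():
    # The browser talks to this server; the RunPod API key stays server-side.
    resp = requests.post(
        RUNPOD_URL,
        json={"input": request.get_json()},
        headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    )
    out = jsonify(resp.json())
    # CORS headers belong on the response, set by the server you control.
    out.headers["Access-Control-Allow-Origin"] = "*"
    return out
```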

Sync endpoint returns prematurely

The sync endpoint sometimes (about half of the time) responds prematurely with an 'in progress' JSON body. The job does finish, however; I need the sync call not to return until the job is done.

Is it possible to see logs of a historical job ID?

I've had a user mention that their image didn't process due to a processing error, so I would like to see the logs leading up to the error. I have the job ID (from a day ago); can you advise how I can see the worker logs for that particular job in RunPod? FWIW, the job ID is 58f1b0ce-d4de-4711-b58a-1c42bb3d5017-u1...

Implement RAG with the vLLM API

Is it possible to implement RAG with the vLLM API and our model deployed on a serverless endpoint?
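
RAG doesn't need anything special from the endpoint itself: retrieval happens client-side, and the retrieved context is folded into the prompt. A sketch assuming the worker-vllm image's OpenAI-compatible route and the openai Python client, with retrieval stubbed out (the endpoint ID, model name, and retrieve() helper are placeholders):

```
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint-id>/openai/v1",  # placeholder
    api_key=os.environ["RUNPOD_API_KEY"],
)

def retrieve(query: str) -> str:
    # Stub: swap in your vector store (FAISS, pgvector, ...) here.
    return "Relevant passages for: " + query

query = "What does the warranty cover?"
resp = client.chat.completions.create(
    model="<model-name>",  # the model the endpoint was deployed with
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{retrieve(query)}"},
        {"role": "user", "content": query},
    ],
)
print(resp.choices[0].message.content)
```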

How to deploy flux.schnell to serverless?

Title says it all. Would be nice to have a guide on how to set up Flux on a serverless endpoint. Also, I'm planning to train some LoRAs and store them for future use....
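
Pending an official guide, a handler for FLUX.1-schnell can be sketched with diffusers' FluxPipeline; the step count and base64 output format are assumptions, and the worker needs a GPU with enough VRAM for the model:

```
import base64
import io

import runpod
import torch
from diffusers import FluxPipeline

# Loaded once per worker, outside the handler, so warm requests skip it.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

def handler(job):
    prompt = job["input"]["prompt"]
    # schnell is distilled for few-step, guidance-free sampling.
    image = pipe(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {"image_base64": base64.b64encode(buf.getvalue()).decode()}

runpod.serverless.start({"handler": handler})
```

For the LoRA part, diffusers pipelines expose load_lora_weights(), so weights stored on a network volume could be loaded at startup or per request.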

When ttl is not specified in the policy, one gets a 500 with {"error":"ttl must be \u003e= 10,000 ms"}

For the last ~40 minutes, all my requests with 'executionTimeout': 120000 have been getting this error with HTTP status 500. Here is my repro: curl -v -X POST "https://api.runpod.ai/v1/XXX/run" -H "Authorization: Bearer XXX" -H "Content-Type: application/json" -d '{"input": {"XXX": "XXX"}, "policy": {"executionTimeout": 120000}}'...
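
Going by the error text, the server now insists on a ttl whenever a policy object is present, so a possible workaround until this is fixed is to set one explicitly alongside executionTimeout (both in milliseconds). A Python equivalent of the repro above, with an assumed ttl value:

```
import os

import requests

resp = requests.post(
    "https://api.runpod.ai/v1/<endpoint-id>/run",  # same path shape as the repro
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={
        "input": {"prompt": "..."},
        "policy": {
            "executionTimeout": 120000,  # stop the job after 120 s of execution
            "ttl": 600000,  # expire the job from the queue after 10 min (assumed value)
        },
    },
)
print(resp.status_code, resp.json())
```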

Pushing a new release to my existing endpoint takes too long

I've pushed a new release to my endpoint with a new Docker image tag. This tag only modifies some app code, so all the heavy Docker layers should already be there. The system logs show "Pulling fs layer" for most of the layers except the first 5. Isn't RunPod caching the layers somewhere, or does it have to pull ALL layers every time I push a new release, even though only the last layer has changed...?...
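
Whatever the host-side cache does, the part you control is layer ordering: if dependencies are installed before the app code is copied in, only the final layers should differ between releases. A Dockerfile sketch (the base image tag and paths are placeholders):

```
# Heavy, rarely-changing layers first so they stay byte-identical across releases.
# Base image tag is a placeholder.
FROM runpod/base:<tag>

COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt

# App code last: editing it should invalidate only this final layer.
COPY src/ /app/
CMD ["python", "-u", "/app/handler.py"]
```

If the pull logs still show every layer downloading on a fresh host, that is host-side cache behavior; the ordering above at least keeps registry uploads and warm-host pulls small.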

Serverless worker doesn't run asynchronously until I request its status in local development

I'm following the docs and created a very simple handler.py with the following:
```
import runpod
...
```
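
For local runs, the async behavior is easiest to check with the SDK's test flags rather than by polling the status route. A sketch of an async handler plus the local invocations (the sleep is a stand-in for real work; the flags come from the runpod SDK docs):

```
import asyncio

import runpod

async def handler(job):
    # With an async handler the worker can await I/O without blocking.
    await asyncio.sleep(1)  # stand-in for real async work
    return {"status": "done", "echo": job["input"]}

runpod.serverless.start({"handler": handler})

# Local testing (per the SDK docs):
#   python handler.py --test_input '{"input": {"text": "hi"}}'  # run one job inline
#   python handler.py --rp_serve_api                            # serve a local test API
```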

Increase workers

I have a requirement of 50 API calls at a time. Currently there are only 5 workers on the serverless endpoint, and it's taking too long since each API response takes around 25 seconds. Any solutions? Anyone from the team, please reach out. Thank you!...
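
Besides raising the endpoint's max worker count, the runpod SDK lets one worker take several jobs concurrently when the handler is async and not GPU-saturated, via a concurrency_modifier. A sketch (run_inference and the fixed concurrency of 4 are assumptions):

```
import asyncio

import runpod

async def run_inference(payload):
    # Hypothetical stand-in for your real async model call.
    await asyncio.sleep(25)  # roughly the per-call latency mentioned above
    return {"output": payload}

async def handler(job):
    return await run_inference(job["input"])

def concurrency_modifier(current_concurrency: int) -> int:
    # Called by the SDK to decide how many jobs this worker may run at once.
    # A real version could adapt to load instead of returning a constant.
    return 4

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```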

Do I need to base my serverless worker image on the official base image?

I have my own Dockerfile, already optimized with only the things I need to perform inference. The RunPod docs say we should start by forking the worker-template, but basing my image on it leaves me with a HUGE image. Is there anything special in the runpod/base image, or can I just use my own and simply make sure I install the runpod Python package and expose a handler function via CMD at the end?...
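
As far as the worker contract goes, nothing more seems to be required than the runpod SDK and a handler started via CMD, so a custom image along these lines should work (a sketch; python:3.11-slim is an assumption, and GPU inference would normally start from an nvidia/cuda or framework base instead):

```
# Minimal worker image that skips the template entirely (sketch).
FROM python:3.11-slim

COPY requirements.txt .
RUN pip install --no-cache-dir runpod -r requirements.txt

COPY handler.py .
# -u: unbuffered output so print/log lines appear promptly in the console.
CMD ["python", "-u", "handler.py"]
```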

Why is the Docker image for my serverless endpoint not updating?

Hi team, I pushed a new version of my Docker image to my personal Docker Hub, and I want to update my serverless endpoint to use the latest image. I clicked the new release in my endpoint settings, but it isn't working for me; my RunPod endpoint shows no sign of updating. Can anyone help?...

Worker keeps dying while training a LoRA model

Even after setting the worker to be active, it keeps dying after about 2 minutes. Is there a way to prevent this?

Long latencies

I have a 7B model that is supposed to be very fast (it checks whether a claim is supported by a context and gives a yes/no answer). If I rent an H100, I can process my prompt and get a response in 100 ms (for a prompt of about 1,400 words). But a very short prompt (about 200 words) takes about 1.3 to 1.5 seconds on serverless. I tried using active workers, but that didn't help. Any tips on how to reduce the latency?...

Edit endpoint with a new Docker image

Is it possible to update a deployed endpoint with a new Docker image linked to its template?

Running a Specific Model Revision on a Serverless vLLM Worker

How do I specify the model revision on serverless? I was looking through the readme in https://github.com/runpod-workers/worker-vllm and I see I can build a Docker image with the revision I want, but is that the only way to go about this? Specifically, I want to set up this Hugging Face model: https://huggingface.co/anthracite-org/magnum-v2-123b-exl2 edit: fixed the model link...