Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

API Wrapper

```
curl -X POST https://api.runpod.ai/v2/stable-diffusion-v1/run \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' \
  -d '{"input": {"prompt": "a cute magical flying dog, fantasy art drawn by disney concept artists"}}'
```
...
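For anyone wrapping this in code rather than curl, a minimal Python sketch of the same call, assuming the standard asynchronous /run plus /status/{id} flow of the serverless API (the API key below is a placeholder):

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder, substitute your own key
BASE = "https://api.runpod.ai/v2/stable-diffusion-v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Submit the job asynchronously; /run returns a job id right away.
job = requests.post(f"{BASE}/run", headers=HEADERS, json={
    "input": {"prompt": "a cute magical flying dog, fantasy art drawn by disney concept artists"}
}).json()

# Poll /status/{id} until the job reaches a terminal state.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(2)

print(status)
```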

Deploy from docker hub stuck

I have a basic even-odd container that takes a number as input and responds whether it's even or odd. I have uploaded the container to Docker Hub: phmagic/runpod-test:latest. When I go to set up a new serverless pod, it asks for the container image and I put in phmagic/runpod-test:latest. All requests hang for more than 400s; I can't seem to get even the basic example to work, and the documentation is very spotty about how to do this.
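A frequent cause of requests hanging indefinitely is an image whose entrypoint never starts the Runpod worker loop. A minimal sketch of what the handler script inside the container needs to run, assuming the runpod Python SDK and a hypothetical even/odd payload:

```python
import runpod

def handler(job):
    # job["input"] carries whatever was posted under "input" in the request body.
    number = job["input"]["number"]
    return {"result": "even" if number % 2 == 0 else "odd"}

# Starts the worker loop that pulls jobs from the endpoint's queue;
# without this call the worker never picks anything up and requests stay queued.
runpod.serverless.start({"handler": handler})
```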

Serverless on Active State behaviour

Some APIs I was using on serverless were working in both active and idle state before; now it seems to break the server when I switch to active, and the response is always the same as the previous one, or only 'finished'. I want to debug what is happening. Can someone explain how state works internally in the handler after it wakes up? What will stay in memory? ...
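As background for debugging this: anything created at module scope in the handler file lives as long as the worker process does, so warm and active workers reuse it across requests. A small sketch of the effect, assuming the runpod Python SDK (the counter and cache here are hypothetical):

```python
import runpod

# Module-level objects are created once per worker process and survive
# between requests while the worker stays warm or active.
request_count = 0
cache = {}

def handler(job):
    global request_count
    request_count += 1
    key = job["input"].get("key")
    # If an earlier request already populated the cache, this worker
    # returns the stored value instead of recomputing it.
    if key in cache:
        return {"cached": True, "value": cache[key], "seen": request_count}
    cache[key] = f"computed-for-{key}"
    return {"cached": False, "value": cache[key], "seen": request_count}

runpod.serverless.start({"handler": handler})
```

Repeated or stale responses are often this kind of module-level state being reused between jobs without being reset.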

LLM inference on serverless solution

Hi, I need some suggestions on serving an LLM on serverless. I have several questions: 1. Is there any guide or example project I can follow so that I can run inference effectively on Runpod serverless? 2. Is it recommended to use frameworks like TGI or vLLM with Runpod? If so, why? I'd like maximum control over the inference code, so I have not tried any of those frameworks. Thanks!...
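For the maximum-control route (plain transformers instead of the prebuilt vLLM worker), a common pattern is to load the model once at module import so only cold starts pay the load cost. A rough sketch under that assumption; the model name and generation parameters are illustrative only:

```python
import runpod
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loaded once per worker, outside the handler, so warm requests skip the load.
MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative choice
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def handler(job):
    prompt = job["input"]["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=job["input"].get("max_new_tokens", 256))
    return {"text": tokenizer.decode(output[0], skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})
```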

Serverless Pricing

Is the delay time also included in the charges? Is there a way to know the total time the worker was operating, excluding the delay time and execution time? I want to charge my customers for the total time they use my service....
Solution:
There isn't really an accurate way of determining cold start time + execution time automatically, unfortunately. You have to look at the metrics for your endpoint and try to determine a baseline.
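One workaround is to measure execution time inside the handler and return it alongside the result, so downstream billing can use it even though queue delay and cold start are not captured. A minimal sketch, assuming the runpod Python SDK (do_work and the field name are placeholders):

```python
import time
import runpod

def do_work(payload):
    # Stand-in for the real workload.
    time.sleep(1)
    return {"echo": payload}

def handler(job):
    start = time.monotonic()
    result = do_work(job["input"])
    elapsed = time.monotonic() - start
    # execution_seconds covers handler time only; queue delay and cold start
    # happen before this function runs and are not measured here.
    return {"output": result, "execution_seconds": round(elapsed, 3)}

runpod.serverless.start({"handler": handler})
```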

Broken serverless worker - can't find GPU

Serverless worker qbw30nmknd6cmh is broken: it can't find the GPU.
```json
{ "dt":"2024-02-19 23:34:37.252459", "endpointid":"qbw30nmknd6cmh"...
```

How do multiple GPU priorities assign workers to me?

Wondering what the algorithm is behind selecting GPUs when I have, say, 3 selected. Also, if for example 4090s are my first priority, it seems to keep them there even if they throttle 7 out of 10 of my workers. So I reassigned the priorities and reset my workers to see if I get a better distribution and stop relying so heavily on 4090s, but I'm wondering what the algorithm is even doing with these priorities?...

Runpod API npm package doesn't work

I'm following https://doc.runpod.io/reference/health-check to call Runpod with the npm api package:
```
const sdk = require('api')('@runpod/v1.0#18nw21lj8lwwiy');
sdk.healthCheck({endpoint_id: 'yy'})
```
...
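If the generated npm wrapper is misbehaving, the REST call it wraps can be exercised directly to confirm the endpoint itself is healthy. A sketch in Python, assuming the documented GET /v2/{endpoint_id}/health route (the endpoint ID and key below are placeholders):

```python
import requests

ENDPOINT_ID = "yy"         # placeholder endpoint ID from the question
API_KEY = "YOUR_API_KEY"   # placeholder

resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
# A healthy endpoint should respond 200 with worker and job counts.
print(resp.status_code, resp.json())
```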

How do I expose my API key and use CORS instead?

I want to make it so that all requests from a domain to my serverless endpoint are allowed. I suppose I don't mind exposing my API key if I can make it so that only requests from a certain domain are allowed, right? How would I do this? I want to serve a Comfy workflow on a serverless endpoint and I think I can use https://github.com/blib-la/runpod-worker-comfy to set up the endpoint itself. It would be really helpful if a) someone could let me know if this is possible, and if so b) outline the general steps I need to take to accomplish it....
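Worth noting that CORS is enforced only by browsers, so an exposed key could still be used from anywhere by non-browser clients; the usual approach is a small proxy that keeps the key server-side and allows CORS only for your domain. A rough sketch with Flask and flask-cors (the domain, route, and endpoint ID are placeholders):

```python
import os
import requests
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
# Only allow browser requests originating from this domain (placeholder).
CORS(app, origins=["https://your-frontend.example.com"])

RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]  # stays server-side, never shipped to the browser
ENDPOINT_ID = "your_endpoint_id"               # placeholder

@app.route("/generate", methods=["POST"])
def generate():
    # Forward the browser's payload to the serverless endpoint with the secret key attached.
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
        headers={"Authorization": f"Bearer {RUNPOD_API_KEY}"},
        json={"input": request.get_json()},
    )
    return jsonify(resp.json()), resp.status_code

if __name__ == "__main__":
    app.run(port=8000)
```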

Worker Errors Out When Sending Simultaneous Requests

I was benchmarking a serverless endpoint by sending 10 simultaneous requests to an endpoint that has two active workers, and one of the workers keeps erroring out with the attached stack trace. After this error happens, I get 9 requests stuck In Progress, and if I terminate the errored-out worker and spin up a new one I get the same stack trace unless I manually clear out the In Progress requests. This endpoint is using a Llama2 70B model with image runpod/worker-vllm:0.2.3...
Solution:
Figured my issue out. I needed MAX_CONCURRENCY set to 5; otherwise all requests were going to only one node.
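For context, MAX_CONCURRENCY here is an environment variable on the worker-vllm image; for a custom worker the same idea can be expressed through a concurrency modifier in the runpod Python SDK. A hedged sketch of the latter; the limit and function names are illustrative, check the SDK docs for the exact contract:

```python
import runpod

async def handler(job):
    # ... run one inference request ...
    return {"ok": True}

def concurrency_modifier(current_concurrency):
    # Allow up to 5 jobs in flight on this worker; return a smaller value to throttle.
    return min(current_concurrency + 1, 5)

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```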

Quick Deploy Serverless Endpoints with ControlNet?

Are there currently any quick deploy serverless endpoints with ControlNet? Or would it require a custom docker image?

Mixtral Possible?

Wondering if it's possible to run AWQ Mixtral on serverless with good speed

Estimated time comparison - Comfy UI

Hi everyone, I've been looking at the various GPU options for serverless and I am trying to see if anyone has a rough estimate of how much faster or slower each GPU would be, and whether this is even possible to calculate. There is no exact formula obviously, but I am wondering if someone has had similar experiences. In my case, it takes around 355 seconds to run my workflow on my local machine (RTX 3080 Ti)....

Any plans to add other inference engines?

Hi, I'm using the vLLM worker now, but when it comes to quantized models vLLM works poorly: too much VRAM usage, slow inference, poor output quality, etc. So, are there any plans to add other engines like TGI or exl2?...

Serverless scaling

I'm considering using Runpod for commercial use. I need reliable, relatively cheap scaling for this to work, but I've heard that at least a few months ago serverless was very unreliable, i.e. not allocating GPUs for hours or days at a time. I don't want to have to figure out how to deploy on Runpod just to realize that it's unreliable. What is your take on this right now? Is there any evidence that these problems have been fixed?

"Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/gg3lo

```json
{
  "dt":"2024-02-19 02:45:23.347011",
  "endpointid":"gg3lo31p6vvlb0",
  "level":"error",
  "message":"Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/gg3lo31p6vvlb0/job-done/3plkb7uehbwit0/83aac4d7-36c5-45ce-8b43-8189a65a855f-u1?gpu=NVIDIA+L40&isStream=false')"...
```

Stable Diffusion API Execution Time

I am posting this for a response from Runpod support @flash-singh, or anyone other than @justin. Is 30+ seconds of execution time on a serverless 24GB GPU, via any A1111 API Docker image, for a 768px image acceptable/expected? The exact same model/prompt/settings runs on a pod using the A1111 UI in 3 seconds. Why is serverless so much slower? This is regarding execution time -- not delay, queue, spinup...

Serverless Unable to SSH / Use Jupyter Notebook Anymore

When I first started using Runpod, if I had an active worker I could SSH in / use a Jupyter notebook, as long as I had SSH open / a notebook launched on the pod. But now when I try to SSH, it just throws an error:
```
Justins-MBP ~ % ssh m3k8sad75isko8-64410faa@ssh.runpod.io -i ~/.ssh/id_ed25519
```
...