RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

ngagefreak05 · 5/3/2024

raise error

In serverless Docker, how can I raise an error at an API endpoint so that RunPod handles it properly?
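A minimal sketch of one way to do this, assuming the standard `runpod` Python SDK handler pattern: returning a dict with an `error` key marks the job as failed, and an uncaught exception in the handler should likewise surface as a failed job.

```python
import runpod  # assumes the runpod SDK is installed in the image

def handler(job):
    user_input = job.get("input", {})
    if "prompt" not in user_input:
        # Returning an "error" key marks the job FAILED and sends this
        # message back to the caller in the job status.
        return {"error": "missing required field: prompt"}
    return {"output": f"processed: {user_input['prompt']}"}

runpod.serverless.start({"handler": handler})
```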
BadNoise · 5/3/2024

Serverless SD concurrent requests on multiple checkpoints

Hi, do you know if there is a way to handle concurrent SD predictions (even 10 is fine) on different checkpoints with different prompts? For example, I want to run 5 concurrent requests on checkpoint_1 and 5 on checkpoint_2, passing the checkpoint name in the request body....
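One rough approach (a sketch, not an official RunPod pattern): cache one loaded pipeline per checkpoint name inside the worker, so requests naming checkpoint_1 or checkpoint_2 each route to an already-resident model. The `load_pipeline` stand-in below would be something like diffusers' `from_pretrained(...)` in practice; whether 10 truly concurrent generations fit is then a VRAM question, since two full SD checkpoints must stay resident at once.

```python
from functools import lru_cache

@lru_cache(maxsize=2)  # keep at most two checkpoints resident at once
def load_pipeline(checkpoint_name: str):
    print(f"loading {checkpoint_name} (only happens on first use)")
    # stand-in for a real loader, e.g. StableDiffusionPipeline.from_pretrained()
    return lambda prompt: f"[{checkpoint_name}] image for: {prompt}"

def handler(job):
    inp = job["input"]
    pipe = load_pipeline(inp["checkpoint"])  # checkpoint name from request body
    return {"image": pipe(inp["prompt"])}

print(handler({"input": {"checkpoint": "checkpoint_1", "prompt": "a cat"}}))
```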
Jas · 5/2/2024

Website glitching when trying to create pod - on Chrome and Brave

I am trying to create a new Community GPU pod. I clicked on Templates, AVG popped up with a warning, I closed it and tried to search pods, and immediately the screen crashed to white with a loading circle. I tested it 3 times and the same thing happened every time; then I opened my RunPod account in Chrome and the white screen is still there! Not sure what to do...
Solution:
You create an exception for it (on your antivirus).
houmie · 5/1/2024

Which version of vLLM is installed on Serverless?

There is currently a bug in vLLM that causes Llama 3 to not utilise stop tokens correctly. This has been fixed in v0.4.1. https://github.com/vllm-project/vllm/issues/4180#issuecomment-2074017550 I was wondering which version of vLLM is installed on Serverless. Thanks...
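If you can run a one-off job or open a shell in the worker image, checking the installed version directly is a quick sanity test (a trivial sketch):

```python
import vllm

print(vllm.__version__)  # e.g. "0.4.1" if the stop-token fix is included
```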
houmie · 5/1/2024

When using vLLM on OpenAI endpoint, what is the point of runsync/run?

I just managed to create a flexible worker on serverless. It works great and I can do text completions via the openai/v1/completions endpoint. What I don't understand is the purpose of runsync and run. It's not like I'm queuing jobs somewhere to pick up the results later, right? The openai endpoint returns the results straight away. And if too many users were hitting openai/v1/completions, additional workers would come to the rescue and give them access. So what's the point of the other endpoints? Maybe someone is so kind as to explain that to me? Maybe I'm missing something. Thank you...
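For context, a minimal sketch of the queue-based flow the native endpoints enable (ENDPOINT_ID and API_KEY are placeholders): /run enqueues a job and returns an id immediately, and the result is collected later via /status, which is exactly the fire-and-forget pattern the OpenAI-compatible route doesn't give you. /runsync is the blocking variant of the same queue.

```python
import time
import requests

BASE = "https://api.runpod.ai/v2/ENDPOINT_ID"
HEADERS = {"Authorization": "Bearer API_KEY"}

# Enqueue the job; this returns right away with a job id.
job = requests.post(f"{BASE}/run", headers=HEADERS,
                    json={"input": {"prompt": "hello"}}).json()

# The client is free to disconnect and poll minutes later; that is the
# point of /run versus the synchronous OpenAI-compatible endpoint.
while True:
    status = requests.get(f"{BASE}/status/{job['id']}", headers=HEADERS).json()
    if status["status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(1)
print(status)
```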
salbakraa · 5/1/2024

What is the CUDA version of the A6000 48GB endpoint?

I keep running into the following error randomly when I get requests, and the worker is stuck in an infinite loop.
```
2024-05-01T12:46:54Z error starting container: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.2, please update your driver to a newer version, or use an earlier cuda container: unknown
...
```
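The message suggests the image was built against CUDA 12.2+ but landed on a host with an older driver. A hedged diagnostic sketch, assuming `nvidia-smi` is present in the image (typical for CUDA base images): log the host's driver-supported CUDA version at worker start so the mismatch is visible, then rebuild from an earlier CUDA base image, or restrict which CUDA versions the endpoint may land on if your endpoint settings expose that filter.

```python
import subprocess

# Print the "CUDA Version" line reported by the host driver, e.g. 12.1.
out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
for line in out.splitlines():
    if "CUDA Version" in line:
        print(line.strip())
```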
Raqqa · 5/1/2024

Efficient way to load the model

I'm migrating my service to RunPod and I need some advice on the best way to handle a 200MB model. Currently, I'm loading the model in the handler like this:
```python
model_path = "src/model.pt"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
...
```
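The usual answer (a sketch, not the only way): move the load to module scope so it runs once per cold start, and warm workers reuse the model across jobs instead of re-reading 200MB on every request. The handler's input format below is hypothetical.

```python
import runpod
import torch

model_path = "src/model.pt"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.load(model_path, map_location=device)  # once per cold start
model.eval()

def handler(job):
    with torch.no_grad():
        # hypothetical input shape; adapt to your model's real interface
        x = torch.tensor(job["input"]["data"], device=device)
        return {"output": model(x).tolist()}

runpod.serverless.start({"handler": handler})
```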
houmie · 5/1/2024

Can we run aphrodite-engine on Serverless?

aphrodite-engine is a fork of vLLM that also supports the exl2 format, which gives it a huge advantage. Are there any plans to support aphrodite-engine on RunPod's serverless offering in the future? I believe aphrodite-engine is currently only supported as a single server on RunPod. Thanks...
AC_pill · 4/30/2024

Idle timeout not working

Hi team. I'm setting my serverless endpoint with an idle timeout of 180 seconds, but it's going back to idle/sleep right after the task is done. It was working before, and this is hard to debug; does anyone have any experience with that?...
houmie · 4/30/2024

Is serverless cost per worker or per GPU?

I'm looking at serverless GPU options, and a 48 GB GPU costs $0.00048/s. But is that per worker or per GPU?
For example, if I set max workers to 3, will I be charged 3 × $0.00048/s if all three are in use? That would get very expensive very quickly... Thanks...
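For scale, the arithmetic under the per-worker reading: 3 busy workers × $0.00048/s = $0.00144/s, and $0.00144 × 3600 ≈ $5.18 per hour while all three are running; as far as I understand RunPod's per-second billing, you pay only for the seconds a worker is actually running, so a max-workers setting of 3 by itself doesn't bill anything.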
ngagefreak05 · 4/30/2024

openai compatible endpoint for custom serverless docker image

How can I get an OpenAI-compatible endpoint for my custom Docker image in RunPod serverless? I am trying to create a llama.cpp Docker image...
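A sketch of one approach, since the openai/v1/* route is (as far as I know) a feature of RunPod's own vLLM worker rather than something arbitrary images get for free: accept an OpenAI-style chat payload as the job input and answer it with llama-cpp-python, whose `create_chat_completion` already returns an OpenAI-shaped response. The model path is a placeholder.

```python
import runpod
from llama_cpp import Llama

llm = Llama(model_path="/models/model.gguf")  # loaded once per cold start

def handler(job):
    body = job["input"]  # expects {"messages": [...], "max_tokens": ...}
    out = llm.create_chat_completion(
        messages=body["messages"],
        max_tokens=body.get("max_tokens", 256),
    )
    return out  # already shaped like an OpenAI chat.completion response

runpod.serverless.start({"handler": handler})
```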
Nucleus · 4/30/2024

Securely using serverless endpoints on the client-side?

I have a use case where I'm building a client-server web app that uses serverless endpoints. In order to minimize latency I'd like the client (web page) to call RunPod directly instead of having the response travel to the server and then to the client. Obviously I don't want to leak my API key. fal.ai solves this by letting you create temporary/single-use JWT tokens on the server side; the client can then talk directly to their endpoints for a very short timeframe. From what I've read, RunPod does not even allow creating API keys on the fly, nor does it allow you to create a key scoped to a single endpoint. Do you guys have any solutions for this?...
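Absent scoped or short-lived keys, the common workaround (a sketch, with ENDPOINT_ID and the auth check being yours to fill in) is a thin relay that does nothing but forward the request, so the browser never sees the key. It adds one hop, but the relay does no model work, so the added latency is small compared with inference time.

```python
import os
import requests
from fastapi import FastAPI, Request

app = FastAPI()
RUNPOD_URL = "https://api.runpod.ai/v2/ENDPOINT_ID/runsync"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

@app.post("/relay")
async def relay(request: Request):
    payload = await request.json()
    # authenticate *your* user here before forwarding (session, JWT, etc.)
    r = requests.post(RUNPOD_URL, headers=HEADERS, json={"input": payload})
    return r.json()
```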
TrueFaith · 4/30/2024

I want to use ComfyUI for an img2vid workflow - can I do this via the serverless service?

I already tried setting up a pod yesterday where I uploaded all the needed models, but today I cannot use it anymore since there are no GPUs I can use. I also have everything set up locally, so the idea is to just use the GPUs via the serverless option. But I have no idea how this works; what is my best way forward?
agentpietrucha · 4/30/2024

Using network volume with serverless

I am running a stateless model within serverless to modify a provided image. I am wondering if a network volume could be used instead of S3 to upload input and output files? Has somebody done anything similar? Could you share your experience and thoughts? PS: Maybe somebody has implemented a tricky solution to improve the double upload/download performance? Currently I am using an S3 bucket for this, but I feel like there might be a better solution...
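A sketch assuming the network volume is attached to the endpoint (on serverless it is typically mounted at /runpod-volume): jobs then pass file paths instead of image bytes. The caveat is that your client still needs some way to get files on and off the volume, which is where S3 often stays in the picture for browser uploads.

```python
from pathlib import Path

VOLUME = Path("/runpod-volume")  # typical serverless mount point

def handler(job):
    src = VOLUME / job["input"]["input_path"]    # e.g. "jobs/123/in.png"
    dst = VOLUME / job["input"]["output_path"]   # e.g. "jobs/123/out.png"
    dst.parent.mkdir(parents=True, exist_ok=True)
    data = src.read_bytes()
    # ... run the model on `data` here ...
    dst.write_bytes(data)  # placeholder for the real output
    return {"output_path": str(dst)}
```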
VW · 4/30/2024

How to convert a template to serverless?

Hi, I've been using RunPod for a while and I have a lot of templates for serverless inference. However, since the latest updates I can't find the way to create more templates for serverless; it only offers the Pod option. Please, could you help me? This is causing a lot of delays in my work.
ribbit · 4/30/2024

How do I handle both streaming and non-streaming request in a serverless pod?

How can I handle both effectively? Is it okay to have a handler with both yield and return? i.e.
```python
def handler(endpoint):
    if endpoint == "stream_response":
        yield stream_response()
...
```
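Two things worth noting. First, in Python any function containing `yield` is a generator, so mixing `yield` with `return <value>` won't hand that value back to the caller the way a normal function does. Second, a pattern I've seen in the runpod SDK docs (verify the flag against the current SDK before relying on it) is to write the handler as a pure generator and set `return_aggregate_stream`, so /stream consumers get chunks as they are produced while /run and /runsync callers get the aggregated result:

```python
import runpod

def handler(job):
    prompt = job["input"]["prompt"]
    for token in prompt.split():  # stand-in for real token generation
        yield token + " "

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # aggregates yields for non-stream calls
})
```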
Harish Natarajan · 4/30/2024

RunPod doesn't work with GCP Artifact Registry

RunPod doesn't work with GCP Artifact Registry. I even copied the complete JSON key and added it as the password, but it is unable to authenticate. I want to pull images from Artifact Registry and use them in RunPod. Please help me set this up.
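One thing worth double-checking (documented GCP behavior, not RunPod-specific): when authenticating to Artifact Registry with a service-account key, the username must be the literal string `_json_key` and the password the raw contents of the JSON key file (or username `_json_key_base64` with a base64-encoded key), and the registry host in the image name, e.g. `us-docker.pkg.dev`, has to match your repository's region. Using the service account's email as the username is a common reason authentication fails.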
houmie · 4/29/2024

Memory usage on serverless too high

I finally managed to get the serverless setup working.
I just sent a very simple post with a minimal prompt, but it runs out of memory. I'm using this highly quantised model, which should fit into a 24GB GPU: Dracones/Midnight-Miqu-70B-v1.0_exl2_2.24bpw...
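Back-of-envelope arithmetic, assuming ~70B parameters: 70 × 10⁹ params × 2.24 bits ÷ 8 ≈ 19.6 GB for the weights alone, which leaves only ~4 GB of a 24 GB card for the KV cache, activations, and CUDA context. A large configured context length can therefore still push the worker out of memory even though the weights themselves fit.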
tre3x · 4/29/2024

Does RunPod serverless handler support FastAPI?

I am trying to migrate an existing FastAPI application running an ML model to RunPod serverless. Does the serverless handler that needs to be dockerized support FastAPI?
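As I understand it, a serverless worker exposes a handler function rather than its own HTTP server, so the usual migration is to keep your business logic and call it from a runpod handler instead of a FastAPI route. A minimal sketch, where `run_inference` stands in for whatever your FastAPI endpoint currently calls:

```python
import runpod

def run_inference(payload):
    # stand-in for the function your FastAPI route currently calls
    return {"echo": payload}

def handler(job):
    return run_inference(job["input"])

runpod.serverless.start({"handler": handler})
```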