Runpod


We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


Endpoint stuck in init

Hi! My serverless endpoint has been initializing for many hours now and I haven't changed anything! It had been working for the past month. 🤔 Any ideas?...

Bug in cancellation

I had to manually cancel this request. Why is it not cancelled automatically when the Execution Timeout is 300 seconds?
Solution:
In this case, I would suggest logging a support ticket on the website and you can also request a refund.
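For reference, a minimal sketch of cancelling a job manually through the serverless HTTP API (the endpoint ID and job ID below are placeholders; the /cancel route is the documented queue-endpoint operation):

```python
import os
import requests

# Placeholders: substitute your own endpoint ID and job ID.
ENDPOINT_ID = "your-endpoint-id"
JOB_ID = "your-job-id"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/cancel/{JOB_ID}",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    timeout=30,
)
print(resp.status_code, resp.json())
```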

Where is the "input" field on the webhooks?

Last year, around October, I was using webhooks to get notifications of async jobs. The "input" field was included in the webhook payload, along with the "output", "status" and other fields that I don't remember now. I found today that the "input" is no longer included in the webhook payload. Is this documented somewhere? Thanks....
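One workaround sketch, in case the payload really no longer carries "input": keep your own record of each job's input keyed by the job ID returned from /run, then join it back when the webhook fires. The webhook field names below are assumptions for illustration.

```python
import os
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = "your-endpoint-id"          # placeholder
WEBHOOK_URL = "https://example.com/hook"  # placeholder receiver

jobs = {}  # job_id -> original input, kept on our side

def submit(job_input: dict) -> str:
    """Submit an async job and remember its input locally."""
    resp = requests.post(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": job_input, "webhook": WEBHOOK_URL},
        timeout=30,
    )
    job_id = resp.json()["id"]
    jobs[job_id] = job_input
    return job_id

def on_webhook(payload: dict) -> None:
    """Webhook receiver: join the stored input back onto the payload."""
    original_input = jobs.pop(payload.get("id"), None)
    print(payload.get("status"), payload.get("output"), original_input)
```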

Issue loading a heavy-ish (HuggingFaceM4/idefics2-8b) model on serverless (slow network?)

Hey there, I'm trying to load the https://huggingface.co/HuggingFaceM4/idefics2-8b model into a serverless worker but I'm running into an issue. I'm loading the model outside the handler function like so:...
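For context, the usual pattern is to load the model once at module import time so every handler invocation reuses it. A minimal sketch, with the model class and dtype assumed from the idefics2 model card and the generation step simplified:

```python
import runpod
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

MODEL_ID = "HuggingFaceM4/idefics2-8b"

# Loaded once per worker, outside the handler, so the cost is paid only at cold start.
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="cuda"
)

def handler(job):
    # Simplified: a real idefics2 call would also pass images, per the model card.
    prompt = job["input"]["prompt"]
    inputs = processor(text=prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=128)
    return {"text": processor.batch_decode(out, skip_special_tokens=True)[0]}

runpod.serverless.start({"handler": handler})
```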

Network bandwidth changes?

I have been running multiple models for a while now, but in the past few weeks I noticed a big change in latency. After investigating, I found that the network speed of my serverless workers was very slow (a few MB/s at most), making my uploads/downloads longer and thus causing the latency. Have there been any changes to the network bandwidth allocations for serverless workers in the past few weeks? Is there information anywhere about the current bandwidth available for serverless workers?...
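If you want to quantify it yourself, here is a rough sketch that measures download throughput from inside a worker (the test URL is a placeholder; point it at any large file you trust):

```python
import time
import requests

TEST_URL = "https://example.com/100MB.bin"  # placeholder test file

def measure_download_mbps(url: str = TEST_URL) -> float:
    """Stream a file and report average throughput in MB/s."""
    start = time.time()
    total = 0
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_content(chunk_size=1 << 20):
            total += len(chunk)
    return total / (1024 * 1024) / (time.time() - start)

if __name__ == "__main__":
    print(f"{measure_download_mbps():.1f} MB/s")
```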

GGUF in serverless vLLM

How do I run a GGUF quantized model? I need to run this LLM: https://huggingface.co/mradermacher/OpenBioLLM-Llama3-70B-GGUF What parameters should I specify? ...
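If the managed vLLM worker ends up not accepting that GGUF file, one alternative sketch is a custom handler around llama-cpp-python, which loads GGUF directly. The model path and parameters are assumptions; the quantized file would typically live on a network volume:

```python
import runpod
from llama_cpp import Llama

# Assumed path on an attached network volume; adjust to where the GGUF file lives.
MODEL_PATH = "/runpod-volume/OpenBioLLM-Llama3-70B.Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,   # offload all layers to the GPU
    n_ctx=4096,
)

def handler(job):
    prompt = job["input"]["prompt"]
    out = llm(prompt, max_tokens=job["input"].get("max_tokens", 256))
    return {"text": out["choices"][0]["text"]}

runpod.serverless.start({"handler": handler})
```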

hanging after 500 concurrent requests

Hi, I loaded Llama 8B in serverless with 1 active A100 worker and 1 idle worker. I wanted to benchmark how many requests I can send at the same time so I can go to production. But when I send 500 requests at the same time, the server just hangs and I don't get an error. What could be the issue? How do I know how much load 1 GPU can handle, and how do I optimize it for max concurrency?
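For the benchmarking side, a sketch of a load test that fires N requests at the /run queue endpoint with a cap on in-flight connections (endpoint ID and payload are placeholders; asyncio + aiohttp assumed):

```python
import asyncio
import os
import aiohttp

ENDPOINT_ID = "your-endpoint-id"  # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}
PAYLOAD = {"input": {"prompt": "Hello"}}  # placeholder input

async def fire(session, sem):
    async with sem:
        async with session.post(URL, headers=HEADERS, json=PAYLOAD) as resp:
            return resp.status

async def main(n_requests=500, max_in_flight=50):
    # Cap in-flight requests so the client itself is not the bottleneck.
    sem = asyncio.Semaphore(max_in_flight)
    timeout = aiohttp.ClientTimeout(total=120)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        statuses = await asyncio.gather(*(fire(session, sem) for _ in range(n_requests)))
    ok = sum(s == 200 for s in statuses)
    print(f"{ok}/{n_requests} requests accepted")

if __name__ == "__main__":
    asyncio.run(main())
```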

is anyone experiencing a massive delay time when sending jobs to GPUs on serverless?

We are sending jobs off to our Whisper serverless functions and sometimes experience massive delay times, while other times they just go through quickly. At the moment we are just testing, so we are using a single 16GB GPU. Has anyone got any advice on this?

Urgent! all our workers not working! Any network issues?

Please take a look at our workers in endpoint h16kk1hi79s3t0 or kn0n8ry69jj1t7. All the workers are stuck at something!!...

Send Binary Image with Runpod Serverless

Is it possible to send binary images on Runpod Serverless? From what I can see, you can only send the application/json type, so I'm forced to convert my images to Base64, which isn't optimal. Is there a way to send the binary image directly?...
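As far as I know the queue endpoints only take JSON, so base64 in the input is the usual route. A minimal sketch of both sides (the field names are just examples):

```python
import base64

# Client side: pack the image bytes into the JSON input.
def encode_image(path: str) -> dict:
    with open(path, "rb") as f:
        return {"input": {"image_b64": base64.b64encode(f.read()).decode("utf-8")}}

# Handler side: recover the original bytes.
def handler(job):
    image_bytes = base64.b64decode(job["input"]["image_b64"])
    # ... run inference on image_bytes ...
    return {"size": len(image_bytes)}
```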

New release will re-pull the entire image.

It used to pull only the new layers on top, but now it pulls everything again, which makes testing slow.

Requests stuck in IN_QUEUE status

We deployed a LLaVA-v1.6-34B model on 2xA100 SXM infra as a serverless endpoint. When we send a request, we don't get a response, and the request stays in the IN_QUEUE status indefinitely. Any suggestions for what we should look at to start debugging this? We've previously been successful deploying LLaVA-v1.5-13b, but again, grateful for suggestions...
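A first debugging step that's often suggested: poll /status for the stuck job while watching the worker logs, to see whether it ever leaves the queue. A sketch of the polling side (IDs are placeholders):

```python
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
JOB_ID = "your-job-id"            # placeholder
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

while True:
    resp = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{JOB_ID}",
        headers=HEADERS,
        timeout=30,
    )
    status = resp.json().get("status")
    print(status)
    if status in ("COMPLETED", "FAILED", "CANCELLED"):
        break
    time.sleep(5)
```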

"Failed to return job results" and 400 bad request with known good code

I've been trying to get a serverless endpoint working with a Stable Diffusion script. When I test locally (or with the same hardware on pods) with --rp_serve_api or --test_input, it works perfectly fine. I can also use the same functions in jupyter or a bare python script and it works as expected. But when I deploy the same code to serverless, I get (...) {"requestId": "(...)", "message": "Failed to return job results. | 400, message='Bad Request', url=URL('(...)')", "level": "ERROR"} with...
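One workaround that comes up in similar threads, when the returned payload is large (e.g. inline image data): upload the artifact to object storage from inside the handler and return only a URL. A sketch assuming boto3 and an S3-compatible bucket (bucket and key names are placeholders):

```python
import io
import uuid
import boto3

s3 = boto3.client("s3")      # credentials taken from environment variables
BUCKET = "my-output-bucket"  # placeholder

def return_image_as_url(image_bytes: bytes) -> dict:
    """Upload the result and hand back a presigned URL instead of inline data."""
    key = f"outputs/{uuid.uuid4()}.png"
    s3.upload_fileobj(io.BytesIO(image_bytes), BUCKET, key)
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": BUCKET, "Key": key}, ExpiresIn=3600
    )
    return {"image_url": url}
```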

How to schedule active workers?

e.g., I want 0 active workers from 8pm to 3am, and 1 active worker from 3am to 8pm.
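There's no built-in scheduler that I know of, so the usual suggestion is a cron job that flips the active (min) worker count through the API. The GraphQL mutation and field names below are assumptions from memory; verify them against the API docs before relying on this sketch.

```python
import datetime
import os
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = "your-endpoint-id"  # placeholder

# NOTE: mutation and field names are assumptions; check Runpod's GraphQL docs.
MUTATION = """
mutation ($id: String!, $min: Int!) {
  saveEndpoint(input: {id: $id, workersMin: $min}) { id workersMin }
}
"""

def set_active_workers(count: int) -> None:
    requests.post(
        "https://api.runpod.io/graphql",
        params={"api_key": API_KEY},
        json={"query": MUTATION, "variables": {"id": ENDPOINT_ID, "min": count}},
        timeout=30,
    ).raise_for_status()

if __name__ == "__main__":
    # Run this from cron: 0 active workers from 20:00 to 03:00, 1 otherwise.
    hour = datetime.datetime.now().hour
    set_active_workers(0 if (hour >= 20 or hour < 3) else 1)
```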

CUDA env error

error log:
```
2024-05-27T14:08:55.663521063Z RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
2024-05-27T14:08:55.902287850Z --- Starting Serverless Worker | Version 1.5.0 ---
```
I'm using ComfyUI, so I start the ComfyUI service before runpod.serverless.start, and the problem occurs occasionally. Here is my code...
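For reference, the pattern ComfyUI worker images usually follow is to launch ComfyUI as a subprocess, wait until its HTTP port answers, and only then start the serverless loop. A rough sketch (paths, port, and flags are assumptions):

```python
import subprocess
import time
import requests
import runpod

COMFY_URL = "http://127.0.0.1:8188"  # assumed default ComfyUI port

# Start ComfyUI before the serverless loop so it is fully up before jobs arrive.
comfy = subprocess.Popen(["python", "/comfyui/main.py", "--listen", "127.0.0.1"])

def wait_for_comfy(timeout_s: int = 120) -> None:
    """Block until ComfyUI's HTTP server responds, or give up."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(COMFY_URL, timeout=2).ok:
                return
        except requests.RequestException:
            pass
        time.sleep(1)
    raise RuntimeError("ComfyUI did not become ready in time")

def handler(job):
    # ... submit the workflow from job["input"] to ComfyUI's API ...
    return {"status": "ok"}

wait_for_comfy()
runpod.serverless.start({"handler": handler})
```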

Failed to return job results

Does anyone know what this error is? Failed to return job results. | 400, message='Bad Request', url=URL('https://api.runpod.ai/v2/nbv9wne1ci0jzc/job-done/7mt8j2tdv4qv9o/b6b6cb8d-010f-44f2-9fb8?gpu=NVIDIA+RTX+A4500&isStream=false

Clone endpoint failing in UI

```{ "errors": [ { "message": "Something went wrong. Please try again later or contact support.", "locations": [...

Is there any limit on how many environment variables can be added per container?

It seems I can't add any more environment variables to my serverless endpoint; the button is greyed out/disabled. Is there any other way?

how to host 20gb models + fastapi code on serverless

I have 20 GB of model files and FastAPI pipeline code to perform preprocessing, inference, and training. How can I use Runpod serverless?...
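One common shape for this: keep the FastAPI app for local/pod use, factor the preprocessing and inference into plain functions, and wrap those same functions in a serverless handler, with the 20 GB of weights on a network volume rather than baked into the image. A sketch where the `pipeline` module, its functions, and the paths are all placeholders:

```python
import runpod

# Hypothetical module: the same functions your FastAPI routes already call.
from pipeline import load_model, preprocess, predict

# Weights on an attached network volume (mounted at /runpod-volume on serverless).
model = load_model("/runpod-volume/models/my-20gb-model")

def handler(job):
    features = preprocess(job["input"])
    return {"prediction": predict(model, features)}

runpod.serverless.start({"handler": handler})
```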

Need help putting a 23 GB .pt file in a serverless environment

I have a 23 GB .pt file containing tensors for 36 attention processors for each step, and I cannot reduce its size. I need to somehow get it into the serverless environment to use for inference, but I am getting a "no space left on device" error when building the Docker image (I understand the image should be small, but I don't have an option). Can someone please help...
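A sketch of the usual workaround: don't bake the 23 GB file into the image at all; download it once to an attached network volume at worker start and load it from there (the URL and paths are placeholders):

```python
import os
import urllib.request
import torch
import runpod

WEIGHTS_URL = "https://example.com/attention_processors.pt"  # placeholder source
WEIGHTS_PATH = "/runpod-volume/attention_processors.pt"      # network volume mount

# Download once; later cold starts on the same volume skip this step.
if not os.path.exists(WEIGHTS_PATH):
    urllib.request.urlretrieve(WEIGHTS_URL, WEIGHTS_PATH)

processors = torch.load(WEIGHTS_PATH, map_location="cpu")

def handler(job):
    # ... use `processors` in the inference pipeline ...
    return {"loaded_keys": len(processors)}

runpod.serverless.start({"handler": handler})
```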