Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

ControlNet does not seem to work on Serverless API

Hey guys, before anything else: I'm not a native English speaker, so sorry in advance if I get something wrong. I've spent about 3 days trying to find someone to help me with this issue. I'm trying to run a serverless API for an Automatic1111 WebUI instance via this guide: https://github.com/ashleykleynhans/runpod-worker-a1111 ...

image deprecated?

When I put all the variables in (trying with phi-4k-mini), it says the image is deprecated
No description

LoRA modules with basic vLLM serverless

Is it possible to use LoRA modules with the default vLLM endpoint? If not, how can I do it quickly?
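
A minimal sketch of how a LoRA adapter can be attached when driving vLLM directly from a custom handler, not the prebuilt worker; the base model ID, adapter name, and paths are placeholders, and the stock RunPod vLLM worker may expose the same thing through environment variables, which should be checked against its README:

```python
# Hypothetical sketch: serving a base model with a LoRA adapter via vLLM's Python API.
# Model ID, adapter name, and paths are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="mistralai/Mistral-7B-v0.1",  # base model (placeholder)
    enable_lora=True,                   # allow LoRA adapters to be loaded at request time
)

sampling = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Write a haiku about GPUs."],
    sampling,
    # adapter name, integer id, and local path (all placeholders)
    lora_request=LoRARequest("my_adapter", 1, "/runpod-volume/loras/my_adapter"),
)
print(outputs[0].outputs[0].text)
```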

runpod js-sdk endpoint.run(inputPayload, timeout); timeout does not work

When I set the timeout to 360000, it always returns the error "timeout of 3000ms exceeded". Even when I open the index.ts file in runpod-js-sdk and set async run(timeout: number = 360000), it sometimes works and sometimes doesn't, returning "timeout of 3000ms exceeded". I have installed ramda in dependencies and @types/ramda in devDependencies. (Before installing them, it always returned "timeout of 3000ms exceeded".) ...
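
Not a fix for the SDK itself, but one way to sidestep a client-side HTTP timeout entirely is to submit the job asynchronously and poll the status endpoint yourself. A rough workaround sketch against the plain REST API (shown in Python for brevity; the API key, endpoint ID, and input payload are placeholders):

```python
# Hypothetical workaround sketch: queue a job via /run and poll /status instead of
# holding one long-lived request open. API key, endpoint ID, and input are placeholders.
import time
import requests

API_KEY = "YOUR_API_KEY"
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job; this returns immediately with a job id.
job = requests.post(f"{BASE}/run", json={"input": {"prompt": "hello"}},
                    headers=HEADERS, timeout=30).json()
job_id = job["id"]

# Poll until the job reaches a terminal state.
while True:
    status = requests.get(f"{BASE}/status/{job_id}", headers=HEADERS, timeout=30).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(2)

print(status)
```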

Faster Whisper Endpoint Does Not Work With Base64?

I am using the same base64 string with another Whisper service and it works fine there. Param: audio_base64. Error: ...
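
For comparison, a minimal sketch of how an audio_base64 payload can be built and submitted with the Python SDK; the API key, endpoint ID, and file name are placeholders, the parameter name is taken from the post above, and the exact input schema should be checked against the worker's README:

```python
# Hypothetical sketch: base64-encode a local audio file and send it to a
# faster-whisper endpoint. API key, endpoint ID, and file path are placeholders.
import base64
import runpod

runpod.api_key = "YOUR_API_KEY"
endpoint = runpod.Endpoint("YOUR_ENDPOINT_ID")

with open("sample.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

result = endpoint.run_sync(
    {"input": {"audio_base64": audio_b64}},  # param name from the post above
    timeout=120,                             # seconds to wait for a synchronous result
)
print(result)
```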

Issues in SE region causing a massive amount of jobs to be retried

The issues in the screenshot are causing 10% of my jobs to be retried in the SE region. Please fix this; it's not happening in the CA region.
No description

GPU for 13B language model

Just wanted to get your recommendations on GPU choice for running a 13B language model quantized with AWQ or GPTQ. The workload would be around 200-300 requests/hour. I tried a 48 GB A6000 with pretty good results, but I was wondering whether you think a 24 GB GPU could be up to the task?
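
As a rough back-of-the-envelope check (approximations only, not measurements): a 13B model at 4-bit weights needs roughly 6-7 GB for the weights, plus KV cache and runtime overhead, so 24 GB is usually workable at moderate context lengths. A quick sketch of that arithmetic, assuming Llama-2-13B-style dimensions:

```python
# Rough, hypothetical VRAM estimate for a quantized 13B model.
# All numbers are approximations; real usage depends on the runtime and context length.
params = 13e9
bits_per_weight = 4                       # AWQ/GPTQ 4-bit
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"weights: ~{weights_gb:.1f} GB")   # ~6.5 GB

# KV cache per token (fp16): 2 tensors (K and V) * layers * hidden_size * 2 bytes
layers, hidden = 40, 5120                 # Llama-2-13B-style dimensions (assumption)
kv_per_token = 2 * layers * hidden * 2    # bytes per token
context = 4096
kv_gb = kv_per_token * context / 1e9
print(f"KV cache @ {context} tokens: ~{kv_gb:.1f} GB")  # ~3.4 GB

print(f"total before runtime overhead: ~{weights_gb + kv_gb:.1f} GB")
```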

"job id does not exist" error on Faster whisper

I have been getting a "job id does not exist" error on Faster Whisper. I ran a 55-minute audio file and got an ID for tracking, but each time I try to use the ID to check the status endpoint, I get the job id error. I noticed I have already been charged for this run. I raised a ticket yesterday morning and got some initial responses from your team, but I haven't heard anything in almost 36 hours....

Mixed Delay Times

Hey, what could be the reason for these delay times?
No description

Question on Flash Boot

Hello. I'm aware Flash Boot is more or less a "caching system" that keeps a worker on stand-by for some time, preventing large delay times. For example, my first request of the day takes 8s to 15s of delay time, and subsequent requests have much faster delay times even with a 5s idle timeout -- which I guess comes from Flash Boot being enabled. I know that Flash Boot will stop working after a certain amount of time has passed with no requests to the "cached worker"; in that case my next request will again take the 8s to 15s delay time....

OutOfMemory

Why do my tasks keep failing with out of memory? I'm just running large-v2 on faster-whisper on a 4090 GPU...

timeout in javascript sdk does not work

const runRequest = await endpoint.run(inputPayload, 600000); An error occurred: AxiosError: timeout of 3000ms exceeded. It seems that the timeout setting is not taking effect. This is the run function in the js sdk....

Unstable processing speed between different workers

Hi! I'm deploying serverless for the SadTalker model on an endpoint with 24 GB GPU Pro specs. I tested some requests and realized that the processing time for the same request differs hugely between workers. Here are 2 log files: 1 - Log of the slower worker: it takes 45s executionTime; iteration speed is 2.09 s/it in the Face Render step. 2 - Log of a normal worker: it takes 21s executionTime; iteration speed is approximately 1.30 it/s in the Face Render step. My endpoint is: schx1xwzhn1lhk. Could anyone help me debug and prevent this issue?...

OSError: [Errno 9] Bad file descriptor on all requests

This error has started appearing on every request. The 'Requests' tab doesn't even get updated with the request, it errors out before the request is recognised. I'm not sure how to debug OS issues on your servers...

Is there any published information on 'up-time', or tips on thinking about SLA-type guarantees?

Basically the title: how should I approach this? Tips? Writings? Blogs? Help a guy out.

Clarification on Billed Volume Calculations for Serverless and Network Storage

I am reviewing my recent billing statements and I have noticed several entries under ‘Serverless’ and ‘Network Storage’ categories that I need some clarification on. I am using a serverless architecture for my project and it is crucial for me to understand the billing details for proper budget management. Could you please help me understand: 1. How is the ‘Billed Volume’ calculated for Serverless storage under the ‘Storage’ tab? What activities or data usage contribute to the volume reported?...

Plans to support 400B models like llama 3?

Is Runpod thinking about how it will support very large LLMs like the 400B Llama model that is expected to release later this year?

How do I retry a worker task in RunPod serverless?

Good day, I was moving a worker from Pods to serverless. Previously I used Azure Service Bus to send tasks to my pod, and the Service Bus messages had a retry count of 5. After migrating to the serverless endpoint, I didn't find a way to integrate any message queue system to deliver tasks to the worker container, so when a job fails it only responds with errors. How can I make it retry the same request? I didn't find anything in the documentation.
I took a look at these documents: https://docs.runpod.io/serverless/workers/handlers/handler-error-handling There is something about refresh worker at https://docs.runpod.io/serverless/workers/handlers/handler-additional-controls#refresh-worker...
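
There isn't a built-in queue with per-message retry counts like Service Bus, so one common pattern is to retry from the caller and, on error, flag the worker for a refresh. A rough handler sketch based on the error-handling and refresh_worker controls in the docs linked above (the process function and its contents are placeholders):

```python
# Hypothetical handler sketch: on failure, report an error and ask the platform
# to refresh (recycle) this worker before it picks up the next job.
import runpod

def process(job_input):
    # placeholder for the work previously triggered via Service Bus
    ...

def handler(job):
    try:
        result = process(job["input"])
        return {"result": result}
    except Exception as err:
        # An "error" key marks the job as failed; "refresh_worker" asks RunPod
        # to restart this worker after the job finishes (per the linked docs).
        return {"error": str(err), "refresh_worker": True}

runpod.serverless.start({"handler": handler})
```

Retrying a job that the handler itself marked as failed would, as far as the linked docs go, still need to happen on the client side, e.g. by resubmitting when the status endpoint reports FAILED.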

Speed up cold start on large models

I'm trying to do some intermittent testing on a 70B LLM model, but any time a vLLM worker does a cold start, it downloads the model from HF. This takes about 19 minutes, so costs add up and the requests made to the API time out and fail. Once the model is loaded, things are fine with inference running in 12-15 seconds. Is there any good solution to work with this larger model without keeping a constant worker, which would defeat the whole purpose of running it on serverless for the very intermit...
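
One common approach (not specific to RunPod) is to avoid the Hugging Face download at cold start entirely, either by attaching a network volume that already holds the weights or by baking them into the container image at build time. A minimal sketch of the latter, run from the Dockerfile during the build; the model ID and target directory are placeholders:

```python
# Hypothetical build-time script (e.g. invoked with `RUN python download_model.py`
# in the Dockerfile) so the weights ship inside the image instead of being
# fetched on every cold start. Model ID and directory are placeholders.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-2-70b-chat-hf",  # placeholder model
    local_dir="/models/llama-70b",             # then point the worker's model setting at this path
    local_dir_use_symlinks=False,              # copy real files into the image layer
)
```

The trade-off is a much larger image, but pulls are cached on the host, so subsequent cold starts skip the 19-minute download.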

How to get "system log" in serverless

For normal GPU instances, I can see both the "pod log" and the "system log", but for serverless I can only see the "pod log", if I'm correct. I don't know whether the image pull is taking too long, and I can't really debug an instance that failed to start.