Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Drained of my funds somehow. HELP??

Hey guys, I don't know who would be able to help me out here, but I had set up a serverless endpoint with a custom template. All it does is generate a custom image when the user clicks to generate one. It usually costs me less than $0.20 a day. But on one particular day I was charged my entire account balance ($24), and I truly don't know why that happened. How could the worker be running all day? How did it not time out? I'm also pretty sure it wasn't on my end, because I have an idle timeout set to 5 minutes maximum, so I truly don't know what's going on. Can someone help me? Attached is a screenshot of my average usage plus the time I was charged everything. It's funny, because the day before I reloaded funds (Nov 22, $25), and then the next day I was essentially drained of all my funds (Nov 23, a little more than $24)....

vLLM + OpenWebUI

Hi guys, has anyone used vLLM as an endpoint in OpenWebUI? I have created a serverless endpoint, but I can't connect to it from OpenWebUI (running locally). Does anyone know whether I have to configure an external port, and how that would work?
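
A minimal connectivity check, assuming the endpoint was built from RunPod's official vLLM worker, which exposes an OpenAI-compatible route under the endpoint's /openai/v1 path (no external port needed). The endpoint ID and API key below are placeholders; the same base URL and key are what OpenWebUI would take as an OpenAI-style connection in its settings.

# Sketch: reach a RunPod serverless vLLM endpoint through its OpenAI-compatible route.
# ENDPOINT_ID and the API key are placeholders; the /openai/v1 path assumes the official vLLM worker.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNPOD_API_KEY",
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",
)

# If this lists the served model, the same base_url/api_key pair should work in OpenWebUI.
print(client.models.list())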

Has anyone experienced issues with serverless /run callbacks since December?

We've noticed that response bodies are empty when using /run endpoints with callbacks in the RunPod serverless environment (starting sometime after December 2nd). Additional context:
- /runsync endpoints are working normally
- The response JSON format appears correct in the "Requests" tab of the RunPod console under Status...
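
For reference, a minimal sketch of the call pattern being described, using the webhook field that /run accepts for callbacks; the endpoint ID, API key, and callback URL are placeholders.

# Sketch: submit an async job to /run with a webhook callback (placeholders throughout).
import requests

resp = requests.post(
    "https://api.runpod.ai/v2/ENDPOINT_ID/run",
    headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"},
    json={
        "input": {"prompt": "hello"},
        "webhook": "https://example.com/runpod-callback",  # RunPod POSTs the job result here when it finishes
    },
    timeout=30,
)
print(resp.json())  # expected shape: {"id": "...", "status": "IN_QUEUE"}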

You do not have permission to perform this action.

from openai import OpenAI

client = OpenAI(
    api_key=RUNPOD_TOKEN,
    base_url=OPENAI_BASE_URL,
)
...
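
A hedged completion of the snippet above, assuming RUNPOD_TOKEN and OPENAI_BASE_URL are already defined and the endpoint exposes an OpenAI-compatible chat route; the model name is a placeholder for whatever the endpoint actually serves.

# Sketch only: the model name below is a placeholder.
response = client.chat.completions.create(
    model="your-served-model-name",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(response.choices[0].message.content)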

Not getting 100s of req/sec serving for Llama 3 70B models with default vLLM serverless template

I'm deploying Llama 70B models without quantization using 2x80GB workers, but after 10 parallel requests the execution and delay times increase to 10-50 seconds. I'm not sure if I'm doing something wrong with my setup. I pretty much use the default setup with the vLLM template, just setting MAX_MODEL_LEN to 4096 and ENFORCE_EAGER to true.

CPU Availability in North America?

I spent all day trying to create a new CPU serverless endpoint. It kept getting stuck on "Initializing" for many minutes at a time. After spending a few hours digging through my Docker pipeline, I realized that the actual reason no workers were available was that I was attempting to stand up the servers in North America. When I picked the entire world, I saw that I could only get CPU servers in Romania and Iceland, specifically EU-RO-1 and EUR-IS-1. That's understandable, I guess, but the Serverless » New Endpoint UI shows "High" availability of CPU3 and CPU5 workers across the board, even when narrowing it down to a single datacenter in the US. I learned to rely on that label when picking GPU workers for a different endpoint. Can you please confirm whether my intuition is correct? And if so, perhaps you could improve the labeling in the UI to reflect the true availability of those workers?...

Serverless run time (CPU 100%)

So, I have a ComfyUI workflow with a couple of custom nodes running. Most of the time my workflow takes about 6-8 minutes. The weird thing is that 24 GB vs. 80 GB makes only a 1-2 minute difference. ...

Custom vLLM OpenAI compatible API

Hello, I'm running an OpenAI-compatible server using vLLM. In RunPod, for the serverless service, you cannot choose the path you want POST requests routed to; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI path), or how do I run the vLLM Docker image so that it is compatible with RunPod?...
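
Since the queue-based serverless API only exposes /run and /runsync, one workaround (a sketch, not an official recipe) is a thin handler that forwards the job input to the vLLM OpenAI server running inside the same container; the localhost port 8000 and the payload shape are assumptions.

# handler.py - sketch of a proxy handler; assumes vLLM's OpenAI-compatible server
# is already running inside the container on localhost:8000.
import requests
import runpod

def handler(job):
    # Forward whatever the client put under "input" to vLLM's /v1/chat/completions.
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json=job["input"],
        timeout=300,
    )
    return resp.json()

runpod.serverless.start({"handler": handler})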

How to cache model download from HuggingFace - Tips?

Using Serverless (48 GB Pro) with FlashBoot. I want to optimize for fast cold starts; is there a guide somewhere? It does not seem to be caching the download; it's always re-downloading the model entirely (and slowly)...
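
One thing worth checking (a sketch, assuming a network volume is attached; RunPod mounts network volumes at /runpod-volume on serverless workers): point the Hugging Face cache at the volume so the weights persist across cold starts instead of being re-downloaded each time.

# Sketch: keep Hugging Face downloads on the attached network volume.
# /runpod-volume is where RunPod mounts network volumes for serverless workers;
# the model ID is a placeholder.
import os

os.environ["HF_HOME"] = "/runpod-volume/huggingface"  # set before importing huggingface_hub

from huggingface_hub import snapshot_download

# First run downloads to the volume; later cold starts reuse the cached copy.
model_path = snapshot_download("your-org/your-model")
print(model_path)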

ComfyUI stops working when using always active workers

Hi. I know it's strange, but here it is. I have a workflow that works flawlessly when using serverless workers that are NOT always active; that is, if I set "always active" to 0 and max workers to 1 or 2, it all works fine. For deployment, I put 1 worker as always active and 3 max workers. With this setup (and exactly the same code as before), things stop working. The ComfyUI server starts, but it looks like the endpoint never receives a request. If I set it back to 0 always-active workers, it works again. ...

Is it possible to send a request to a specific workerId in a serverless endpoint?

I need to implement custom logic to distribute requests to the available workers in the serverless endpoint. Is there a way to send a request to a specific worker using its workerId?

Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount

Here are the request ids: e5307e07-7f0e-4b82-b668-7560a9b7ad4b-u1 9a65646e-1b26-4177-8262-59080c9d8e24-u1...

Polish TAX ID invoices

Hi, how can I correctly set a Polish VAT ID so that I get an invoice when making a one-time credit purchase? I do not see any option to set this ID during the Stripe card checkout. I have this ID set in my profile options; is that sufficient for invoice generation?

How to cancel a request

Here is my Python code for running the request.
#############
run_request = endpoint.run(input_payload)...
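
A sketch of one way to cancel it, via the endpoint's /cancel route; the endpoint ID, API key, and job ID are placeholders (the job ID is the "id" returned when the request was submitted; the SDK's job object may also expose a cancel() method, but that is an assumption here).

# Sketch: cancel a queued or in-progress job by ID (placeholders throughout).
import requests

def cancel_job(endpoint_id: str, job_id: str, api_key: str) -> dict:
    resp = requests.post(
        f"https://api.runpod.ai/v2/{endpoint_id}/cancel/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    return resp.json()

print(cancel_job("ENDPOINT_ID", "JOB_ID", "YOUR_RUNPOD_API_KEY"))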

What is the normal network volume read speed? Is 3MB/s normal?

I've been seeing network volume read speeds of less than 3 MB/s in EU-SE-1, which makes things difficult.

Pods not getting started

Whenever my endpoint receives new requests and autoscales to create new pods, a few of the pods get stuck while booting and don't respond. While this happens I am still being charged, because somehow that is counted as uptime. It's certainly not a fault in my code, since multiple other pods work fine on boot.

First runs always fail

When using a serverless API endpoint (with ComfyUI installed), the first run always fails, even though the following ones work fine. This is what the API returns on the first run:...

RunPod GPU Availability: Volume and Serverless Endpoint Compatibility

Hey everyone! Quick question about RunPod's GPU availability across different deployment types. I'm a bit confused about something: I created a volume in a data center where only a few GPU types were available, but when I'm setting up a serverless endpoint, I see I can select configs with up to 8 GPUs, including some that weren't available when I created my volume. I've also noticed that GPU availability keeps fluctuating, sometimes showing low availability and sometimes none at all. So I'm wondering:...

How long does it normally take to get a response from your VLLM endpoints on RunPod?

Hello. I've tested a very tiny model (Qwen2.5-0.5B-Instruct) on the official RunPod vLLM image, but the job takes 30+ seconds each time; 99% of that is loading the engine and the model (counted as delay time), and the execution itself is under 1 s. FlashBoot is on. Is this normal, or is there a setting or something else I should check to make FlashBoot kick in? How long do your models and endpoints normally take to return a response?

This server has recently suffered a network outage

This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime.