RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Disk size when building a GitHub repository as an image on Serverless

I have a question about the disk size when building a GitHub repository as an image on Serverless. Does the option to set the disk size in the serverless settings affect the machine that builds it? For example, if the image being built is about 17GB in size and the machine needs 65GB of storage to build it, should I set the disk size to >17GB or >65GB?

How to get progress updates from RunPod?

Hi all - my goal is to get progress updates from a job request. Presently I'm polling the job status every two seconds, and I would like to get feedback on the % completed. Looking through the documentation, I've updated the handler function in rp_handler.py by adding the following code: ```...
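
A minimal sketch of the handler pattern being described, assuming the SDK's `runpod.serverless.progress_update` helper (the step count, work function, and messages are placeholders):

```
# rp_handler.py -- sketch only; do_one_step stands in for the real work
import time

import runpod

def do_one_step(step):
    # placeholder for one unit of actual work
    time.sleep(1)

def handler(job):
    total_steps = 10  # hypothetical number of work units
    for step in range(1, total_steps + 1):
        do_one_step(step)
        # the reported string is attached to the job's status while it runs,
        # so a poller can read it instead of only seeing the bare state
        runpod.serverless.progress_update(job, f"{int(step / total_steps * 100)}% complete")
    return {"status": "done"}

runpod.serverless.start({"handler": handler})
```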

How can I use multiprocessing in Serverless?

Hi, I am trying to do something somewhat simple:
```
def run(self):
    print("TRAINER: Starting training")
    train = Train()
    trainer = self.ctx.Process(target=train.train, args=(self.config.config_path,))
...
```
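
A minimal, self-contained sketch of that pattern inside a serverless handler (the `Train` class, config path, and the choice of a spawn context are stand-ins and assumptions, not the poster's actual code):

```
import multiprocessing as mp

import runpod

class Train:
    def train(self, config_path):
        # placeholder for the real training loop
        print(f"TRAINER: training with {config_path}")

def handler(job):
    ctx = mp.get_context("spawn")  # spawn is usually safer than fork when CUDA is involved
    train = Train()
    proc = ctx.Process(target=train.train, args=(job["input"].get("config_path", "config.yaml"),))
    proc.start()
    proc.join()  # wait for the child so the worker doesn't finish the job early
    return {"exit_code": proc.exitcode}

runpod.serverless.start({"handler": handler})
```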

Can't make serverless endpoints from GHCR container with new RunPod website update

I noticed a new UI update released tonight(?). When I go to create a serverless endpoint, I no longer have the option to use images from my private GHCR. Is this intentional?...

Serverless vLLM running but still downloading?

Title says it all and this shouldn't happen

Can anyone help me deploy a qwen/qwq-32B-Preview model from Hugging Face with vLLM serverless?

I'm having issues with the configuration. I used 1 GPU with 80GB, with container image runpod/worker-v1-vllm:stable-cuda12.1.0, and set the dtype to bfloat16, but the model is giving rubbish outputs....

New vLLM Serverless interface issue

Hi guys, I logged in early to run my vllm-worker, which had been working perfectly before, but I noticed that the interface for serverless has changed. I noticed there's no OpenAI-compatible URL anymore. My code was also experiencing internal server errors. Would appreciate it if you could share fixes for this issue. I'm not sure if this page has been updated for the new interface: https://docs.runpod.io/serverless/workers/vllm/openai-compatibility

With the new pre-built serverless images, how do we learn the API schema?

I see we can now select from some pre-built images for serverless. How can we learn the API schema for the input for these pre-built images? Thanks! 🙂

drained of my funds somehow. HELP??

Hey guys, I don't know who would be able to help me out here, but I had set up a serverless endpoint with a custom template. All it does is generate a custom image when the user clicks to generate one. It runs me less than $0.20 a day, usually less. But on one particular day, I was charged my entire account funds ($24) and I truly don't know why that happened. How could the worker be running all day? How didn't it time out? And also, I'm pretty sure it wasn't on my end because I have an idle timeout set to 5 minutes maximum, so I truly don't know what's going on. Can someone help me? Attached is a screenshot of average usage plus the time I was charged everything. It's funny because the day before, I reloaded funds (Nov 22, $25), and then the next day I was essentially drained of all my funds (Nov 23, a little more than $24)....

vLLM + OpenWebUI

Hi guys, has anyone used vLLM as an endpoint in OpenWebUI? I have created a serverless pod, but it does not let me connect from OpenWebUI (running locally). Does anyone know if I have to configure the external port, and how to do it?

Has anyone experienced issues with serverless /run callbacks since December?

We've noticed that response bodies are empty when using /run endpoints with callbacks in the RunPod serverless environment (occurring sometime after December 2nd). Additional context:
- /runsync endpoints are working normally
- Response JSON format appears correct in the "Requests" tab of the RunPod console under Status...
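
For reference, a minimal sketch of the /run-plus-webhook call being discussed (the endpoint ID, API key, callback URL, and input payload are placeholders); the body POSTed to the webhook is what is reportedly arriving empty:

```
import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "hello"},                      # whatever the worker expects
        "webhook": "https://example.com/runpod-callback",  # RunPod POSTs the job result here when done
    },
    timeout=30,
)
print(resp.json())  # usually contains the job id and initial status
```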

You do not have permission to perform this action.

client = OpenAI( api_key = RUNPOD_TOKEN, base_url = OPENAI_BASE_URL, ) ...
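
For context, a fuller sketch of that client setup against a serverless vLLM endpoint, assuming the `/openai/v1` base-URL form from the vLLM worker docs (the token, endpoint ID, and model name are placeholders):

```
from openai import OpenAI

RUNPOD_TOKEN = "your-runpod-api-key"   # placeholder
ENDPOINT_ID = "your-endpoint-id"       # placeholder
OPENAI_BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1"

client = OpenAI(api_key=RUNPOD_TOKEN, base_url=OPENAI_BASE_URL)

completion = client.chat.completions.create(
    model="your-org/your-model",       # whichever model the worker serves
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```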

Not getting 100s of req/sec serving for Llama 3 70B models with default vLLM serverless template

I'm deploying Llama-70B models without quantization using 2x80GB workers, but after 10 parallel requests the execution and delay times increase to 10-50 sec. I'm not sure if I'm doing something wrong with my setup. I pretty much use the default setup with the vLLM template, just setting MAX_MODEL_LEN to 4096 and ENFORCE_EAGER to true.
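
A quick way to reproduce the latency numbers being described is a small parallel-request script like this sketch (the endpoint ID, API key, and input payload are placeholders; the exact payload depends on the worker's schema):

```
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT_ID = "your-endpoint-id"   # placeholder
API_KEY = "your-runpod-api-key"    # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"

def one_request(i):
    start = time.time()
    r = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"prompt": f"request {i}"}},  # adjust to the worker's input schema
        timeout=300,
    )
    return time.time() - start, r.status_code

# fire 10 requests at once and print per-request wall-clock latency
with ThreadPoolExecutor(max_workers=10) as pool:
    for latency, status in pool.map(one_request, range(10)):
        print(f"{status}: {latency:.1f}s")
```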

CPU Availability in North America?

I spent all day trying to create a new CPU serverless endpoint. It kept getting stuck on "Initializing" for many minutes at a time. After spending a few hours digging through my Docker pipeline, I realized that the actual reason no workers were available is because I was attempting to stand up the servers in North America. When I picked the entire world, I saw that I could only get CPU servers in Romania and Iceland. Specifically EU-RO-1 and EUR-IS-1. That's understandable, I guess, but the Serverless » New Endpoint UI shows "High" availability of CPU3 and CPU5 workers across the board, even when narrowing it down to a single datacenter in the US. I learned to rely on that label when picking GPU workers for a different endpoint. Can you please confirm if my intuition is correct? And if so, perhaps you could improve the labeling in the UI to reflect the true availability of those workers?...

Serverless run time (CPU 100%)

So, I have a ComfyUI workflow with a couple of custom nodes running. Most of the time my workflow takes about 6-8 minutes. The weird thing is that 24GB vs. 80GB makes only a 1-2 minute difference. ...

Custom vLLM OpenAI compatible API

Hello, I'm running an OpenAI-compatible server using vLLM. In RunPod, for the SERVERLESS service, you cannot choose the endpoint the POST requests go to; it's /run or /runsync by default. My question is: how do I either change the RunPod configuration of this endpoint to /v1 (the OpenAI endpoint), or how do I run the vLLM Docker image so that it is compatible with RunPod?...
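
One possible pattern (not the official worker, and the port, model path, and input shape are assumptions) is to start vLLM's OpenAI server inside the container and have the serverless handler forward each /run payload to it:

```
import subprocess
import time

import requests
import runpod

# start vLLM's OpenAI-compatible server in the background on a local port (older-style entrypoint)
vllm_proc = subprocess.Popen(
    ["python", "-m", "vllm.entrypoints.openai.api_server",
     "--model", "/models/my-model", "--port", "8000"]
)

def wait_for_server(url="http://127.0.0.1:8000/v1/models", retries=60):
    # poll until the local server answers
    for _ in range(retries):
        try:
            if requests.get(url, timeout=2).ok:
                return
        except requests.RequestException:
            pass
        time.sleep(2)

def handler(job):
    # treat the job input as an OpenAI-style chat payload and proxy it locally
    r = requests.post("http://127.0.0.1:8000/v1/chat/completions", json=job["input"], timeout=600)
    return r.json()

wait_for_server()
runpod.serverless.start({"handler": handler})
```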

How to cache model download from HuggingFace - Tips?

Using Serverless (48GB Pro) with FlashBoot. I want to optimize for fast cold starts - is there a guide somewhere? It does not seem to be caching the download - it's always re-downloading the model entirely (and slowly)...
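
One commonly suggested approach (a sketch, not RunPod-specific guidance; the model ID and target directory are placeholders) is to bake the weights into the image at build time so cold starts never hit Hugging Face:

```
# download_model.py -- run from the Dockerfile, e.g. `RUN python download_model.py`
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="your-org/your-model",     # hypothetical model ID
    local_dir="/models/your-model",    # baked into an image layer; point the handler here
)
```

Alternatively, pointing the Hugging Face cache (HF_HOME) at a network volume keeps downloads across workers, at the cost of volume read speed.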

ComfyUI stops working when using always active workers

Hi. I know it's strange, but here it is. I have a workflow that works flawlessly when using serverless workers that are NOT always active. That is, if I set "always active" to 0 and max workers to 1 or 2, it all works fine. For deployment, I put 1 worker as always active and 3 max workers. With this setup (and exactly the same code as before), things stop working. The ComfyUI server starts, but it looks like the endpoint never receives a request. If I set it back to 0 always active workers, it works again. ...

Is it possible to send a request to a specific workerId in a serverless endpoint?

I need to have custom logic to distribute requests to available workers in the serverless endpoint. Is there a way to send a request to a specific worker using its workerId?

Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount

Here are the request ids: e5307e07-7f0e-4b82-b668-7560a9b7ad4b-u1 9a65646e-1b26-4177-8262-59080c9d8e24-u1...