Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

What are the recommended system requirements for building the worker base image?

I was trying to build a custom runpod/worker-vllm:base-0.3.1-cuda${WORKER_CUDA_VERSION} image, but my 16 vCPU, 64 GB RAM server crashed. What are the recommended system specs for this purpose?

Is there documentation on how to architect runpod serverless?

Wondering if there are do's/don'ts for integrating RunPod serverless into a larger architecture. I assume it's not as snappy as Lambda, so I'd need to plan more aggressively around warm/cold starts? Also, is RunPod serverless ready for prod deployments, or is it more of a "use at your own risk" service?

Docker image cache

Hi there, I am quite new to RunPod so I could be wrong, but my Docker image is quite large, and before my serverless endpoint actually runs, it sits in the 'Initializing' state for quite a long time. Is there a way to cache this image across endpoints, or does this already happen? This is the first request I am making, so it might already be cached for this endpoint, but I'm not quite sure. I'd appreciate any help! I am not using a network volume/storage, so maybe that's also why....

What port do requests get sent on?

I want to do something a little custom: I don't want to use the serverless package, I want to use my own code, i.e. a Flask app running on gunicorn in my container... I need a flexible container that's decoupled from RunPod. Is this possible? (Presumably it is?) I'd imagine I'd need to define the /run, /runsync, etc. endpoints in my Flask app, right? And how is the port mapping between the host and the container handled? Do I define the env var RUNPOD_REALTIME_PORT in the template, so the host uses that as the host port, which is then the internal port used by the gunicorn server? ...
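
For reference, a minimal sketch of the kind of standalone Flask app being described; the /run and /runsync routes and the use of RUNPOD_REALTIME_PORT are assumptions taken from the question itself, not confirmed RunPod behaviour.

```python
# Hypothetical standalone Flask app exposing /run and /runsync, as the
# question proposes. Whether RunPod forwards traffic to these paths and to
# the port named by RUNPOD_REALTIME_PORT is an assumption, not confirmed.
import os
from flask import Flask, jsonify, request

app = Flask(__name__)

# Port the container listens on; RUNPOD_REALTIME_PORT is the variable the
# question assumes the template would use for the host/container mapping.
PORT = int(os.environ.get("RUNPOD_REALTIME_PORT", 8000))


@app.route("/run", methods=["POST"])
def run_async():
    payload = request.get_json(force=True)
    # ... enqueue `payload` for background processing here ...
    return jsonify({"id": "job-placeholder", "status": "IN_QUEUE"})


@app.route("/runsync", methods=["POST"])
def run_sync():
    payload = request.get_json(force=True)
    # ... process the job inline and return its result ...
    return jsonify({"status": "COMPLETED", "output": payload.get("input")})


if __name__ == "__main__":
    # In the container this would be served by gunicorn instead, e.g.
    #   gunicorn -b 0.0.0.0:$RUNPOD_REALTIME_PORT app:app
    app.run(host="0.0.0.0", port=PORT)
```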

Serverless: calculating capacity & ideal request count vs. queue delay values

How do you calculate whether a serverless worker is reaching its capacity, and what values should be set for request count? One of my serverless workers in production, which runs regular Oobabooga (not vLLM, so no concurrency), reached 110k requests per day yesterday without starting a new worker. From my observations, my context length is usually 1000 input tokens and 10-70 output tokens, which usually takes between 2-5 seconds per request. Even if we assume 1 second of execution time per request, it should only have been able to handle 86,400 requests per day. How is it able to handle more without increasing the worker count, especially when it takes 2-5 seconds per request?...
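
For reference, the back-of-the-envelope math behind the 86,400 figure (plain arithmetic, nothing RunPod-specific assumed):

```python
# Rough single-worker capacity for a worker that handles one request at a
# time (no concurrency), as described above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

for exec_seconds in (1, 2, 5):
    max_requests = SECONDS_PER_DAY / exec_seconds
    print(f"{exec_seconds}s per request -> ~{max_requests:,.0f} requests/day")

# 1 s -> ~86,400/day, 2 s -> ~43,200/day, 5 s -> ~17,280/day.
# 110k/day on one worker therefore only adds up if requests overlap
# (extra workers or some form of concurrency), which is the puzzle here.
```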

RunPod worker automatic1111 just responds COMPLETED and doesn't return anything

I'm using the worker from https://github.com/ashleykleynhans/runpod-worker-a1111/tree/main, latest version, so it should fix the "error" dict problem. For some requests, it just returns the status COMPLETED, and the RunPod logs show something like the image below. I have tried creating a Pod mounted on that volume and running the local request with test_input.json, and everything works normally. Can you help me with this, @ashleyk?
Solution:
Hi @Merrell, I think the problem is related to the size of the response. If I set the batch size smaller or reduce the image size, everything works fine.

Serverless GPU low capacity

I'm finding it almost impossible to use the serverless endpoints, as there are no GPUs available. I have a network volume in Romania and therefore need GPUs in the same region. It spends ages throttled ("throttled: Waiting for GPU to become available."), and when one eventually comes online it goes offline again soon after, even with 'Idle timeout' set to an hour. Is this a common state, or is it just unusually busy right now? Does RunPod have plans to increase capacity, considering it's in such high demand?...

Runpod queue not processing

Hey, I deployed a serverless application using Kandinsky 2.1. I hit the run endpoint and the request was queued; checking the status by ID still shows in_queue. Can anyone help resolve this issue?...
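
For reference, a minimal polling sketch against the serverless HTTP API; the endpoint ID, API key handling, and payload below are placeholders:

```python
# Submit a job and poll its status via the /run and /status endpoints.
# ENDPOINT_ID and the input payload are placeholders.
import os
import time

import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"

job = requests.post(f"{BASE_URL}/run",
                    json={"input": {"prompt": "test"}},
                    headers=HEADERS).json()

# Poll until the job leaves the queued/in-progress states.
while True:
    status = requests.get(f"{BASE_URL}/status/{job['id']}", headers=HEADERS).json()
    if status.get("status") not in ("IN_QUEUE", "IN_PROGRESS"):
        break
    time.sleep(2)

print(status)
```

A job that never leaves IN_QUEUE generally means no worker is picking it up (for example, workers throttled or max workers set to 0) rather than a problem with the request itself.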

cudaGetDeviceCount() Error

When importing the exllamav2 library I got this error, which left the serverless worker stuck, repeatedly printing the error stack trace. The error is:
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW
What is this error about? Is it the library, or is there something wrong with the worker hardware I've chosen? And why doesn't the error stop the worker? It kept running for 5 minutes without me even realizing....
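
For reference, one way to surface this kind of CUDA driver/runtime mismatch at container start instead of letting the worker loop; a sketch that assumes PyTorch is available in the image:

```python
# Fail fast if CUDA isn't actually usable, rather than letting the handler
# keep restarting on import errors.
import sys

import torch

try:
    torch.cuda.init()  # force CUDA initialization so problems show up here
    count = torch.cuda.device_count()
    if count == 0:
        raise RuntimeError("no CUDA devices visible")
    print(f"CUDA OK: {count} device(s)")
except RuntimeError as exc:
    # Error 804 ("forward compatibility was attempted on non supported HW")
    # usually points at a host driver / CUDA runtime mismatch rather than
    # at the application code.
    print(f"CUDA check failed: {exc}", file=sys.stderr)
    sys.exit(1)
```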

VLLM Error

```
2024-02-28T21:49:45.485567449Z The above exception was the direct cause of the following exception:
2024-02-28T21:49:45.485572406Z
2024-02-28T21:49:45.485576486Z Traceback (most recent call last):
2024-02-28T21:49:45.485580679Z   File "/handler.py", line 8, in <module>
2024-02-28T21:49:45.485636156Z     vllm_engine = vLLMEngine()
```
...

Getting docker error

Random error; no changes to the image, and it was working just a minute ago.

worker-vllm build fails

I am getting the following error when building the new worker-vllm image with my model.
```
 => ERROR [vllm-base 6/7] RUN --mount=type=secret,id=HF_TOKEN,required=false if [ -f /run/secrets/HF_TOKEN ]; then export HF_TOKEN=$(cat /run/secrets/HF_TOKEN); fi && if [ -n "Pate  10.5s
------
```
...

Serverless not returning error

The following code:
```
def handler(event):
    try:
        logger.info('validating input')
```
...
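For reference, a sketch of a handler that surfaces errors rather than completing silently; it assumes the convention that returning a dict with an "error" key (or letting the exception propagate) is what marks the job as failed, so check this against the runpod SDK version in use:

```python
# Handler sketch that reports failures back to the caller instead of
# swallowing them. The {"error": ...} convention is an assumption to verify
# against the runpod SDK docs for the installed version.
import logging
import traceback

import runpod

logger = logging.getLogger(__name__)


def handler(event):
    try:
        logger.info("validating input")
        job_input = event["input"]
        # ... actual work here ...
        return {"output": job_input}
    except Exception:
        # Returning an "error" key (instead of a bare None or an empty dict)
        # is what lets the caller see that the job failed and why.
        return {"error": traceback.format_exc()}


runpod.serverless.start({"handler": handler})
```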

Getting 404 error when making request to serverless endpoint

I'm using the Python SDK and pasting the endpoint ID into the provided example code. Here is the full response:
```
ClientResponseError: Status: 404, Message: Not Found, Headers: <CIMultiDictProxy('Date': 'Tue, 27 Feb 2024 17:15:53 GMT', 'Content-Type': 'text/plain', 'Content-Length': '18', 'Connection': 'keep-alive', 'CF-Cache-Status': 'DYNAMIC', 'Set-Cookie': '__cflb=02DiuEDmJ1gNRaog7Bucmr44gWmZj9b8U2YPJr23J6Q9a; SameSite=None; Secure; path=/; expires=Wed, 28-Feb-24 16:15:53 GMT; HttpOnly', 'Server': 'cloudflare', 'CF-RAY': '85c2128cc898429e-EWR')>
```
...
Solution:
Is your API key correct?
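
For reference, roughly the shape of the documented Python SDK call, with the endpoint ID and payload as placeholders; it's worth double-checking both the endpoint ID and the API key against the console, and the exact method names against the SDK version installed:

```python
# Sketch of a basic SDK call. "ENDPOINT_ID" and the input payload are
# placeholders; the api_key must be a valid key from the RunPod console.
import os

import runpod

runpod.api_key = os.environ["RUNPOD_API_KEY"]

endpoint = runpod.Endpoint("ENDPOINT_ID")
run_request = endpoint.run({"prompt": "test"})

print(run_request.status())
```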

out of memory error

GPU out of memory error

Out of memory errors on 48 GB GPU which didn't happen before

Some requests fail due to OOM, but the endpoint uses a 48 GB GPU and is definitely capable of processing these requests.

Is it possible to run fully on sync?

All the async functions and webhooks are such a pain; can we just run fully sync?
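
For reference, a blocking-call sketch using the /runsync endpoint instead of /run plus webhooks or polling; the endpoint ID and payload are placeholders, and /runsync is meant for relatively short-running jobs, so long jobs still need the async flow:

```python
# Single blocking request via /runsync: the HTTP call holds until the job
# finishes (or times out). Endpoint ID and payload are placeholders.
import os

import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    json={"input": {"prompt": "test"}},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
print(resp.json())
```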

How to keep worker memory after completing request?

Hi! I'm running serverless for a GAN model. I want to preload the model into memory on the first request and reuse it for subsequent requests without loading the model again (as long as the container/pod is still alive). When I sent the 2nd request, the idle worker showed "clean up worker" and loaded the model again. How can I prevent the "clean up worker" and keep the model in memory (when the container has not been removed)?...
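
For reference, the usual pattern here is to load the model once at module import time (outside the handler), so a warm worker reuses it across requests; a rough sketch, with the GAN loader left as a placeholder:

```python
# Load the model once when the worker process starts, not inside the handler.
# A warm worker keeps this process alive between requests, so MODEL stays in
# memory; only a full scale-down / cold start reloads it.
import runpod


def load_gan_model():
    # Placeholder: replace with the actual GAN loading code.
    ...


MODEL = load_gan_model()  # runs once per worker process


def handler(event):
    job_input = event["input"]
    # ... run inference with MODEL on job_input ...
    return {"output": "placeholder"}


runpod.serverless.start({"handler": handler})
```

If RunPod scales the worker down entirely, the next cold start will reload the model; that part can't be prevented from inside the handler, only mitigated with a longer idle timeout or active (always-on) workers.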

Failed to get job. | Error Type: ClientConnectorError

Hey all, I'm starting to receive this kind of error:
```
2024-02-26T21:49:02.442274586Z connectionpool.py :872 2024-02-26 21:49:02,441 Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7fd718d52aa0>: Failed to resolve 'api.runpod.ai' ([Errno -3] Temporary failure in name resolution)")': /v2/d7n1ceeuq4swlp/ping/xkqvldjqlccihw?gpu=NVIDIA+A40&runpod_version=1.6.0
2024-02-26T21:49:12.459986454Z {"requestId": null, "message": "Failed to get job. | Error Type: ClientConnectorError | Error Message: Cannot connect to host api.runpod.ai:443 ssl:default [Temporary failure in name resolution]", "level": "ERROR"}
```
It seems like the system keeps retrying to get the job for 40s, and this time interval is included in the serverless billing time. What is going on? Thanks!...

Help: Serverless Mixtral OutOfMemory Error

I can't get Mixtral-8x7B-Instruct to run on serverless using the vLLM RunPod worker, neither with the model from Mistral nor with any of the quantized models. Settings I'm using: GPU: 48 GB (also tried 80 GB); Container Image: runpod/worker-vllm:0.3.0-cuda11.8.0...