RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

RunPod Serverless - Testing Endpoints Locally with Docker and GPU

I’m creating a custom container to run FLUX and LoRA on RunPod, using this Stable Diffusion example as a starting point. I successfully deployed my first pod on RunPod, and everything worked fine. However, my issue arises when I make code changes and want to test my endpoints locally before redeploying. Constantly deploying to RunPod for every small test is quite time-consuming. I found a guide for local testing in the RunPod documentation here. Unfortunately, it only provides a simple example that suggests running the handler function directly, like this:...
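
For context, the pattern the docs describe is roughly the following; this is a minimal sketch (the handler body and the test payload are placeholders, not the actual FLUX/LoRA code):
```
# handler.py - minimal RunPod serverless worker (sketch; payload shape is hypothetical)
import runpod

def handler(job):
    prompt = job["input"].get("prompt", "")
    # ... run FLUX/LoRA inference here ...
    return {"output": f"echo: {prompt}"}

runpod.serverless.start({"handler": handler})
```
With the runpod SDK installed, `python handler.py --test_input '{"input": {"prompt": "hi"}}'` runs the handler once locally, and `python handler.py --rp_serve_api` starts a local HTTP test server; exact flag behaviour may vary between SDK versions.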

Is RunPod serverless experiencing issues?

Seeing a lot of these errors today:
```
2024-10-15T22:06:11.508889850Z connectionpool.py:870 2024-10-15 22:06:10,637 Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='api.runpod.ai', port=443): Read timed out. (read timeout=8)")': /v2/080ddk82a04i8f/ping/365s5mr8swww1x?gpu=NVIDIA+GeForce+RTX+4090&runpod_version=1.7.0...
```

How to go about applying for RunPod's Creator Program?

Hi, I'm KingTut, founder of Ainime, an AI-powered platform dedicated to creating high-quality, original anime. Our mission is to empower studios of all sizes to produce diverse and inclusive content, particularly focusing on underrepresented characters like Black characters. RunPod’s robust GPU infrastructure and scalable solutions are perfectly aligned with our goal to make anime production easier, more affordable, and faster. By leveraging your technology, we can enhance our platform’s capabilities and deliver exceptional results to our users. As our user base grows, we are facing financial challenges in maintaining the necessary infrastructure. Joining RunPod’s Creator Program would provide us with crucial access to your resources, allowing us to build a robust solution while promoting RunPod’s services to a dedicated community of anime creators....

Initializing...

I've started a new serverless instance. It's been initializing for the last few hours. How long before the server actually gets created?

Connection timeout to host

```
2024-10-14T14:17:51.194569601Z 0|app | {"requestId": "sync-dccee400-e082-4633-8c95-238d11a57c51-e1", "message": "Failed to return job results. | Connection timeout to host https://api.runpod.ai/v2/i03hdwuhbsfyo8/job-done/9lja6f6wu32dyr/sync-dccee400-e082-4633-8c95-238d11a57c51-e1?gpu=NVIDIA+A100+80GB+PCIe&isStream=false", "level": "ERROR"}
```
I am facing this error...

No container logs, container stopped, worker unhealthy.

Hello everyone. We run custom images on RunPod to serve our inference. We have been having a hard time getting RunPod to behave consistently. Our serverless workers go "unhealthy" with no indication or logs whatsoever as to why that happens. Some images can't be run on most GPUs, whilst running just fine on 3090s....

Streaming LLM output via a Google Cloud Function

Has anyone been able to figure this out? User inputs are going through a GCloud Function that then calls the RunPod model's inference. This pipeline works, but I now want the output to be streamed through instead of waiting ages for the complete answer. So far I have tried to implement it without success, and Google's docs only have examples for streaming LLM outputs using their Vertex AI service, not for this specific case I am dealing with.
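
As a sketch of one possible approach (the endpoint ID, API-key handling, and chunk format are assumptions), the Cloud Function can submit the job via /run and then relay chunks from the endpoint's /stream route as they arrive; whether they actually reach the client incrementally depends on Cloud Functions/Cloud Run response buffering:
```
# main.py - Google Cloud Function (functions_framework) relaying RunPod /stream output.
import os
import time

import functions_framework
import requests
from flask import Response

RUNPOD_API_KEY = os.environ["RUNPOD_API_KEY"]       # assumption: provided as env vars
ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {RUNPOD_API_KEY}"}

@functions_framework.http
def stream_llm(request):
    data = request.get_json(silent=True) or {}
    prompt = data.get("prompt", "")

    def generate():
        # Submit the job asynchronously, then poll /stream for partial output.
        job = requests.post(f"{BASE}/run", json={"input": {"prompt": prompt}},
                            headers=HEADERS, timeout=30).json()
        job_id = job["id"]
        while True:
            chunk = requests.get(f"{BASE}/stream/{job_id}", headers=HEADERS,
                                 timeout=30).json()
            for item in chunk.get("stream", []):
                # The "output" key depends on what the handler yields (assumption).
                yield str(item.get("output", ""))
            if chunk.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
                break
            time.sleep(0.5)

    return Response(generate(), mimetype="text/plain")
```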

Serverless and Azure.

So I'm new to RunPod and the docs are not very helpful. I'm creating a ComfyUI workflow that I want to run on RunPod eventually. But I wanted to try RunPod first. So I started a preconfigured serverless option. ...

Testing Endpoints Locally with Docker and GPU

I’m working on creating a custom container to run FLUX and LoRA on RunPod, using this Stable Diffusion example as a starting point. I successfully deployed my first pod on RunPod, and everything worked fine. However, my issue arises when I make code changes and want to test my endpoints locally before redeploying. Constantly deploying to RunPod for every small test is quite time-consuming. I found a guide for local testing in the RunPod documentation here (https://docs.runpod.io/serverless/workers/development/local-testing). Unfortunately, it only provides a simple example that suggests running the handler function directly, like this:...
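
Once the worker image is built, one way to iterate closer to the deployed environment (assuming a machine with an NVIDIA GPU and the NVIDIA container toolkit) is to run the same image with `docker run --gpus all -p 8000:8000 <image> python handler.py --rp_serve_api --rp_api_host 0.0.0.0` and then drive the local test server from a small script; a rough sketch, where the port and payload shape are assumptions:
```
# test_local.py - exercise a worker started with --rp_serve_api
import requests

payload = {"input": {"prompt": "a test prompt"}}  # hypothetical payload shape

# /runsync blocks until the handler returns, which is convenient for quick iteration.
resp = requests.post("http://localhost:8000/runsync", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json())
```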

Chat template error for mistral-7b

```
2024-10-14T10:19:42.283509829Z --- Starting Serverless Worker |  Version 1.7.0 ---
2024-10-14T10:19:42.283511520Z ERROR 10-14 10:19:42 serving_chat.py:155] Error in applying chat template from request: As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.
2024-10-14T10:19:42.283814574Z /src/engine.py:183: RuntimeWarning: coroutine 'AsyncMultiModalItemTracker.all_mm_data' was never awaited
2024-10-14T10:19:42.283849707Z   response_generator = await generator_function(request, raw_request=dummy_request)
...
```
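
The error comes from transformers >= 4.44 refusing to fall back to a default template, so the tokenizer (or the vLLM server, via its chat-template option) has to provide one explicitly. A hedged sketch of the tokenizer-level fix follows; the model id and the Jinja template are illustrative only, so check the model card for the real template:
```
from transformers import AutoTokenizer

# Illustrative: attach a Mistral [INST]-style chat template to a tokenizer that
# lacks one, so apply_chat_template() stops raising under transformers >= 4.44.
tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")  # hypothetical model id

if tok.chat_template is None:
    tok.chat_template = (
        "{{ bos_token }}{% for m in messages %}"
        "{% if m['role'] == 'user' %}[INST] {{ m['content'] }} [/INST]"
        "{% else %}{{ m['content'] }}{{ eos_token }}{% endif %}"
        "{% endfor %}"
    )

print(tok.apply_chat_template([{"role": "user", "content": "Hello"}], tokenize=False))
```
If the endpoint uses the vLLM worker, the equivalent is pointing vLLM at a template file (its OpenAI-compatible server accepts a --chat-template option); how that is exposed through the RunPod worker's environment variables is worth checking in the worker's README.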

When will multiple H100s on a single node be available?

10 × 48 GB GPUs cannot host all the model weights. Is RunPod planning to upgrade its platform?

H100 NVL

If I've understood the docs correctly, H100 NVL is not available on serverless. Are there any plans to bring it to serverless? The extra 14GB of VRAM over the other GPUs is pretty useful for 70(ish)B parameter LLMs.
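
For rough context on why the 94 GB card matters, weight memory is approximately parameter count times bytes per parameter (back-of-the-envelope only; KV cache and activations come on top):
```
# Back-of-the-envelope weight memory, ignoring KV cache, activations and overhead.
def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * bytes_per_param  # billions of params * bytes/param ~ GB

print(weight_gb(70, 2))    # ~140 GB at fp16  -> does not fit on a single 80 or 94 GB card
print(weight_gb(70, 1))    # ~70 GB at 8-bit  -> tight on 80 GB, more headroom on 94 GB NVL
print(weight_gb(70, 0.5))  # ~35 GB at 4-bit  -> fits either way
```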

RunPod Header Timing is Off

These responses from the stream endpoint are coming in every second, yet the header date is saying, at points, that each response is 10 seconds apart. When this happens, nothing is shown in the stream despite the fact that I yield something every second. Is there a way to force the timing or another way I can get around this?
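
For reference, the kind of generator handler that feeds /stream looks roughly like this (a sketch; whether chunks are flushed to the client immediately or batched along the way is exactly the open question here):
```
import time

import runpod

def handler(job):
    # Generator handler: each yielded dict should become a chunk on /stream/{job_id}.
    for i in range(10):
        yield {"chunk": i, "t": time.time()}
        time.sleep(1)

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,  # also aggregate the chunks into the final output
})
```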

Jobs randomly dropping - {'error': 'request does not exist'}

RunPod worker errors:
```
2024-10-12T18:25:21.522075786Z {"requestId": "51124010-27f8-4cfa-b737-a50e6d436623-u1", "message": "Started.", "level": "INFO"}
2024-10-12T18:25:22.723756821Z {"requestId": "51124010-27f8-4cfa-b737-a50e6d436623-u1", "message": "Finished.", "level": "INFO"}
2024-10-12T18:27:09.433322101Z {"requestId": null, "message": "Failed to get job, status code: 404", "level": "ERROR"}
...
```

Huge sudden delay times in serverless

I'm using a WebUI Forge serverless template for my endpoint, with a network volume attached, and sometimes the results are very inconsistent. For example, in the last two results you can see I use the same worker, but one has a delay time of 3s and the other 80.39s; the second request was submitted 4-5 seconds after the first, so there was no long time gap either. I know the Forge/Automatic1111 templates usually take time to load, but up until this Monday/Tuesday or so it only took about a 10-20 second delay time; now I'm having 80-90 second delays. I didn't make any change in my code either. Does anyone know the reason for this?...

Testing Async Handler Locally

Hi, I am trying to test the async handler locally. I am following the documentation very closely (attached is an example of something I tried). However, I cannot seem to get the /status/{job_id} endpoint to return anything while the job is in progress (I expected it to return IN_PROGRESS and perhaps the values that have been yielded so far); that is, I receive no response. I am testing locally by running the handler with the --rp_serve_api flag. Is there something I am doing wrong? Why can't I see the status of a job in progress? ...
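
For comparison, a minimal async-generator handler of the shape being tested (a sketch; the default local port and the payload are assumptions):
```
# async_handler.py
import asyncio

import runpod

async def handler(job):
    # Async generator: yields partial results while the job is in progress.
    for i in range(5):
        await asyncio.sleep(1)
        yield {"progress": i}

runpod.serverless.start({
    "handler": handler,
    "return_aggregate_stream": True,
})
```
Started with `python async_handler.py --rp_serve_api`, the local test server can be driven with POST /run and GET /status/{job_id} on http://localhost:8000; whether it reports IN_PROGRESS with the values yielded so far, or only the final result, may differ from the hosted behaviour, which seems to be the crux of the question.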

OpenAI Serverless Endpoint Docs

Hello. From what I could find in the support threads here, you should be able to make a standard OpenAI request not wrapped in the "input" param if you hit your endpoint at https://api.runpod.ai/v2/<ENDPOINT ID>/openai/... The handler should then receive two new params, "openai_route" and "openai_input", but it's been a couple of months since those threads, and I can't find any official docs about this or about the ability to test it locally with the RunPod lib. Can someone please confirm that this works in custom images too? If so, what is the structure of the parameters received? Does "input" in handler(input) contain the "openai_route" and "openai_input" params directly? Is there any way I can develop this locally?...
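
Based only on what those support threads describe (so treat the key names and shapes below as unconfirmed assumptions, not documented behaviour), a custom handler would inspect the input roughly like this:
```
import runpod

def handler(job):
    inp = job["input"]
    # Reportedly, requests sent to /v2/<ENDPOINT ID>/openai/v1/chat/completions
    # surface with these two extra keys (unconfirmed):
    route = inp.get("openai_route")   # e.g. "/v1/chat/completions"
    body = inp.get("openai_input")    # the raw OpenAI-style request body

    if route:
        messages = (body or {}).get("messages", [])
        # ... build and return an OpenAI-compatible response here ...
        return {"choices": [{"message": {"role": "assistant", "content": "..."}}]}

    # Fall back to the normal RunPod input shape for non-OpenAI calls.
    return {"output": inp}

runpod.serverless.start({"handler": handler})
```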

Will there be a charge for delay time?

What is the billing model for RunPod's serverless? Do I only pay for execution time + idle timeout, or do I pay for delay time + execution time + idle timeout?...

Some serverless requests are hanging forever

I'm not sure why, but I "often" (often enough) have jobs that just... hang there even if multiple GPUs are available on my serverless endpoint. New jobs might come in and go through while the old job just "stalls" there. Any idea why?...