Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

using compression encoding for serverless requests

Just wondering: is the serverless endpoint capable of receiving and processing compressed requests (e.g., zstd, gzip)?
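
For reference, a minimal sketch of what a compressed request would look like from the client side, assuming the endpoint (or Runpod's proxy in front of it) honors the Content-Encoding header, which is exactly the open question here; the endpoint ID and API key are placeholders:

```python
import gzip
import json

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-api-key"          # placeholder

payload = json.dumps({"input": {"prompt": "hello"}}).encode("utf-8")

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    data=gzip.compress(payload),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        # Only effective if Runpod's ingress actually decompresses gzip;
        # that support is what this thread is asking about.
        "Content-Encoding": "gzip",
    },
)
print(resp.json())
```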

Throttled ECR Download?

We have a serverless endpoint that uses an ECR registry to back the image. When initializing a new worker, the download of a changed layer (which is 3 GB) can sometimes take more than 20 minutes. Is this download speed typical? Is there another pattern we should be using? It's surprising that a pull from ECR is such a large bottleneck on our cold-start time....

Need some help to troubleshoot a configuration of a Serverless

I created my account and subscribed so I could create a serverless endpoint, and I set it up using the web interface, but it doesn't seem to work. I need some help ASAP.

Do Webhook Request Responses have a retry mechanism?

If a response webhook fails, is there a retry mechanism in place for resending it? If so, what does it look like, i.e., how many retries and over how long?...
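
Whatever the retry policy turns out to be, a receiver that deduplicates deliveries by job ID is safe to call more than once. A minimal sketch, assuming the webhook payload carries the job's id field (Flask and the /runpod-webhook route are illustrative choices):

```python
from flask import Flask, request

app = Flask(__name__)
seen_jobs = set()  # in production, use a persistent store such as Redis

@app.route("/runpod-webhook", methods=["POST"])
def runpod_webhook():
    body = request.get_json(force=True)
    job_id = body.get("id")  # field name assumed from the job status payload
    if job_id in seen_jobs:
        return "", 200       # duplicate delivery: acknowledge and ignore
    seen_jobs.add(job_id)
    # ... process body.get("output") here ...
    return "", 200           # a non-2xx response is what could trigger a retry
```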

Incorrect billing

The billing for the last 4 weeks seems to be wrong; can someone help me understand it? I am using only two serverless endpoints and no other services. Endpoint IDs: ed0rivbjvv0x0u and pzfz3xhwa86raj

Request getting stuck

Hey, I am using a Runpod endpoint and all my requests are stuck. It's mission critical, and I have raised a ticket. I'm using a network volume in EU-SE-1.

Serverless endpoint status and runsync not returning data anymore in response body (request not found)

Hey team, I have a custom serverless endpoint worker. It has always worked: the logs show that everything went as planned, and the requests are marked as completed after the time I expect. However, on my API the requests error out, and on the UI they show as completed but have no output. When I inspect the status in Thunder Client, Runpod says that the request does not exist. I would like to understand what is going on and how I can make my API more resilient to these issues. Attached are screenshots of the behavior:...
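
One way to soften transient "request does not exist" responses, whatever their root cause, is to retry the status check with backoff instead of failing on the first 404. A minimal sketch against the documented /status route; the endpoint ID and API key are placeholders:

```python
import time

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-api-key"          # placeholder

def fetch_status(job_id: str, attempts: int = 5, delay: float = 2.0) -> dict:
    """Poll the job status, retrying instead of trusting a first 'not found'."""
    last = None
    for attempt in range(attempts):
        last = requests.get(
            f"https://api.runpod.ai/v2/{ENDPOINT_ID}/status/{job_id}",
            headers={"Authorization": f"Bearer {API_KEY}"},
        )
        if last.status_code == 200:
            return last.json()
        time.sleep(delay * (attempt + 1))  # linear backoff between attempts
    last.raise_for_status()  # give up: surface the final error
```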

I want to increase/decrease workers by code or script, can you help? (GraphQL)

I have a serverless setup already. Generally we keep 1 active worker during the hours when we expect traffic throughout the day, and at night, when no one is using the application, we set active workers to 0 to avoid charges. The next day, we manually set active workers back to 1 from the Runpod dashboard. We would like to do that automatically. I know there is a GraphQL API, but I am not able to find relevant code for it. Can anyone please help?...
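
A sketch of how this could be scripted, e.g. from a cron job. Runpod's GraphQL endpoint is https://api.runpod.io/graphql; the saveEndpoint mutation and its fields below come from community examples rather than official docs, so verify them against the current schema (the mutation may also require fields such as name, templateId, and gpuIds):

```python
import requests

API_KEY = "your-api-key"          # placeholder
ENDPOINT_ID = "your-endpoint-id"  # placeholder

def set_active_workers(count: int) -> dict:
    # workersMin corresponds to "active workers" in the dashboard (assumption
    # based on community examples -- verify against the current schema).
    query = """
    mutation {
      saveEndpoint(input: { id: "%s", workersMin: %d, workersMax: 3 }) {
        id
        workersMin
        workersMax
      }
    }
    """ % (ENDPOINT_ID, count)
    resp = requests.post(
        "https://api.runpod.io/graphql",
        params={"api_key": API_KEY},
        json={"query": query},
    )
    resp.raise_for_status()
    return resp.json()

set_active_workers(1)  # morning: keep one always-on worker
set_active_workers(0)  # night: scale active workers to zero
```

Two cron entries, one for the morning and one for the night, would then cover the daily toggle.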

Support for https://huggingface.co/deepseek-ai/DeepSeek-V3?

Would it be possible to get support for https://huggingface.co/deepseek-ai/DeepSeek-V3? It is currently the best open-source model for coding.

Serverless Idle Timeout is not working

One of my serverless endpoints is not respecting the idle timeout setting. Instead of staying active for 300 seconds, it turns idle after 5. I have redeployed the endpoint and it works for a while, but today, again without any changes, the endpoint turns idle after 5 seconds even though it's set to 300....

Flashboot meaning?

Is there any documentation on what it does under the hood? I am asking because of this: "FlashBoot reduces majority cold-starts down to 2s, even for LLMs. Make sure to test output quality before enabling." ...

Distributed inference with Llama 3.2 3B on 8 GPUs with tensor parallelism + Disaggregated serving

Hi. I need help setting up a vLLM serverless pod with disaggregated serving and distributed inference for a Llama 3.2 3B model. The setup would be disaggregated: something like 1 worker with 8 total GPUs, where 4 GPUs handle one prefill task and 4 GPUs handle one decode task. Can experts help me set this up using vLLM on Runpod serverless? I am going for this approach because I want super low latency, and I think sharding the model for prefill and decode separately with tensor parallelism will help me achieve this....
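
As far as I know, disaggregated prefill/decode serving is still an experimental vLLM feature and is not exposed by the stock Runpod vLLM worker, so the sketch below only covers the tensor-parallel half of the setup; the model ID is assumed to be the Hugging Face repo for Llama 3.2 3B Instruct:

```python
from vllm import LLM, SamplingParams

# Tensor parallelism across 4 GPUs; the prefill/decode split itself would come
# from vLLM's experimental disaggregated serving, which is not shown here.
llm = LLM(
    model="meta-llama/Llama-3.2-3B-Instruct",  # assumed HF repo id
    tensor_parallel_size=4,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```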

job timed out after 1 retries

Hello! I'm getting this on every job now on the 31py4h4d9ytybu serverless endpoint. My logs have zero messages or any indication of where this is happening; from the outside it looks as if the workers are totally paused or non-responsive. This silently hung work for over an hour. I'm on runpod 1.7.4. This is currently having significant impacts on production work, without any clear remediation (see screenshots: no logs for many, many minutes despite work happening constantly, and errors on every job). Wou...

Can't see Billing beyond July

Hi, I'm trying to get my billing invoices, but I don't see anything beyond six months. Can someone help?...

Linking runpod-volume subfolder doesn't work

Hey, I've been trying to create a serverless Runpod worker with a network volume attached to it. I want to link specific folders from the network volume into the worker. To do so, I'm running the following bash file. ```bash...
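
Since the bash file above is cut off, here is a hypothetical illustration of the usual approach: create the links at runtime, because the network volume is only mounted at /runpod-volume once the worker starts, so links created at image-build time point at nothing. The /runpod-volume/models and /workspace/models paths are made up for the example:

```python
import os
from pathlib import Path

# Hypothetical layout: expose /runpod-volume/models inside the worker
# as /workspace/models. Run this at container start (or from the handler),
# because /runpod-volume is only mounted at runtime on serverless.
src = Path("/runpod-volume/models")
dst = Path("/workspace/models")

dst.parent.mkdir(parents=True, exist_ok=True)
if not dst.exists():
    os.symlink(src, dst, target_is_directory=True)
```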

ComfyUI Image quantity / batch size issue when sending request to serverless endpoint

I'm not able to generate multiple images from a prompt/request to the endpoint using a ComfyUI workflow. We have added a variable for the "batch_size" value in our workflow, but it only seems to generate one image regardless of the batch_size we give it. This is our GitHub repo for the Runpod worker: https://github.com/sozanski1988/runpod-worker-comfyui/ ...
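
In stock ComfyUI workflows exported in API format, batch_size usually lives on the EmptyLatentImage node, and a common failure mode is substituting it as a string ("4") rather than an integer, which can silently fall back to one image. A hypothetical payload fragment (the node ID and dimensions are made up):

```python
# Hypothetical fragment of a ComfyUI API-format workflow payload.
workflow = {
    "5": {  # node id is made up; match it to your exported workflow
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 4},  # int, not "4"
    },
    # ... the remaining nodes of the workflow ...
}
payload = {"input": {"workflow": workflow}}
```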

Some basic confusion about the `handlers`

Hi everyone! 👋 I'm currently using RunPod's serverless option to deploy an LLM. Here's my setup: - I've deployed vLLM with a serverless endpoint (runpod.io/v2/<endpoint>/run)....
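
For context on what a handler is: the prebuilt vLLM worker ships its own, so you only write one for a custom worker. The shape below follows the runpod Python SDK; the echo logic is a placeholder:

```python
import runpod

def handler(job):
    """Receives the JSON posted to /run; job["input"] is the request payload."""
    prompt = job["input"].get("prompt", "")
    return {"echo": prompt}  # the return value becomes the job's "output"

runpod.serverless.start({"handler": handler})
```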

Next js app deploy on Runpod

Dear Runpod community, I need to deploy our Next.js app on Runpod, similar to how it works on Vercel. In our Next.js app, I handle the frontend and also create backend APIs for MongoDB interactions. Additionally, I need to run scheduled jobs. Can we do that with Runpod, and if not, which hosting provider would you recommend for this setup?

Optimizing VLLM for serverless

Hello. I am trying to optimize vLLM for a serverless endpoint. The default vLLM settings are blazing fast for cached workers (~1 s) but unusable with cold-start initialization (40-60+ seconds). Forcing eager mode removes CUDA graph capture and pushes cold-start initialization down to ~20 s, at the price of slower generation. Beyond that, I feel stuck on what could be improved, since the longest tasks are currently creating the LLM engine and vLLM's memory-profiling stage, each taking up to 6 seconds. I am attaching the complete log file with time comments from such a job.

I am wondering if anyone has found the settings sweet spot for the fastest cold starts with acceptable generation speed, or if there is a way to remove the initialization step for newly spawned workers. I have already researched many things, from automatic caching on a network volume (which didn't work at all; with bitsandbytes models no cache is saved) to snapshotting and trying to share the initialized state between workers (which is probably not possible)....
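
For anyone landing here, a sketch of the engine arguments discussed above, using vLLM's Python API; the values are illustrative starting points rather than tuned recommendations, and the model ID is a placeholder:

```python
from vllm import LLM

# Trading generation speed for faster initialization: enforce_eager=True skips
# CUDA graph capture, and an explicit load_format avoids auto-detection on load.
llm = LLM(
    model="your-org/your-model",  # placeholder model id
    enforce_eager=True,
    gpu_memory_utilization=0.90,
    load_format="safetensors",
)
```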