RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning, and GPUs!

Docker image EXTREMELY slow to load on endpoint but blazing fast locally

This is the first time I'm encountering this issue with a serverless EP. I've got a Docker image that loads the model (FLUX.1-schnell) very fast and runs a job fairly fast on my local machine with a 4090. When I use a 4090 on RunPod, though, the image gets stuck loading the model: ```self.pipeline = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)```...
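
A common pattern worth checking here (a sketch, not a confirmed fix): create the pipeline once at module scope rather than inside the handler, so warm workers reuse it across jobs, and bake the weights into the image or a network volume so a cold worker isn't re-downloading them from Hugging Face. Assuming the runpod Python SDK and diffusers:

```python
import runpod
import torch
from diffusers import FluxPipeline

# Runs once per worker cold start; warm invocations skip this entirely.
pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

def handler(job):
    prompt = job["input"]["prompt"]
    # schnell is tuned for few steps and no classifier-free guidance
    image = pipeline(prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
    image.save("/tmp/out.png")  # placeholder; a real worker would return base64 or upload
    return {"image_path": "/tmp/out.png"}

runpod.serverless.start({"handler": handler})
```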

Constantly getting "Failed to return job results."

```{"endpointId": "mbx86r5bhruapo", "workerId": "r23nc1mgj01m13", "level": "error", ...}```

Why are my serverless endpoint requests waiting in queue when there are free workers?

This has been happening: when two people make a request at the same time, the second user's request waits in the queue until the first request completes, instead of using another worker. I have 4 workers available on my endpoint, so that's not the issue. I set the queue delay to 1 second because that's the lowest possible, but it doesn't do anything. Is the serverless endpoint supposed to work in production?

GitHub integration

@haris Trying the new GitHub integration. It says it grants "Read and write access to code" permissions. Why does the GitHub integration require WRITE access to code?

Is vLLM Automatic Prefix Caching enabled by default?

Hello! I set up a Serverless quick deployment for text generation, and I was wondering whether vLLM Automatic Prefix Caching is enabled by default. Also see: https://docs.vllm.ai/en/latest/automatic_prefix_caching/apc.html ...
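
In the vLLM Python API, prefix caching is an explicit engine argument, so rather than relying on a default you can turn it on yourself; a minimal sketch (the model name is only an example). The RunPod vLLM quick-deploy worker generally exposes engine arguments as environment variables, so there is likely a corresponding setting there too, but check the worker's README rather than assuming:

```python
from vllm import LLM, SamplingParams

# Enable Automatic Prefix Caching explicitly instead of relying on a default.
llm = LLM(
    model="facebook/opt-125m",   # example model, not a recommendation
    enable_prefix_caching=True,  # vLLM engine argument for APC
)
params = SamplingParams(temperature=0.7, max_tokens=64)
out = llm.generate(["A long, frequently repeated system prompt..."], params)
print(out[0].outputs[0].text)
```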

vLLM worker OpenAI stream timeout

The OpenAI client code from the tutorial (https://docs.runpod.io/serverless/workers/vllm/openai-compatibility#streaming-responses-1) is not reproducible. I'm hosting a 70B model, which usually has a ~2 min delay per request. Using the OpenAI client with stream=True times out after ~1 min and returns nothing. Any solutions?...
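
One thing worth ruling out (an assumption, since the failing code isn't shown): the OpenAI v1 client accepts an explicit timeout, so setting it well above the 2-minute prefill rules out a client-side cutoff. A sketch using the documented RunPod OpenAI-compatible base URL; the endpoint ID, API key, and model name are placeholders:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint_id>/openai/v1",
    api_key="<runpod_api_key>",
    timeout=300.0,  # seconds; generous headroom for a slow 70B prefill
)

stream = client.chat.completions.create(
    model="<served_model_name>",  # whatever the worker serves
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```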

vLLM model loading, TTFT unhappy path

I am looking for a way to reduce latency on the unhappy path of vLLM endpoints. I use the quickstart vLLM template, backed by network storage for model weights, with FlashBoot enabled. By default the worker loads the model weights on the first request. This, however, poses the risk of exposing my customers to an unhappy path with latency measured in minutes, and at scale we could see this in significant absolute numbers. What would be the best way to make sure a worker is considered ready only *after* it has loaded the model checkpoints, and to trigger checkpoint loading without sending the first request? Should I roll my own vLLM container image, or is there an idiomatic way to parametrize the quickstart template to achieve this? I would prefer to use the RunPod-supplied, properly supported vLLM image, if possible....
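
One way to get "ready only after the checkpoint is loaded" is a thin custom worker: the worker doesn't start pulling jobs until runpod.serverless.start() is called, so doing the blocking load first means no request ever lands on a cold model. A sketch, assuming a custom image rather than the quickstart template (the volume path is hypothetical):

```python
import runpod
from vllm import LLM, SamplingParams

# Blocks until weights are in GPU memory; the worker isn't pulling jobs yet.
llm = LLM(model="/runpod-volume/models/my-model")  # hypothetical path

def handler(job):
    params = SamplingParams(max_tokens=256)
    out = llm.generate([job["input"]["prompt"]], params)
    return {"text": out[0].outputs[0].text}

# Only now does the worker report in and start receiving jobs.
runpod.serverless.start({"handler": handler})
```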

Can't pull image from Docker Hub

```
2024-12-11T12:28:04Z 257642480b4e Extracting [==================================================>] 33.06GB/33.06GB
2024-12-11T12:28:04Z failed to pull image: failed to register layer: archive/tar: invalid tar header
```
@Zeke...

Serverless socket.io support

Hello, I want to build a socket.io-based serverless endpoint using RunPod serverless, and I'm curious whether this is possible. When I create a serverless API and connect to that API address via socket.io, which internal instance does it actually connect to? I want to connect to an instance whose queue_length is less than 30, but looking at serverless, this seems impossible. Is it possible?...
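
Serverless routes requests through a queue rather than handing out a connection to a specific instance, so per-worker socket.io connections don't map onto it. If the underlying goal is "don't dispatch when the queue is deep", the endpoint health route exposes queue and worker counts; a sketch, with the response shape assumed rather than verified:

```python
import requests

ENDPOINT_ID = "<endpoint_id>"
API_KEY = "<runpod_api_key>"

health = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
).json()

# Assumed shape: {"jobs": {"inQueue": ...}, "workers": {...}}
if health.get("jobs", {}).get("inQueue", 0) < 30:
    print("OK to dispatch")
```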

Running Llama 3.3 70B using vLLM and a 160GB network volume

Hi, I want to check whether 160 GB is enough for Llama 70B and whether I could use a smaller network volume.
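
Rough arithmetic (weights only, ignoring KV cache, tokenizer, and config files) suggests 160 GB fits the bf16 checkpoint with little headroom, while a quantized variant would fit on a much smaller volume:

```python
# Back-of-envelope weight sizes for a 70B-parameter model.
params = 70e9
for name, bytes_per_param in [("bf16", 2), ("fp8/int8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
# bf16:     ~140 GB -> fits on 160 GB, but tight
# fp8/int8:  ~70 GB
# 4-bit:     ~35 GB -> a much smaller volume would do
```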

I don't know why my serverless balance goes down

Hi, I recently made some changes to my platform. It analyzes videos using 3 different computer vision models, with a serverless endpoint for each. I think that somewhere I am making requests I shouldn't be, or some endpoints are active when they shouldn't be. For example, this happened today from 14:20 CET to 15:15 CET: I had $85.443, and when I came back I had $83.934. I was doing other tests on my app during this time, but I wasn't calling any endpoint here....

Structure of "job" JSON

I understand that, at the very least, there are job["id"] and job["input"], and we utilize them. It would help me a great deal if I could send additional information like job["source"] or other metadata to the handler function. But it seems that no matter how I structure the JSON, only id and input end up in the job object passed to the handler. Is this indeed the case? ...
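
If only id and input survive the trip, which matches the documented request shape (a payload of {"input": {...}} plus optional control fields), the usual workaround is to carry metadata inside input itself. A sketch; the field names are made up:

```python
# Request side: tuck metadata inside "input", since that's what the handler sees.
payload = {
    "input": {
        "prompt": "a photo of a cat",
        "metadata": {"source": "mobile-app", "user_id": "u_123"},  # made-up fields
    }
}

# Handler side: read it back out.
def handler(job):
    meta = job["input"].get("metadata", {})
    print(job["id"], meta.get("source"))
    return {"ok": True}
```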

Automatic1111 UI with serverless Stable Diffusion

Hi guys, sorry if this is a noobish question. I want to create a web app that has the Automatic1111 SD UI and then calls the serverless API, so I don't have to run a pod continuously. Has anyone done this before? I would really appreciate it if someone could point me in the right direction....
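
The serverless half of this is plain HTTP: point the web app's backend at the endpoint's runsync (or run plus status polling) URL. A sketch; the endpoint ID, API key, and input fields are placeholders, and the real input schema depends on the worker image deployed behind the endpoint:

```python
import requests

ENDPOINT_ID = "<endpoint_id>"
API_KEY = "<runpod_api_key>"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "a castle at sunset", "steps": 20}},
    timeout=120,
)
print(resp.json())  # job status plus the worker's output on success
```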

Serverless GitHub endpoint stuck at uploading phase

First of all, I'd like to thank the RunPod team for their amazing work! 🎉 Although it worked on the initial deployment, I seem to have an issue with subsequent ones. The worker builds and deploys correctly, but the UI is stuck at the "uploading" phase and the active build doesn't update....

Best Practices for SaaS

I'm new to this. If I wanted to create a SaaS application to offer custom chat for customers using their data, what would be the best-practice structure on the RunPod end? Would I have a single endpoint that is shared? Would you set up dedicated pods? ...

Serverless workers Redis client?

Anyone seeing an error like this today? ```"error": "redis err: client is nil"```. I'm not using Redis in my serverless env; I am using ComfyUI, but I don't think it uses Redis either. Requests are getting hung for a few minutes, eating billing, then failing anyway....

Serverless request returns None from Python client but web status says completed successfully

Hello, I have been baffled by this issue for weeks and I'm pulling my hair out. I have a serverless endpoint that always comes back as None from the Python runpod client, with no error messages in the logs or from my inference script. Yet the runpod.io metrics for my requests always show them as completed. ...
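
One classic cause worth ruling out (an assumption, since the handler isn't shown): the job status tracks whether the handler finished, not whether it returned anything, so a handler that prints or saves its result but returns None shows as completed while the client receives None. Whatever the handler returns becomes the job's output:

```python
import runpod

def do_inference(inp):
    return f"processed {inp!r}"  # stand-in for the real model call

def handler(job):
    result = do_inference(job["input"])
    return {"result": result}  # return the result; don't just print or save it

runpod.serverless.start({"handler": handler})
```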

Template ID missing in serverless dashboard

Just noticed the template ID is completely missing from the endpoints. I have some templates with the same Docker image tag but different template IDs, so that when I'm testing new images on the test endpoint it doesn't mess up the production endpoint. Can the template ID please be added back, e.g. next to the Docker image?

Disk size when building a GitHub repository as an image on Serverless

I have a question about disk size when building a GitHub repository as an image on Serverless. Does the disk-size option in the serverless settings affect the machine that builds the image? For example, if the image being built is about 17GB and the build machine needs 65GB of storage to build it, should I set the disk size to >17GB or >65GB?