Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

I shouldn't be paying for this

It says it's running, but in reality it's still initializing; even the system logs are empty. I should not be charged for such dead containers!

Offloading multiple models

Hi guys, does anyone have experience with an inference pipeline that uses multiple models? I'm wondering how best to manage loading models whose combined size exceeds a worker's VRAM if everything stays in VRAM. Any best practices / examples for keeping model load time as low as possible? Thanks!...
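
A minimal sketch of one way to approach this, assuming PyTorch models that can be moved between devices: keep every model resident on CPU and promote only the active one to the GPU, so the combined weights never have to fit in VRAM at once. The pool and loader names are illustrative.

```python
import torch

class ModelPool:
    """Keep models on CPU; move only the requested one onto the GPU."""

    def __init__(self, loaders):
        self.loaders = loaders   # name -> zero-arg callable returning a CPU model
        self.cache = {}          # name -> model parked on CPU
        self.active = None       # name of the model currently on the GPU

    def get(self, name):
        if name not in self.cache:
            self.cache[name] = self.loaders[name]()   # load once, onto CPU
        if self.active != name:
            if self.active is not None:
                self.cache[self.active].to("cpu")     # evict the previous model
                torch.cuda.empty_cache()              # release the freed VRAM
            self.cache[name].to("cuda")               # promote the requested model
            self.active = name
        return self.cache[name]
```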

Increase Max Workers

Hey there, I'm currently setting up the RunPod team/account for our agency. We're planning to test RunPod for a serverless SD deployment within our agency. For this we need to increase the maximum number of workers I can assign to a serverless endpoint. Could someone from the RunPod team reach out to me via DM if possible? We're still in the middle of setting everything up, including the automatic payment system, so we're still stuck with the default limit....

generativelabs/runpod-worker-a1111 broken

error pulling image: Error response from daemon: pull access denied for generativelabs/runpod-worker-a1111, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

ComfyUI Serverless with access to lots of models

Hi, I have a pre-sales question. I am currently hosting a Discord bot and website for image generation using ComfyUI API endpoints on a local PC. It has around 1TB of checkpoints and LoRAs available, but as the number of users grows I'm considering a serverless GPU where I can pay just for compute time. With RunPod serverless, am I able to quickly deploy instances of Comfy with any checkpoints/LoRAs the user wants for their generation? I was thinking of keeping the most popular models on RunPod storage for fastest deployment, with rarely used ones downloaded on demand and swapped out to make room when needed. Am I able to do this, or something similar?...
Solution:
By using network storage and serverless

Stuck on "loading container image from cache"

Hi, I have updated my serverless endpoint release version but some of my workers are stuck on "loading container image from cache", even though it's a new version that shouldn't exist in the cache to begin with. Any advice on how to solve this issue?...

Get Comfyui progress with runpod-worker-comfyui?

Hello there, I just deployed runpod-worker-comfyui and I'm wondering if there's a way to monitor the progress of the prompt. Normally I do this with websockets, but I think that's not possible here?
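
If the worker doesn't expose ComfyUI's websocket through the serverless gateway, job state can still be polled through the public status route. A sketch, with the endpoint ID, job ID, and API key as placeholders:

```python
import time
import requests

def poll_job(endpoint_id, job_id, api_key):
    """Poll a serverless job until it reaches a terminal state."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/status/{job_id}"
    headers = {"Authorization": f"Bearer {api_key}"}
    while True:
        status = requests.get(url, headers=headers, timeout=30).json()
        print(status.get("status"))   # e.g. IN_QUEUE / IN_PROGRESS / COMPLETED / FAILED
        if status.get("status") in ("COMPLETED", "FAILED", "CANCELLED"):
            return status
        time.sleep(2)
```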

Llama 3.1 + Serverless

I'm trying to follow this tutorial: https://discord.com/channels/912829806415085598/1266059553838202990/1266059553838202990 I tried pooyaharatian/runpod-ollama:0.0.8 and overrode the default start command with llama3.1...

Long wait time for Serverless deployments

Hi, perhaps someone can help. We've got various workloads running on Runpod. We deploy to Runpod using SST. Updating a template with the new image to deploy works great in our CI (GitHub Actions). Once we've deployed our updated code to Runpod, we want to validate that our application is working, so we run some tests by invoking the endpoint and asserting on various outputs produced. We do this with preview environments (per pull request) and in a staging environment on the way out to production. We realised a while ago that our tests would often be running against an older version of the code since Runpod hadn't had a chance to pull the newer image from Dockerhub. Our solution to this was to add a job in the CI that would repeatedly call the /runsync endpoint until the SHA (baked into the image) matched the one the CI was currently running against, and move on to the testing stage once we were certain we would be testing the latest version of the code. This mostly worked with an occasional timeout here and there. Our configuration was: ```...
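
A sketch of the CI gate described above, assuming (hypothetically) that the worker answers a version request with the SHA baked into its image; the input contract here is invented for illustration:

```python
import time
import requests

def wait_for_sha(endpoint_id, api_key, expected_sha, timeout_s=900):
    """Block until the endpoint serves the image built from expected_sha."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    headers = {"Authorization": f"Bearer {api_key}"}
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.post(url, headers=headers,
                             json={"input": {"command": "version"}},  # hypothetical contract
                             timeout=120).json()
        if resp.get("output", {}).get("sha") == expected_sha:
            return                       # the new image is live; safe to run tests
        time.sleep(10)                   # an older worker answered; retry
    raise TimeoutError("endpoint never served the expected image SHA")
```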

Random CUDA Errors

Hello! About once every two weeks the following errors appear for a few hours and then everything fixes itself: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect....

RUNPOD - rp_download

I have recently been seeing issues with `from runpod.serverless.utils import rp_download` followed by `downloaded_input = rp_download.file(url)` ...
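
For context, a hedged reconstruction of that snippet with error reporting added so the underlying failure shows up in the worker logs; the URL is a placeholder:

```python
from runpod.serverless.utils import rp_download

def fetch(url):
    try:
        downloaded_input = rp_download.file(url)   # downloads the file to local disk
    except Exception as exc:                       # surface the real cause in the logs
        print(f"rp_download failed for {url}: {exc!r}")
        raise
    return downloaded_input                        # metadata returned by the SDK
```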

Response is always 16 tokens.

Hello. I'm new to cloud and tried following the docs for serverless running Google's Gemma 7B model. After the endpoint is successfully set up and I do a test request within the RunPod dashboard, I noticed the response is always 16 tokens. I tested locally with Postman and used a variety of prompts, but I always get a truncated 16 tokens back. I also tried Llama 3.1 8B Instruct using the vLLM template and made sure to set the max sequence length to something high (like 6k), but I still only get 16 tokens back. I've also tried setting max_tokens directly in the request. I'm not sure what I'm doing wrong....
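
A consistent 16-token cutoff matches vLLM's default max_tokens of 16, which kicks in when the parameter never reaches the engine. With the vLLM worker, max_tokens usually has to sit inside sampling_params in the job input rather than at the top level; a hedged example, with the endpoint ID and API key as placeholders:

```python
import requests

payload = {
    "input": {
        "prompt": "Explain serverless cold starts in one paragraph.",
        "sampling_params": {"max_tokens": 512, "temperature": 0.7},  # nested, not top-level
    }
}
resp = requests.post(
    "https://api.runpod.ai/v2/ENDPOINT_ID/runsync",      # placeholder endpoint ID
    headers={"Authorization": "Bearer RUNPOD_API_KEY"},  # placeholder key
    json=payload,
    timeout=120,
)
print(resp.json())
```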

How to deal with multiple models?

Does anyone have a good deployment flow for deploying serverless endpoints with multiple large models? Asking because building and pushing a Docker image with the model weights baked in takes forever.
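
One common alternative to baking weights into the image is pulling them at cold start into a network volume, so image builds stay small and model updates don't require a rebuild. A sketch assuming huggingface_hub is installed and the volume mounts at /runpod-volume; the repo ID is a placeholder:

```python
import os
from huggingface_hub import snapshot_download

CACHE = "/runpod-volume/models"        # persists across workers and releases

def ensure_model(repo_id):
    """Download once; later cold starts find the weights already on the volume."""
    os.makedirs(CACHE, exist_ok=True)
    return snapshot_download(repo_id, cache_dir=CACHE)

model_path = ensure_model("org/large-model")   # placeholder repo ID
```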

FastAPI RunPod serverless request format

Hi everyone, second post from me. I am (still) trying to deploy a FastAPI app in a Docker container hosted on RunPod serverless. Here is a toy example: ```python...
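
For what it's worth, queue-based serverless endpoints don't proxy arbitrary HTTP routes into the container; requests arrive as a JSON envelope of the form {"input": {...}} that a RunPod handler consumes. A minimal sketch of that shape (the field names are illustrative):

```python
import runpod

def handler(job):
    payload = job["input"]               # the body of {"input": {...}}
    name = payload.get("name", "world")  # illustrative field
    return {"message": f"hello {name}"}  # becomes the job's "output"

runpod.serverless.start({"handler": handler})
```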

Mounting network volume into serverless Docker container

Hi everyone, I am trying to deploy a FastAPI app in a Docker container hosted on RunPod serverless. My issue is that I need to pass a very large data folder to the Docker container....
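
One note on the mounting question: an attached network volume typically appears inside a serverless container at /runpod-volume (there is no Docker -v flag to set), so a large data folder can live there instead of inside the image. A tiny sketch with an illustrative path:

```python
import os

DATA_DIR = "/runpod-volume/data"   # illustrative layout on the attached volume

def list_data():
    """Return the data files if the volume is attached, else an empty list."""
    return sorted(os.listdir(DATA_DIR)) if os.path.isdir(DATA_DIR) else []
```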

Google cloud storage can't connect

Hi, I'm having trouble connecting/transferring data from a ComfyUI pod to Google Cloud Storage. I get this message: Failed to copy: can't make bucket without project number

data security and compliance certifications (SOC2 type 2, ISO, HIPAA, GDPR)

@Madiator2011 (Work) @JM You guys seem to keep moving the goalposts on your data security and compliance certifications (SOC 2 Type 2, ISO, HIPAA, GDPR) https://discord.com/channels/912829806415085598/948767517332107274/1208587476365479946 Your site now claims you'll have these in Q3 2024, while a support member quoted August 2024 to me. What's the reason for moving the goalposts? Are you failing the compliance certifications, or what's up? From a company perspective it's concerning, so it would be great to have tangible evidence you're on track for August 2024 for production client data use....

Urgent: Issue with Runpod vllm Serverless Endpoint

We are encountering a critical issue with the RunPod vLLM serverless endpoint. Specifically, when attaching a network volume, the following code fails: `response = client.completions.create( model="llama3-dumm/llm", prompt=["hello? How are you "],...
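
For the OpenAI-client style used in that snippet, the vLLM worker is normally reached through its OpenAI-compatible route; a hedged sketch in which the endpoint ID and API key are placeholders and the model name is taken from the post:

```python
from openai import OpenAI

client = OpenAI(
    api_key="RUNPOD_API_KEY",                                    # placeholder key
    base_url="https://api.runpod.ai/v2/ENDPOINT_ID/openai/v1",   # worker's OpenAI route
)
response = client.completions.create(
    model="llama3-dumm/llm",            # model name from the post
    prompt=["hello? How are you "],
    max_tokens=64,
)
print(response.choices[0].text)
```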

What is vars.RUNNER_24GB?

I'm trying to get the CI/CD for the serverless worker-template working on GitHub. What should I put for vars.RUNNER_24GB? And why does the handler test need my PAT and my organization? Can I use my automatic GITHUB_TOKEN instead? https://github.com/runpod-workers/worker-template...

v1 API definitions?

Is there any documentation for RunPod v1 endpoints? I'm specifically looking for documentation for: https://hapi.runpod.net/v1/pod/{POD_ID}/logs This seems to be what RunPod uses to stream logs from serverless workers to their website. I would like to implement similar functionality in my web app rather than streaming those logs over a websocket with custom code, as I do today. ...