RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

generativelabs/runpod-worker-a1111 broken

error pulling image: Error response from daemon: pull access denied for generativelabs/runpod-worker-a1111, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

ComfyUI Serverless with access to lots of models

Hi, I have a pre-sales question. I am currently hosting a Discord bot and website for image generation using ComfyUI API endpoints on a local PC. It has around 1TB of checkpoints and LoRAs available to be used, but as the number of users grows I'm considering a serverless GPU where I can pay just for compute time. With RunPod serverless, am I able to quickly deploy instances of Comfy with any checkpoints/LoRAs the user wants for their generation? I was thinking of having the most popular models stored on RunPod storage for fastest deployment, while rarely used ones are downloaded on demand and swapped out to make room when needed. Am I able to do this, or something similar?...
Solution:
By using network storage and serverless
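
A minimal sketch of that pattern, assuming a network volume is attached to the endpoint (serverless workers mount it at /runpod-volume); the input field names and the simple URL download are assumptions for illustration:

```python
import os
import urllib.request

import runpod

# The attached network volume is mounted at /runpod-volume inside serverless workers.
MODELS_DIR = "/runpod-volume/models/checkpoints"


def ensure_checkpoint(name: str, url: str) -> str:
    """Return a local path for the checkpoint, fetching it on demand if it isn't cached."""
    os.makedirs(MODELS_DIR, exist_ok=True)
    path = os.path.join(MODELS_DIR, name)
    if not os.path.exists(path):
        # Rarely used models are downloaded on demand; popular ones stay on the volume.
        urllib.request.urlretrieve(url, path)
    return path


def handler(job):
    inp = job["input"]  # field names below are assumptions
    ckpt = ensure_checkpoint(inp["checkpoint_name"], inp["checkpoint_url"])
    # ... load ckpt into the ComfyUI workflow and run the generation here ...
    return {"checkpoint_path": ckpt}


runpod.serverless.start({"handler": handler})
```

Swapping cold models out to make room would then just be deleting the least recently used files under MODELS_DIR when the volume fills up.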

Stuck on "loading container image from cache"

Hi, I have updated my serverless endpoint release version, but some of my workers are stuck on "loading container image from cache" even though it's a new version that shouldn't exist in the cache to begin with. Any advice on how to solve this issue?...

Get Comfyui progress with runpod-worker-comfyui?

Hello there, I just deployed the runpod-worker-comfyui and I'm wondering if there's a way for me to monitor the progress of the prompt. Normally I do this with websockets, but I think that's not possible here?
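
One workaround, sketched here rather than taken from the ComfyUI worker itself, is to report progress from the handler with the SDK's progress_update and poll the job's /status route instead of using websockets; the step messages are illustrative:

```python
import runpod


def handler(job):
    # Each call attaches a progress message to the job, visible in the
    # /status/{job_id} response while the job is IN_PROGRESS.
    for step in range(1, 5):
        runpod.serverless.progress_update(job, f"sampling step {step}/4")
        # ... run the next chunk of the ComfyUI workflow here ...
    return {"status": "done"}


runpod.serverless.start({"handler": handler})
```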

Llama 3.1 + Serverless

I'm trying to use this tutorial: https://discord.com/channels/912829806415085598/1266059553838202990/1266059553838202990 Tried to use: pooyaharatian/runpod-ollama:0.0.8 and override the default start with llama3.1...

Long wait time for Serverless deployments

Hi, perhaps someone can help. We've got various workloads running on RunPod. We deploy to RunPod using SST. Updating the template with the new image to deploy works great in our CI (GitHub Actions). Once we've deployed our updated code to RunPod we want to validate that our application is working, so we run some tests by invoking the endpoint and asserting on the various outputs produced. We do this with preview environments (per pull request) and in a staging environment on the way out to production.

We realised a while ago that our tests would often be running against an older version of the code, since RunPod hadn't had a chance to pull the newer image from Docker Hub. Our solution was to add a job in the CI that repeatedly calls the /runsync endpoint until the SHA (baked into the image) matches the one the CI is currently running against, and moves on to the testing stage once we are certain we would be testing the latest version of the code. This mostly worked, with an occasional timeout here and there. Our configuration was: ```...
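
For readers with the same problem, a rough sketch of that kind of CI probe, assuming the handler echoes the build SHA baked into the image (the input flag and the output field name are made up for illustration):

```python
import os
import time

import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]          # placeholder
EXPECTED_SHA = os.environ["GITHUB_SHA"]         # SHA this CI run is testing

RUNSYNC_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"


def wait_for_new_image(timeout_s: int = 900, poll_s: int = 30) -> None:
    """Call /runsync until a worker reports the expected image SHA, then return."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        resp = requests.post(
            RUNSYNC_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"input": {"healthcheck": True}},  # assumes the handler answers this
            timeout=120,
        )
        output = resp.json().get("output") or {}
        if isinstance(output, dict) and output.get("git_sha") == EXPECTED_SHA:
            return
        time.sleep(poll_s)
    raise TimeoutError("Endpoint never served the expected image SHA")
```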

Random CUDA Errors

Hello! About once every 2 weeks the following errors appear for a few hours and then fix themselves: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect....

RUNPOD - rp_download

I have recently been seeing issues with `from runpod.serverless.utils import rp_download`; `downloaded_input = rp_download.file(url)` ...
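
For context, a minimal handler using that utility the way I understand it is meant to be called (the input field name is an assumption):

```python
import runpod
from runpod.serverless.utils import rp_download


def handler(job):
    url = job["input"]["url"]  # field name is an assumption
    downloaded_input = rp_download.file(url)  # downloads the file into the worker
    return {"downloaded": str(downloaded_input)}


runpod.serverless.start({"handler": handler})
```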

Response is always 16 tokens.

Hello. I'm new to cloud and tried following the docs for serverless, running Google's Gemma 7B model. After the endpoint is successfully set up and I do a test request within the RunPod dashboard, I notice the response is always 16 tokens. I tested locally with Postman and used a variety of prompts, but always get a truncated 16 tokens back. I also tried Llama 3.1 8B Instruct using the vLLM template, and made sure to set the max sequence length to something high (like 6k), but still only get 16 tokens back. I've also tried setting max_tokens directly in the request. I'm not sure what I'm doing wrong....
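
If it helps later readers: with the vLLM worker the generation length is usually controlled by max_tokens inside sampling_params, not by the max sequence length. A sketch of a test request, treating the field names as assumptions about the template's input schema:

```python
import os

import requests

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]          # placeholder

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            "prompt": "Explain serverless GPUs in one paragraph.",
            # Without this, some templates fall back to a very small default length.
            "sampling_params": {"max_tokens": 512},
        }
    },
    timeout=300,
)
print(resp.json())
```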

How to deal with multiple models?

Does anyone have a good deployment flow for serverless endpoints with multiple large models? Asking because building and pushing a Docker image with the model weights takes forever.
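
One common pattern, sketched below with a hypothetical loader, is to keep the weights on a network volume and bake only the code into the image, so the image stays small and rebuilds quickly:

```python
import os

import runpod

# Weights live on the attached network volume (mounted at /runpod-volume on
# serverless workers) rather than inside the image, so pushing a new image
# never re-uploads the models.
MODEL_ROOT = "/runpod-volume/models"

_loaded = {}


def get_model(name: str):
    """Lazily load a model from the volume and cache it across warm requests."""
    if name not in _loaded:
        path = os.path.join(MODEL_ROOT, name)
        _loaded[name] = load_model(path)  # hypothetical loader for your framework
    return _loaded[name]


def handler(job):
    model = get_model(job["input"]["model_name"])  # field name is an assumption
    # ... run inference with `model` here ...
    return {"ok": True}


runpod.serverless.start({"handler": handler})
```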

FastAPI RunPod serverless request format

Hi everyone, second post from me. I am (still) trying to deploy a FastAPI app in a Docker container hosted on RunPod serverless. Here is a toy example: ```python...
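
For anyone comparing notes: a serverless endpoint expects the payload wrapped in an "input" key, and the handler receives exactly that dict as job["input"]. A minimal sketch that forwards it to a FastAPI app running in the same container (the port and route are assumptions):

```python
import requests

import runpod


def handler(job):
    payload = job["input"]  # whatever was POSTed under "input"
    # Forward the payload to the FastAPI app listening inside the container.
    resp = requests.post("http://127.0.0.1:8000/predict", json=payload, timeout=60)
    return resp.json()


runpod.serverless.start({"handler": handler})
```

The corresponding request body to /run or /runsync is then {"input": {...}}, with whatever fields the FastAPI route expects inside that object.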

Mounting network volume into serverless Docker container

Hi everyone, I am trying to deploy a FastAPI app in a Docker container hosted on RunPod serverless. My issue is that I need to pass a very large data folder to the Docker container....
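
If it's the same setup as the previous post: serverless workers don't take arbitrary Docker volume mounts, but a network volume attached to the endpoint shows up inside the container at /runpod-volume. A quick startup check, with the folder name as an assumption:

```python
import os

# A network volume attached to the endpoint is mounted here inside the worker;
# no extra `docker -v` flag is needed (or possible) on serverless.
DATA_DIR = "/runpod-volume/my-large-data-folder"  # folder name is an assumption

if not os.path.isdir(DATA_DIR):
    raise RuntimeError(
        "Expected data folder not found; is a network volume attached to the endpoint?"
    )

print(f"Found {len(os.listdir(DATA_DIR))} entries in {DATA_DIR}")
```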

Google cloud storage can't connect

Hi, I'm having trouble connecting/transferring data from a ComfyUI pod to Google Cloud Storage. I get this message: Failed to copy: can't make bucket without project number

data security and compliance certifications (SOC2 type 2, ISO, HIPAA, GDPR)

@Madiator2011 (Work) @JM You guys seem to keep moving the goalposts on your data security and compliance certifications (SOC2 Type 2, ISO, HIPAA, GDPR) https://discord.com/channels/912829806415085598/948767517332107274/1208587476365479946 Your site now claims you'll have these in Q3 2024, while a support member quoted me August 2024. What's the reason for moving the goalposts? Are you failing the compliance certifications, or what's up? From a company perspective it's concerning, so it would be great to have tangible evidence you're on track for August 2024 for production client data use....

Urgent: Issue with Runpod vllm Serverless Endpoint

We are encountering a critical issue with the RunPod vLLM serverless endpoint. Specifically, when attaching a network volume, the following code is failing: `response = client.completions.create( model="llama3-dumm/llm", prompt=["hello? How are you "],...
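
For comparison, a minimal client setup against a vLLM endpoint as I understand the OpenAI-compatible route; the endpoint ID, API key, and model name are placeholders:

```python
import os

from openai import OpenAI

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]  # placeholder
API_KEY = os.environ["RUNPOD_API_KEY"]          # placeholder

# The vLLM worker exposes an OpenAI-compatible API on the endpoint.
client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=API_KEY,
)

response = client.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model name
    prompt=["hello? How are you "],
    max_tokens=128,
)
print(response.choices[0].text)
```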

What is vars.RUNNER_24GB?

I'm trying to get the CI/CD for the serverless worker-template to work on GitHub. What should I put for vars.RUNNER_24GB? And why does the handler test need my PAT and my organization? Can I use my automatic GITHUB_TOKEN instead? https://github.com/runpod-workers/worker-template...

v1 API definitions?

Is there any documentation for RunPod v1 endpoints? Specifically, I'm looking for documentation for: https://hapi.runpod.net/v1/pod/{POD_ID}/logs This seems to be what RunPod uses to stream logs from serverless workers to their website. I would like to implement similar functionality in my web app rather than streaming those logs over a websocket with custom code, as I do today. ...

Error Handling for Synchronous + webhook & Asynchronous Endpoint

1. Synchronous, using a webhook: when input validation fails, for example, is that error sent via the webhook, or is it immediately responded to as output?
2. Asynchronous: if an error occurs before performing a long-running task, can we send that error as an immediate response?

It seems like there's something like return_aggregate_stream, but I'm not sure how to receive that response. Should we do a GET on /run? Or does the yielded result come directly as a response to POST /run?...
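
On the return_aggregate_stream part, a sketch of how a generator handler is wired up in the Python SDK as I understand it: the yielded chunks are readable from /stream while the job runs, and with the flag enabled they are also aggregated into the final output returned by /status (or /runsync), so no extra GET on /run is needed.

```python
import runpod


def handler(job):
    # A generator handler yields partial results; clients can read them from
    # the /stream/{job_id} route while the job is still running.
    for chunk in ["partial ", "results ", "here"]:
        yield chunk


runpod.serverless.start({
    "handler": handler,
    # Also fold the yielded chunks into the job's final output, so a plain
    # /status or /runsync call returns them once the job completes.
    "return_aggregate_stream": True,
})
```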

Exposing HTTP services in Endpoints through GraphQL

Hello! I was wondering how we could expose the opened HTTP service URLs in the Serverless Endpoints as they keep changing each time an Endpoint is set to zero workers. Thanks!

Monitor GPU VRAM - Which GPU to check?

I am trying to monitor GPU VRAM usage in a serverless worker. To do this with pynvml I need to provide the index of the GPU. Is there a way I can obtain the index of the GPU my worker is using? I did not see this info in the ENV variables. I do see RUNPOD_GPU_COUNT, but I'm not sure if that helps. It seems RunPod is monitoring CPU and GPU stats, as they present that information in their web interface. Does the RunPod Python module expose those stats, without us having to code our own? Below is a code snippet that reports VRAM usage as a %....
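
A minimal sketch with pynvml that sidesteps the index question: inside the worker container only the GPU(s) assigned to that worker are visible, so iterating over all visible devices (starting at index 0) covers them:

```python
import pynvml

pynvml.nvmlInit()
try:
    # Only the GPUs assigned to this worker are visible inside the container,
    # so walking every visible device reports exactly "our" GPU(s).
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        pct = 100.0 * mem.used / mem.total
        print(f"GPU {i}: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB ({pct:.0f}% used)")
finally:
    pynvml.nvmlShutdown()
```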