Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

"IN QUEUE" and nothing happens

Hello everyone, I'm currently running a TGI container (ghcr.io/huggingface/text-generation-inference:2.2.0) within a serverless environment, alongside my model from Hugging Face. Issue: Although the status indicates "connected," there seems to be no further activity. The logs display various INFO and WARNING messages but do not show any errors. This has left me puzzled as to the root cause of the problem....

How can I cause models to download on initialization?

```
FROM runpod/pytorch:2.2.1-py3.10-cuda12.1.1-devel-ubuntu22.04
WORKDIR /content...
```
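One common approach (a sketch, not an official RunPod recipe) is to bake the weights into the image at build time, e.g. by having the Dockerfile `RUN` a small prefetch script so nothing downloads on cold start. The script name, helper, and model id below are all illustrative:

```python
# download_models.py -- hypothetical prefetch script, invoked from the
# Dockerfile (RUN python download_models.py) so weights ship inside the image.
import os


def prefetch(model_id: str, cache_dir: str = "/models") -> str:
    """Download every file of a Hugging Face repo into cache_dir at build time."""
    # Imported lazily so the module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=model_id,
        cache_dir=cache_dir,
        token=os.environ.get("HF_TOKEN"),  # only needed for gated repos
    )
```

At runtime the handler then loads the model from `/models` instead of hitting the Hub.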

Optimizing Docker Image Loading Times on RunPod Serverless – Persistent Storage Options?

I'm working with a large Docker image on RunPod Serverless, containing several trained models. While I've already optimized the image size, the initial docker pull during job startup remains a bottleneck, as it takes too long to complete. Is there a way to leverage persistent storage on RunPod to cache my Docker image? Ideally, I'd like to avoid the docker pull step altogether and have the image instantly available for faster job execution. Thanks,...

Hello

We are looking to build a medical app that will process Australian medical data with an AI model, and we will be using Serverless to do this. Does anyone know the legislation around sending Australian medical data overseas? I know that Serverless doesn't store data long term, but will that still be OK, or do we need to have our own GPU servers in Australia? Thanks

About resources and priority compared with Pods

May I ask if there is a difference in priority or the number of GPUs between Pods and serverless? For example, if all the 3090 GPUs are out in Pods, will they also be out in serverless? Or is the difference that serverless is like a service used to manage, auto-scale, and save costs?...

Is Billing Date a day off?

Hi, I am new to RunPod, and I am trying to understand my usage and how much it is costing me. However, when I look at my billing history, it doesn't seem to align with my actual use. I think the billing statements are listed as one day behind for me. For example, my bill for August 6th is listed as August 5th. Anyone else have this issue?
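One possible explanation (an assumption, not confirmed billing behavior) is a timezone offset: if charges are stamped in UTC, usage early in the day in a timezone ahead of UTC lands on the previous UTC date. A quick sketch:

```python
from datetime import datetime, timezone, timedelta

# A job run at 09:30 on Aug 6 in a UTC+10 timezone (illustrative) is
# stamped 23:30 Aug 5 in UTC, so it shows under the previous day's bill.
local_tz = timezone(timedelta(hours=10))
local_time = datetime(2024, 8, 6, 9, 30, tzinfo=local_tz)
utc_time = local_time.astimezone(timezone.utc)
print(local_time.date(), utc_time.date())  # 2024-08-06 2024-08-05
```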

Does webhook work when testing locally?

I am trying to test a serverless worker locally and everything runs fine, except it doesn't call the webhook I provided in the test_input.json file. Here is an example of the JSON I am sending. Is this correct for calling a webhook?
```
{ "input": { "sample": "testvalue"...
```
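For reference, in RunPod's request format the `webhook` field is a top-level sibling of `input`, not a key nested inside it (and note the local test harness may simply not fire webhooks at all). A sketch of the expected shape, with a placeholder URL:

```python
import json

payload = {
    "input": {"sample": "testvalue"},
    # "webhook" sits at the top level, alongside "input", not inside it.
    "webhook": "https://example.com/runpod-callback",  # placeholder URL
}
print(json.dumps(payload, indent=2))
```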

HF_TOKEN question

Hello everyone, I am trying to launch TGI, but I get the error 'Cannot access gated repo for url https://huggingface.co/google/MYMODEL'. I have my token as a variable; do I need to add it somewhere else? The token is valid, I have double-checked...

A100 80GB GPUs unavailable

Hello Team, we have multiple production endpoints that use A100 80GB serverless. Suddenly, all A100 and H100 endpoints are unavailable. Is there any maintenance work going on?

Are the 64 / 128 Core CPU workers gone for good?

I noticed when selecting CPU workers for serverless endpoints that we are no longer given the option of 64 or 128 vCPUs. I know the 64/128 vCPU workers were having issues running jobs. I am wondering: are they going to come back, or are they gone for good? Thanks! 🙂...

Head size 160 is not supported by PagedAttention

Hello, I hope everyone is doing great! I am stuck with this error: ValueError: Head size 160 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256] Does it mean I have to RETRAIN MY MODEL? Full logs are in the attachment...
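It helps to see where 160 comes from: head size is `hidden_size / num_attention_heads` from the model's config, a fixed architectural property rather than anything learned in training. With illustrative numbers for a model that would trigger this error:

```python
# Head size is derived from the model's config.json (values illustrative).
hidden_size = 5120
num_attention_heads = 32
head_size = hidden_size // num_attention_heads
print(head_size)  # 160

supported = [64, 80, 96, 112, 128, 256]
print(head_size in supported)  # False -> PagedAttention rejects it
```

Since the head size is baked into the architecture, the usual way out is a backend that supports it or a different base model, not retraining the same one.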

How to set a max output token limit

Hi, I deployed a fine-tuned Llama 3 via vLLM serverless on RunPod. However, I'm getting limited output tokens every time. Does anyone know if we can alter the max output tokens while sending the input prompt JSON?
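With the RunPod vLLM worker, generation length is typically capped per request via `sampling_params.max_tokens` inside the input payload (field names per the vLLM worker's input schema; adjust if your handler defines its own):

```python
import json

request = {
    "input": {
        "prompt": "Explain PagedAttention in two sentences.",
        "sampling_params": {
            "max_tokens": 512,   # upper bound on generated tokens
            "temperature": 0.7,
        },
    }
}
print(json.dumps(request, indent=2))
```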

Inquiry on Utilizing TensorFlow Serving with GPU in Serverless Configuration

Hello Runpod Community, I'm exploring options to utilize TensorFlow Serving with GPU support in a serverless configuration on Runpod. Specifically, I'm interested in whether it's feasible to make requests from a Runpod serverless job to a TensorFlow Serving instance running on the same container or environment. Could anyone clarify if this setup is supported? Additionally, are there alternative recommended approaches for deploying TensorFlow Serving with GPU on Runpod's serverless infrastructure?...
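Calling a TF Serving instance from a handler in the same container usually goes through its REST API on localhost. A sketch below; port 8501 is TF Serving's default REST port, and this assumes a `tensorflow/serving` process was started alongside the handler:

```python
import json
import urllib.request


def predict_url(model: str, port: int = 8501) -> str:
    # TF Serving's REST predict endpoint (8501 is its default REST port).
    return f"http://localhost:{port}/v1/models/{model}:predict"


def predict(model: str, instances: list):
    # POST {"instances": [...]} and read back {"predictions": [...]};
    # raises URLError if the serving process isn't up in this container.
    body = json.dumps({"instances": instances}).encode()
    req = urllib.request.Request(
        predict_url(model),
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["predictions"]
```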

Data transfer cost

I am testing an application that needs to return an image as a response. My two options for doing this:
first: attach the image as a return payload after the run finishes
second: upload the image to an S3 bucket, then send the link as a return payload...
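The two options differ mainly in response size: base64 inlining keeps everything in one round trip but inflates the payload by roughly a third and is bounded by the serverless response size limit, while the S3 route adds an upload step but returns only a short URL. A sketch of both return shapes (bucket URL is a placeholder):

```python
import base64


def return_inline(image_bytes: bytes) -> dict:
    # Option 1: embed the image in the job output (small images only).
    return {"image_base64": base64.b64encode(image_bytes).decode("ascii")}


def return_link(bucket_url: str, key: str) -> dict:
    # Option 2: after uploading (e.g. boto3 put_object), return just the URL;
    # the payload stays tiny regardless of image size.
    return {"image_url": f"{bucket_url}/{key}"}
```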

Getting bind on address error in serverless

I'm always getting the following error on my serverless container:
[Errno 99] error while attempting to bind on address ('::1', 8000, 0, 0): cannot assign requested address
[Errno 99] error while attempting to bind on address ('::1', 8000, 0, 0): cannot assign requested address
What am I doing wrong?...
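`::1` is the IPv6 loopback address; many container runtimes ship without an IPv6 stack, so the bind fails with Errno 99. Binding the IPv4 wildcard instead (e.g. `--host 0.0.0.0` for uvicorn) is the usual fix, sketched here with a raw socket:

```python
import socket

# Binding to "::1" (IPv6 loopback) raises Errno 99 when the container has
# no IPv6 stack; the IPv4 wildcard address works everywhere.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 0))          # port 0: let the OS pick a free port
host, port = sock.getsockname()
print(host, port)
sock.close()
```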

CUDA driver initialization failed

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu. (Serverless RTX4090) (FROM runpod/base:0.6.2-cuda11.8.0) ...

Inconsistent 400 Bad Response when sending /run and /runSync.

Good evening everyone. I am getting the following error message when calling RunPod's serverless endpoint: response error StatusCode: 400, <html>...

New release is taking too long.

I am trying to deploy a new image but it is taking too long. It has been stuck in the initialization stage for the past hour. Any suggestions?

Error response from daemon: Container is not paused.

Hello Team, after deploying a new Docker image on a serverless endpoint I am getting the following error in my system log: 2024-07-30T11:56:27Z error starting: Error response from daemon: Container 2a638b70551885c464f48892d2d0fc9eed7eb590fbda42b33841d7e84b23b307 is not paused. Can someone please help me with this?...

The official a1111 worker fails to build

Attempts to build the main branch of https://github.com/runpod-workers/worker-a1111 fail. civit.ai no longer appears to allow unauthenticated model downloads, returning a 401. This was quite easy to fix. More importantly, the dependency chain appears to have regressed. Currently, building the given repository, as cloned, with `docker build --platform linux/amd64 -t test .` results in...
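A minimal fix for the 401, assuming civitai's token-based download API: append an API token to the download URL as a query parameter (the helper name and placeholder version id below are illustrative):

```python
def civitai_download_url(model_version_id: str, token: str) -> str:
    # Unauthenticated downloads now return 401; pass an API token instead.
    return (
        f"https://civitai.com/api/download/models/{model_version_id}"
        f"?token={token}"
    )

# e.g. civitai_download_url("VERSION_ID", a token from the civitai account page)
```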