RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!


⚡|serverless

⛅|pods

🔧|api-opensource

📡|instant-clusters

🗂|hub

Runpod keeps removing my container after deployment PLEASE HELP

Hello guys, I am new to RunPod and I have been trying to run a small LLM on RunPod through Hugging Face for the last couple of days, and I keep facing the same issue: after deploying the container, RunPod automatically removes it shortly after. Has anyone faced this before, and what are your recommendations to resolve it? Thank you...

524 Timeouts when waiting for new serverless messages

After my async python serverless handler finishes one request, I then start getting these on that box:
2024-09-26T22:11:55.344188433Z {"requestId": null, "message": "Failed to get job, status code: 524", "level": "ERROR"}
2024-09-26T22:11:55.344188433Z {"requestId": null, "message": "Failed to get job, status code: 524", "level": "ERROR"}
...

Receiving "CUDA error: no kernel image is available for execution on the device" error on Serverless

I recently tried to change the endpoint's GPU and reverted back - I'm now facing the error "CUDA error: no kernel image is available for execution on the device" in the logs. I looked it up on the web and it seems to be a CUDA version mismatch. Really not sure how to address this - can anyone help?...
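This error usually means the PyTorch build inside the image was not compiled for the new GPU's compute capability. A minimal diagnostic sketch (plain PyTorch, nothing RunPod-specific) to confirm the mismatch before rebuilding the image:

```python
# check_cuda_arch.py - quick diagnostic for "no kernel image" errors.
# Run inside the worker image; compares the GPU's compute capability
# with the architectures the installed PyTorch wheel was compiled for.
import torch

print("PyTorch:", torch.__version__)
print("CUDA runtime bundled with PyTorch:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)} (sm_{major}{minor})")
    # Architectures baked into this wheel; if the GPU's sm_XX is
    # missing from this list, its kernels cannot run on that device.
    print("Compiled arch list:", torch.cuda.get_arch_list())
```

If the GPU's sm_XX does not appear in the compiled arch list, rebuilding the image against a PyTorch/CUDA build that targets that architecture (or pinning the endpoint back to the original GPU type) is the usual fix.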

Fixed number of Total Workers - Any work around?

Currently our team has a pool of ~150 workers on RunPod serverless. The GPUs are of type RTX A4000/A5000/A6000. We have a total of 10 different models deployed on the serverless endpoints that we use at inference time. Each model has a different number of active/max workers depending on the load it gets, where it sits in our pipeline, and the nature of the model. My question is: what are the best practices around RunPod serverless - should we deploy multiple models within the same image and do routing within the handler? This would let me create more endpoints with the given number of workers, but with this solution one of my models could completely block requests for my other models....
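For reference, a minimal sketch of the routing-inside-one-handler pattern described above, assuming the caller passes a model name in the request input (the model names and load_* functions are hypothetical placeholders):

```python
# handler.py - one endpoint, several models, routed by request input.
import runpod

MODELS = {}  # loaded lazily and cached so each worker pays the cost once

def get_model(name):
    if name not in MODELS:
        if name == "classifier":
            MODELS[name] = load_classifier()   # hypothetical loader
        elif name == "generator":
            MODELS[name] = load_generator()    # hypothetical loader
        else:
            raise ValueError(f"unknown model: {name}")
    return MODELS[name]

def handler(job):
    params = job["input"]
    model = get_model(params["model"])
    return model.run(params)  # hypothetical inference call

runpod.serverless.start({"handler": handler})
```

The trade-off the poster notes is real: a burst of requests for one model can occupy every worker in the pool, so latency-critical models may still deserve their own endpoints.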

6x speed reduction with network storage in serverless

To reduce my Docker image size I wanted to use network storage to store the models, but the main issue I am running into now is that I went from 20 sec per request to 120 sec. Looking at the logs, it takes almost 100 sec (vs. a few seconds) to load the model into GPU memory. Why is the network storage so slow??? It's a major drawback and means you and I have to handle tens of GB of Docker image for nothing....
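A common mitigation (not a fix for the underlying throughput) is to pay the network-volume read once per worker instead of once per request: copy the weights from the volume (mounted at /runpod-volume on serverless) to container-local disk at cold start, and load the model at module import, outside the handler. A rough sketch, with the paths and load_model call as hypothetical placeholders:

```python
# Cold-start-once loading: everything above the handler runs a single
# time per worker, so the slow network-volume read is not repeated.
import shutil
from pathlib import Path

import runpod

NETWORK_COPY = Path("/runpod-volume/models/my-model")  # hypothetical path
LOCAL_COPY = Path("/tmp/models/my-model")

if not LOCAL_COPY.exists():
    # One sequential bulk copy tends to beat the many small random
    # reads a loader issues directly against the network volume.
    shutil.copytree(NETWORK_COPY, LOCAL_COPY)

MODEL = load_model(LOCAL_COPY)  # hypothetical loader, runs once

def handler(job):
    return MODEL.run(job["input"])  # hypothetical inference call

runpod.serverless.start({"handler": handler})
```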

API to check the template

Is there a way to check all my templates using the API?
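For what it's worth, a sketch against RunPod's GraphQL API - the endpoint URL is real, but treat the field selection under `myself` as an assumption to verify against the current schema:

```python
# List account templates via the RunPod GraphQL API.
# Field names under `myself` are an assumption; check the live schema.
import os
import requests

query = """
query {
  myself {
    podTemplates { id name imageName }
  }
}
"""

resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": os.environ["RUNPOD_API_KEY"]},
    json={"query": query},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```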

Sharing a pod template

Do serverless templates count towards the referral? I'm trying to share my serverless template, but it seems shared templates can only create pods.

Flux.1 Dev Inference

I would like to run inference (txt2img) with Flux.1 dev in fp8 and fp16. Is there a 'Quick Deploy' template planned?

Move a Pod to Serverless?

Hello everyone. I have a kind of high-level / conceptual question.

The background: I have used ComfyUI on my physical computer. As many of you know, with ComfyUI, if your settings/models aren't jiving, it will fail. I want to scale ComfyUI up and roll it out to my small company so anyone can use it via an internal website I have created (through an API, so they won't 'see' ComfyUI). Basically, scale and deploy specific Stable Diffusion workflows. I have attempted to use a few different serverless companies without much luck; it's slightly beyond my knowledge/skill set. What I like about the pre-built A1111 serverless sample is that I can see the logs to troubleshoot issues, which helps me understand what is going on more than the other companies I have tried.

My question: since I'm not skilled with Docker, I'm wondering if it's easier for me to start and configure a Pod to troubleshoot and get things set up and configured (not just the 'hardware' but also the actual ComfyUI config). Then, can that pod be transitioned/exported and re-imported into a serverless setup? Is this process possible, or am I thinking about it the wrong way?...

AWS ECR Registry Authentication

Is it possible to use a private AWS ECR as a container registry as opposed to Docker Hub? Although RunPod allows username and password credentials to be configured (which would work with Docker Hub), there does not appear to be a way to use AWS IAM credentials. The AWS CLI lets an IAM principal obtain a password using aws ecr get-login-password, but this is only valid for 12 hours, so it would need to be cycled regularly, and there does not appear to be a programmatic way of doing this in RunPod. This question was put to the ai-helper but it was unable to provide a resolution (https://discord.com/channels/912829806415085598/1118945694863065230/1288092289977290762). I know this question isn't Serverless-specific, but I couldn't find a better place to put it, and we're using RunPod Serverless. I also think it might apply to Serverless more than Pods, since images may need to be pulled more frequently (potentially on every cold start?)....
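A workaround sketch for the 12-hour rotation: fetch a fresh ECR token with boto3 on a schedule and push it to RunPod. The boto3 call is a real API; the RunPod-side update is left as a labeled placeholder because, as the post says, there is no documented programmatic hook for it:

```python
# Rotate ECR credentials on a schedule (e.g. cron every few hours).
# boto3's get_authorization_token is real; update_runpod_registry_auth
# is a hypothetical placeholder for whatever mechanism RunPod exposes.
import base64
import boto3

def fetch_ecr_credentials():
    ecr = boto3.client("ecr")
    auth = ecr.get_authorization_token()["authorizationData"][0]
    # The token decodes to "AWS:<password>" and is valid for 12 hours.
    user, password = (
        base64.b64decode(auth["authorizationToken"]).decode().split(":", 1)
    )
    return user, password, auth["proxyEndpoint"]

user, password, registry = fetch_ecr_credentials()
update_runpod_registry_auth(user, password)  # hypothetical: no documented API
```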

Comfyui serverless API

My serverless ComfyUI workflow API requests fail most of the time. What could be the issue here? Is the pod taking too long to load Comfy? Are there any tips/tricks I can use to make it pass consistently?...
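One practical tip regardless of the root cause: submit through the asynchronous /run endpoint and poll /status rather than using /runsync, so a slow ComfyUI cold start does not trip a client-side HTTP timeout. A minimal sketch (the endpoint ID and payload are placeholders):

```python
# Submit a job asynchronously and poll until it settles, so long
# ComfyUI cold starts don't hit client-side HTTP timeouts.
import os
import time
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

job = requests.post(
    f"{BASE}/run",
    headers=HEADERS,
    json={"input": {"workflow": {}}},  # placeholder payload
    timeout=30,
).json()

while True:
    status = requests.get(
        f"{BASE}/status/{job['id']}", headers=HEADERS, timeout=30
    ).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        break
    time.sleep(2)

print(status)
```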

Bad pods on Serverless

I see that about 20-30% of the workers that are spawned fail with this error: error starting: Error response from daemon: error starting: Error response from daemon: No such container ...

Error creating Qwen2-VL: model does not support type "qwen2-vl"

When I try to run Qwen2-VL using the vLLM template, I get an error that an old version of Transformers library is installed. Is there a way to update it to use the latest transformers from git?

How Can I configure Serverless endpoint to always point to latest image?

I set my container image path to asia-northeast1-docker.pkg.dev/project-id/job-name/run-dev:latest for my serverless endpoint. The behavior of the serverless endpoint seems very unpredictable to me: sometimes it auto re-pulls the image with the latest tag (the desired behavior), sometimes it uses the initial version of the image, and sometimes it does nothing and keeps using an older version. How can I reliably make the endpoint always use the latest image?...

Llama-70B 3.1 execution and queue delay time much larger than 3.0. Why?

I deployed these two models, which seem to use the same techniques, on the same machine (2x 80 GB), but the execution time and queue delay differ massively:
Queue delay:
Llama70B 3.0: 0.02 secs
Llama70B 3.1: 0.15 secs ...

Issue with Multiple instances of ComfyUI running simultaneously on Serverless

Hello, I am using RunPod Serverless and deploying ComfyUI using this repo: https://github.com/blib-la/runpod-worker-comfy?tab=readme-ov-file#bring-your-own-models For the server, this repo is used: https://github.com/comfyanonymous/ComfyUI I am deploying via a Docker image, and both of these repos are baked into the image. ...

Possible to provision network volume programmatically?

Hi, we were hoping to be able to use the Runpod API to programmatically create templates, endpoints, and network volumes to support those endpoints. We see options for creating templates and endpoints, but couldn't find anything on how to create new network volumes. Is this possible? If not, is it a feature that is on the roadmap?

Error with the pre-built serverless docker image

Hi, this is completely random, because sometimes it works smoothly: using the RunPod serverless vLLM worker, the machine gets stuck on Using model weights format ['*.safetensors']...

How to use environment variables

I have added environment variables to my RunPod serverless endpoint, but I can't reach them inside the pod. I have defined them like this in the UI:
key | value
SOME_KEY | keyvaluehere ...
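For anyone hitting the same thing, endpoint environment variables should surface as ordinary process environment variables inside the worker. A quick sketch to verify what actually arrives (SOME_KEY matches the example above):

```python
# Inside the serverless handler, endpoint environment variables are
# plain process env vars; return them to see what the worker got.
import os

import runpod

def handler(job):
    return {
        "some_key": os.environ.get("SOME_KEY"),  # value set in the UI
        "all_keys": sorted(os.environ.keys()),   # everything the worker sees
    }

runpod.serverless.start({"handler": handler})
```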

job timed out after 1 retries

I'm getting this message with a FAILED state in roughly 10% of the jobs coming to this endpoint. Usually it comes with a 2-3 minute delay time as well. Where should I start looking to figure out what the issue could be? ...
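One knob worth checking while debugging: the per-request execution policy. A sketch that raises the execution timeout on a single run so slow cold starts are not retried and failed prematurely; executionTimeout is in milliseconds, and the exact policy shape should be verified against the current RunPod docs:

```python
# Per-request execution policy: raise the timeout for one job instead
# of relying on the endpoint default.
import os
import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    json={
        "input": {"prompt": "hello"},            # placeholder payload
        "policy": {"executionTimeout": 600000},  # 10 minutes, in ms
    },
    timeout=30,
)
print(resp.json())
```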