Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Serverless down?

There is an error saying no GPU available, yet our worker is running and being charged. What is going on?

Please resolve this really urgent issue.

I'm unable to connect to my pod due to this issue: "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime." My server was running and must not be stopped. Could you resolve this issue ASAP? My pod ID is "vjwinhaduxgt3w"...

No workers available in EU-SE-1 (AMPERE_48)

I deployed endpoint s7gvo0eievlib3 hours ago with storage attached. The build was fine and a release was created, but I don't have any workers assigned. The GPU is set to AMPERE_48, which was listed as High Supply. What am I doing wrong, and how do I fix this?

Can't load model from network volume.

I'm trying to load a model from a network volume in my serverless worker via the MODEL_NAME environment variable, but even when setting up the template I get this error:
Failed to save template: Unable to access model '/workspace/weights/finexts'. Please ensure the model exists and you have permission to access it. For private models, make sure the HuggingFace token is properly configured.
...
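
A minimal diagnostic for this pattern, run inside the worker: serverless workers typically mount network volumes at /runpod-volume rather than /workspace (the pod-side mount), so a path saved from a pod may simply not exist in the worker. The fallback path below is a placeholder.

import os

# MODEL_NAME as set in the endpoint template; the default is a placeholder.
model_path = os.environ.get("MODEL_NAME", "/runpod-volume/weights/my-model")

# Serverless workers usually see network volumes under /runpod-volume,
# not /workspace, which is the mount point used by pods.
print("path:", model_path)
print("exists:", os.path.exists(model_path))
parent = os.path.dirname(model_path) or "/"
print("parent listing:", os.listdir(parent) if os.path.isdir(parent) else "missing")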

Need more RAM but not more VRAM in serverless endpoints

What should I do if I need more RAM than the serverless endpoint configurations offer? This is for GPU endpoints.

Are the serverless endpoints run on the "Secure Cloud" with Tier 3/4 data centers?

I can't find any documentation on who owns and operates the GPUs/CPUs for the serverless deployments.

Questions on preventing model reloads in Serverless inference

Hello, I am experimenting with implementing a serverless image generation API by serving my own model through Docker. During testing, I observed the following behavior:...
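
The usual way to avoid reloading the model on every request is to load it once at module scope, outside the handler, so warm workers keep it in memory across jobs and only cold starts pay the load cost. A minimal sketch with the runpod SDK; load_my_model is a placeholder for whatever loader the model uses:

import runpod

# Runs once per worker process at import time; warm workers reuse this
# object across jobs, so the model is only loaded on cold starts.
model = load_my_model()  # placeholder: replace with your actual loader

def handler(job):
    # Per-request work only; the model is already resident in memory.
    image = model.generate(job["input"]["prompt"])
    return {"image": image}

runpod.serverless.start({"handler": handler})

Setting active (min) workers to 1 or more avoids cold starts entirely, at the cost of paying for the always-on worker.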

vLLM serverless not working with Hugging Face model

Hey, I've been trying to create a serverless instance of the model https://huggingface.co/fancyfeast/llama-joycaption-beta-one-hf-llava for image description, in order to use its API in an automation. I have tried the vLLM template as well as using a git repo, but whenever I test a request I get "worker exited with exit code 1". I believe this is probably something very simple that I'm doing wrong. Thanks...
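
Exit code 1 usually means the engine crashed at startup, so the worker logs are the place to look. For reference, a minimal request against a queue-based endpoint while watching those logs; the endpoint ID and API key are placeholders, and the exact input schema depends on the template:

import requests

API_KEY = "YOUR_RUNPOD_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder

# /runsync blocks until the job finishes; /run returns a job ID to poll.
resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "Describe the attached image."}},  # template-dependent schema
    timeout=120,
)
print(resp.status_code, resp.json())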

ComfyUI workers don't find my diffusion model

I can't get ComfyUI to find wan2.2_ti2v_5B_fp16.safetensors. Other diffusion models, for example for Stable Diffusion, work fine. I have the model in a folder named "diffusion_models" on a network volume named "models", but I always get this error:
* UNETLoader 37:
- Value not in list: unet_name: 'wan2.2_ti2v_5B_fp16.safetensors' not in []
Output will be ignored
...
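
The empty list ([]) means ComfyUI's UNet loader isn't scanning the directory the file lives in. A quick diagnostic to run in the worker; the mount point below is the usual serverless location, and the ComfyUI layout is an assumption to adapt:

import os

# Network volumes are typically mounted at /runpod-volume in serverless.
for root, dirs, files in os.walk("/runpod-volume"):
    for name in files:
        if name.endswith(".safetensors"):
            print(os.path.join(root, name))

# ComfyUI only lists files from its registered model directories
# (models/diffusion_models or models/unet, or paths declared in
# extra_model_paths.yaml), so the file must end up under one of those.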

Download Hugging Face models that require an HF token during build time?

Hello, can someone please let me know how I can download Hugging Face models that require an HF token during build time?
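
One common approach is to pass the token into the build (e.g. as a Docker build arg or, better, a BuildKit secret) and download with huggingface_hub in a build step. A sketch of the download script; the repo ID and target directory are placeholders:

import os
from huggingface_hub import snapshot_download

# Token injected at build time; avoid baking it into the final image layers.
token = os.environ["HF_TOKEN"]

snapshot_download(
    repo_id="org/private-model",        # placeholder repo ID
    local_dir="/models/private-model",  # placeholder target inside the image
    token=token,
)

With Docker, RUN --mount=type=secret exposes the token only during that step, whereas a plain ARG leaks it into the image history.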

ReadTimeoutErrors in US-IL-1

Workers were running fine earlier today, but are now taking forever to start, and halfway through their execution (at no particular time or part of the code) they hang with the following: 'ReadTimeoutError("HTTPSConnectionPool(host='api.runpod.ai', port=443): Read timed out. (read timeout=8)")': /v2/6inskfdqe9z510/ping/ll0c30myrjn82o?gpu=NVIDIA+GeForce+RTX+4090&job_id=19f37b9a-f53e-4fea-b281-f7204b16df13-e2&runpod_version=1.7.13 For this particular run, the job ID is 19f37b9a-f53e-4fea-b281-f7204b16df13 and the worker ID is ll0c30myrjn82o (a 4090 in US-IL-1), in case someone can investigate....

Does the documented rate limit also apply to Load balancing endpoints (e.g. FastAPI)?

Are all endpoints capped at 2000 requests every 10 seconds, and can this be modified?
Solution:
@emilwallner those rate limits do not apply to load-balancer serverless, since those paths are specific to queue-based serverless only. Load-balancer endpoints don't have per-path rate limits; so far we haven't enforced any rate limit for them.
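
For queue-based endpoints, where the documented cap does apply, a client-side guard keeps bursts under control. A minimal sketch that backs off on HTTP 429; the URL and key are placeholders:

import time
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"                        # placeholder
URL = "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/run"  # placeholder

def submit(payload, max_retries=5):
    for attempt in range(max_retries):
        resp = requests.post(
            URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={"input": payload},
            timeout=30,
        )
        if resp.status_code != 429:   # not rate-limited
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)      # exponential backoff, then retry
    raise RuntimeError("still rate-limited after retries")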

Can't run gpt-oss:120b on Ollama

I can't deploy the gpt-oss:120b model on Ollama, because serverless doesn't let me change the Container Disk size; it's fixed at 20 GB. Any change to this value (or to the environment variables) reverts to the default. Does anyone know how to fix this?

S3 access for EU-RO-1 has been down for days but no update from the team?

As you can see from the various posts (including my own) reporting the same problem, S3 access for network volumes in the EU-RO-1 region has been down for the past couple of days. I can fully understand that things like this happen from time to time. However, the lack of a status update, or even an acknowledgement of the problem from the company, concerns me. I recently switched the rendering backend of my service to RunPod and have been testing it before it goes to production. The lack of feedback on such a critical issue makes me seriously wonder whether I made a bad move and should keep the AWS-based solution I was using instead....

I have 1 query/worker but my workers are stuck today

Workers are running, but there are no results / very long wait times. Is it on RunPod's side?...

Unable to upload large files to a network volume using the aws s3 command

I have been trying to upload a model directory that consists of small files and one big model file. ❌ I used the aws sync command to upload the whole directory, but it always fails when uploading the big model file. ❌ I also tried to upload the single model file using the aws cp command, but got the same error response. ❌ I also tried the upload_large_file.py script from the GitHub page, but got the same error....
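
If the CLI keeps choking on the large file, an upload through boto3 with explicit multipart settings is worth a try; S3-compatible APIs often cap the number of parts, so larger chunks help with multi-GB files. A sketch where the endpoint URL, region, credentials, and bucket (the network volume ID) are all placeholders to take from the console:

import boto3
from boto3.s3.transfer import TransferConfig

# All connection details below are placeholders for your datacenter/volume.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3api-eu-ro-1.runpod.io",
    aws_access_key_id="YOUR_ACCESS_KEY",
    aws_secret_access_key="YOUR_SECRET_KEY",
    region_name="eu-ro-1",
)

# Switch to multipart above 64 MB and use 256 MB parts to keep part counts low.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,
    multipart_chunksize=256 * 1024 * 1024,
    max_concurrency=4,
)

s3.upload_file("model.safetensors", "YOUR_VOLUME_ID", "weights/model.safetensors", Config=config)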

Adding Hugging Face access token to vLLM serverless endpoint

Hi. How do I add a Hugging Face access token to a RunPod vLLM serverless endpoint? I tried to add it through the environment variable settings (where it says max 50), typing HUGGING_FACE_HUB_TOKEN= followed by the token. But whenever I run a request it shows state: in queue, the worker side shows unhealthy, and the logs read: File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 355, in get_config config_dict, _ = PretrainedConfig.get_config_dict( File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 649, in get_config_dict...
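
A quick way to check whether the token is actually reaching the worker (and wasn't pasted with the variable name or stray whitespace included in the value) is to validate it before the engine starts; a sketch using huggingface_hub:

import os
from huggingface_hub import whoami

token = os.environ.get("HUGGING_FACE_HUB_TOKEN")
print("token present:", bool(token), "length:", len(token or ""))

# whoami() raises if the token is invalid, e.g. if the value accidentally
# contains "HUGGING_FACE_HUB_TOKEN=" or trailing whitespace.
print(whoami(token=token))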

Accessing Network Volume via S3 Returns "error"

Hi. I've been running a serverless service for months without a problem; it connects to network storage using the S3 interface. However, a few hours ago I started getting error responses from the S3 client, which read:...

Running 30 A100 workers

Can I run 30 A100 workers for an endpoint? We have a business process that requires low processing times. I want to test how much it would cost to handle 30 requests a day on this platform, to see whether it would be feasible for us. How can I increase the worker count? It only allows me to increase it to 5 right now.
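
A rough back-of-the-envelope for the cost side; every number below is a hypothetical placeholder, since actual per-second A100 pricing and job durations vary:

# All values are hypothetical placeholders; substitute real console pricing.
price_per_gpu_second = 0.0008   # placeholder USD/s for one A100 worker
seconds_per_request = 30        # placeholder average job duration
requests_per_day = 30

daily_cost = price_per_gpu_second * seconds_per_request * requests_per_day
print(f"~${daily_cost:.2f}/day")  # ~$0.72/day with these placeholder numbers

With queue-based serverless you pay per active worker second, so the max-worker count mainly controls concurrency and latency rather than total compute cost; raising the cap beyond the default typically requires a limit-increase request.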

Deploying Blender on serverless doesn't utilize the GPU

I am using a Dockerfile to create a serverless instance with Blender. The problem is that telemetry shows my GPU utilization is 0. Testing the same code locally, it utilizes the GPU. What tells me serverless is not utilizing the GPU: the time taken on a 5090 (serverless) is greater than on a 4070 (local machine), and the logs get stuck at the point where GPU utilization spikes on my local machine....
Solution:
I figured out the issue, and it had more to do with the Blender version. Apparently 4.2.1 does not handle combined GPU and CPU use well compared to later versions. Switching to 4.3.2 made significant improvements in render time. According to the web, Blender 4.3.2 is the successor to 4.2 LTS, bringing significant usability improvements, a rewritten Grease Pencil engine for better performance, and a more flexible windowing system...
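
For completeness, headless Blender also needs Cycles explicitly pointed at the GPU, or it silently renders on CPU; a sketch of the usual setup script (run via blender -b scene.blend -P enable_gpu.py; OPTIX vs CUDA depends on card and driver):

import bpy

prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "OPTIX"   # or "CUDA", depending on the GPU/driver
prefs.get_devices()                   # refresh the detected device list

# Enable every detected device so Cycles can actually use the GPU.
for device in prefs.devices:
    device.use = True
    print(device.name, device.type, "enabled:", device.use)

bpy.context.scene.cycles.device = "GPU"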