Runpod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

developer_Matty

10/2/2025

Serverless Dockerfile cache

Hi everyone, I need help. My RunPod serverless endpoint frequently rebuilds Docker images, which invalidates the cache and forces a re-download of over 100GB of data. However, sometimes the cache works fine. This inconsistency is driving me crazy. How can I ensure Docker caching works reliably?...

antonio

10/2/2025

1h serverless waiting build

How do I start the build of my serverless project?

emilwallner

10/2/2025

error creating container: nvidia-smi: parsing output of line 0: failed to parse (pcie.link.gen.max)

I've been testing the cold start, and it works 3-4 times, but then I get the above error. I'm using the serverless load balancing endpoints.

Ethan

10/2/2025

Insane delay times as of late

Hi, I've been experiencing really long delay times over the last few days (2+ minutes). I can't really afford for this to happen, and I believe I've noticed this before after the pods were active for some time. I'm not sure if this is because of some leakage or something....

WeamonZ

10/1/2025

Severe performance disparity on RunPod serverless (5090 GPUs)

I’ve deployed workflows on RunPod serverless with 5090 GPUs, and the performance differences I’m seeing are concerning. Same endpoint, same model, same operation — yet the results vary a lot: Sometimes the workflow finishes in around 44 seconds...

PatrickCmd102

10/1/2025

ReadOnly Filesystem

Hi Runpod, are network storage volumes in READ ONLY mode when mounted on running serverless endpoints? I get this error when the cached model on the network storage tries to update with changes from Hugging Face. See attached log...

error_log.txt

emilwallner

10/1/2025

Incorrect configuration in worker-load-balancing example

In the documentation example here: https://github.com/runpod-workers/worker-load-balancing/tree/main It says to set these: PORT = 5000 PORT_HEALTH = 5001...

Solidsoldier

10/1/2025

how to load multiple models using model-store

As the title says: as far as I can see, we can currently cache only one model from Hugging Face.

WeamonZ

10/1/2025

Stuck in queue, but workers available

Hi, I have a request stuck in the queue even though I have 4 workers available. What's going on?...

emilwallner

10/1/2025

How to configure auto scaling for load balancing endpoints?

From the documentation: "The method used to scale up workers on the created Serverless endpoint. If QUEUE_DELAY, workers are scaled based on a periodic check to see if any requests have been in queue for too long. If REQUEST_COUNT, the desired number of workers is periodically calculated based on the number of requests in the endpoint's queue. Use QUEUE_DELAY if you need to ensure requests take no longer than a maximum latency, and use REQUEST_COUNT if you need to scale based on the number of requests." From what I understand the load balancing endpoints don't have a queue? How do I configure the auto scaling to work with serverless endpoints?...
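To make the two scaler types in the quoted documentation concrete, here is a minimal sketch of the difference between them. This is not Runpod's implementation: the function names, the one-worker-at-a-time step, and the parameters (`max_delay_s`, `requests_per_worker`, `max_workers`) are hypothetical and exist only to illustrate the quoted description.

```python
import math


def workers_by_queue_delay(current_workers, oldest_wait_s, max_delay_s, max_workers):
    """QUEUE_DELAY-style check (hypothetical): add a worker when the
    longest-waiting request has been queued longer than the allowed delay."""
    if oldest_wait_s > max_delay_s:
        return min(current_workers + 1, max_workers)
    return current_workers


def workers_by_request_count(queued_requests, requests_per_worker, max_workers):
    """REQUEST_COUNT-style check (hypothetical): derive the desired worker
    count directly from the number of queued requests."""
    desired = math.ceil(queued_requests / requests_per_worker)
    return min(desired, max_workers)
```

Under these assumptions, the first function reacts to how long requests wait, while the second reacts to how many are waiting. Both take a queue as input, which is why the question about endpoints without a queue is a fair one.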

بطرفلاي

10/1/2025

Unable to connect to serverless load balancing workers

I'm running a serverless load balancing endpoint for my FastAPI server, but when I send a request to the endpoint I get a 400 response after more than two minutes. Moreover, HTTPS services are marked unready and the web terminal does not start. I have set PORT in the env variables to the same value my server is running on. I cannot see errors anywhere. How can I fix that?...

Jan

10/1/2025

Builds pending for hours, then failing with no logs

Several times now, builds have been pending for hours and then stalled with Build Failed and No logs yet.... Re-running the build would then often succeed without any change to it. Is there any way to circumvent this?

atu

10/1/2025

Network volume selection has disappeared from serverless endpoint creation process.

^the title. There's only this new "Model" option now, which seems like it will be super cool once it's out of beta. Also, can I recommend something? If there were a way to see which Hugging Face models are "cached", that would be so cool too!

Solution:

Thanks for the bug report and feature request! You can attach a network volume after the serverless endpoint is created, and I've passed the feature request forward.

crown

9/30/2025

Pre-cached model selection doesn't appear to exist when creating a new serverless endpoint

The docs (https://docs.runpod.io/serverless/endpoints/manage-endpoints) say: """ Model (optional): Select a model from Hugging Face to optimize worker startup times. When you specify a model, Runpod attempts to place your workers on host machines that already have the model cached locally, resulting in faster cold starts and cost savings (since you won’t be charged while the model is downloading). You can either select from the dropdown list of pre-cached models or enter a custom Hugging Face model URL. """...

Five

9/30/2025

What is this?

error starting container: Error response from daemon: failed to create task for container: failed to create shim task: unable to write to a control group file /sys/fs/cgroup/docker/28efdc7a49e4ef7997c32dd12467dd5fbc8d6763db2da6357c5b87e10e513ff9/memory.oom_control, value [CREATE FILE] caused by: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }

Illu7ionist

9/30/2025

AI Toolkit with Serverless

Is there a way to get the AI toolkit image that's available as a Pod template for serverless? I'm looking for a way to train WAN LoRAs with an endpoint, any help in this would be super appreciated.

9/29/2025

Serverless down?

There is an error saying no GPU available, yet our worker is running and being charged. What is going on?

Deany

9/29/2025

Please resolve this really urgent issue.

I'm unable to connect to my pod because of this issue: "This server has recently suffered a network outage and may have spotty network connectivity. We aim to restore connectivity soon, but you may have connection issues until it is resolved. You will not be charged during any network downtime." My server was running and it must not be stopped. Could you resolve this issue ASAP? My pod ID is "vjwinhaduxgt3w"...

Jan

9/29/2025

No workers available in EU-SE-1 (AMPERE_48)

I deployed endpoint s7gvo0eievlib3 hours ago with storage attached. The build was fine and a release was created, but I don't have any workers assigned. The GPU is set to AMPERE_48, which was listed as High Supply. What am I doing wrong and how do I fix this?

dagger

9/29/2025

Can't load model from network volume.

I'm trying to load a model from a network volume in my serverless worker using the MODEL_NAME environment variable, but even when setting up the template I get this error:

Failed to save template: Unable to access model '/workspace/weights/finexts'. Please ensure the model exists and you have permission to access it. For private models, make sure the HuggingFace token is properly configured.

...
