Need more RAM but not more VRAM in serverless endpoints
What should I do if I need endpoints with more system RAM than what serverless endpoints currently provide? This is for GPU endpoints.
I have a similar question.
can you share more? which gpu type and how much more ram?
He says he doesn't need more VRAM. He needs more RAM allocated to the worker.
Here RAM refers to system RAM and not the GPU RAM
yep updated, still same question
Let's say the GPU type is RTX A5000, and I need a fixed system RAM of 80 GB. But currently the allocation happens randomly. Is it possible to fix the RAM?
currently it's not possible. we have been contemplating for a while a feature that would let you define a specific amount of RAM; that's why i was asking, to understand if your RAM requirements are within the realm of what we can offer
typically 1.5x-2x RAM compared to VRAM is possible, but currently it's random; that's something we can optimize as we handle workloads. anything more than 2x is likely to run into capacity issues, we'd have to explore that more
Nope, they aren't. An RTX A5000 has 24 GB of VRAM, so even 2x only comes to 48 GB, well short of the 80 GB I need. Yesterday I created an endpoint, and because one worker had a low RAM allocation, I got an OOM error. Then I terminated it and luckily got a higher allocation on another worker.
yes, that's what i meant. currently serverless doesn't have a feature to give you workers with a specific amount of RAM; it's something we'd need to enable with some additional cost attached to it
@CodingNinja - Curious, can you give us a bit more insight into the workload that's consuming so much RAM? What types of tasks are you running?
Take a simple ComfyUI workflow for WAN 2.2 Animate: the entire system RAM gets exhausted and the Pod becomes unresponsive, even though the video resolution was only 720x720 and GPU RAM wasn't an issue in that case. System RAM plays a very significant role with ComfyUI. ComfyUI keeps a lot of stuff on the CPU: model parts get loaded/serialized there before moving to the GPU, VAE decode and image IO happen on the CPU, and ComfyUI caches node outputs in RAM. PyTorch also uses pinned host buffers for GPU transfers. All of that stacks up and spikes host memory even when the GPU looks fine. That's why a worker with more RAM ran fine, but a lower-RAM one died.
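To make that concrete, here's a minimal sketch (assuming torch and psutil are installed on the worker) of how CPU-side staging plus pinned host buffers eat system RAM while VRAM is barely touched. The ~1 GB buffer just stands in for a model shard; the sizes and names are illustrative, not anything ComfyUI-specific:

```python
# Minimal sketch: CPU staging and pinned host buffers consume system RAM
# even when VRAM usage stays low. Sizes are illustrative only.
import psutil
import torch

def host_ram_used_gb():
    vm = psutil.virtual_memory()
    return (vm.total - vm.available) / 1024**3

print(f"host RAM used at start: {host_ram_used_gb():.1f} GB")

# Loaders typically materialize weights on the CPU before moving them to GPU.
cpu_weights = torch.ones(512 * 1024**2, dtype=torch.float16)  # ~1 GB on host
print(f"after CPU staging:      {host_ram_used_gb():.1f} GB")

# pin_memory() copies into page-locked host memory for faster transfers,
# so the host-side footprint roughly doubles while both tensors are alive.
pinned = cpu_weights.pin_memory()
print(f"after pinning:          {host_ram_used_gb():.1f} GB")

if torch.cuda.is_available():
    gpu_weights = pinned.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    print(f"VRAM allocated:         {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
```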
Also, since Serverless is costlier than Pods, clients want to minimize cost and can't always go with Pro GPUs, so more offloading to the CPU happens in those cases. And since the system RAM allocation is random per worker, the workloads feel like luck: sometimes it fits, sometimes it OOMs.
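Until fixed RAM allocation is supported, one possible stopgap (just a sketch, assuming the standard runpod Python SDK handler pattern; MIN_HOST_RAM_GB and the error payload shape are made up for illustration) is to check total host RAM at the top of the handler and fail fast with a clear error instead of OOMing halfway through a workflow:

```python
# Sketch of a fail-fast host RAM check in a RunPod serverless handler.
# MIN_HOST_RAM_GB and the error payload shape are assumptions for illustration.
import psutil
import runpod

MIN_HOST_RAM_GB = 60  # hypothetical floor for this ComfyUI workload

def handler(job):
    total_gb = psutil.virtual_memory().total / 1024**3
    if total_gb < MIN_HOST_RAM_GB:
        # Return a clear error so the request can be retried on another worker
        # instead of dying with an opaque OOM mid-run.
        return {"error": f"worker has {total_gb:.0f} GB system RAM, "
                         f"need at least {MIN_HOST_RAM_GB} GB"}
    # ... run the actual ComfyUI workflow here ...
    return {"status": "ok"}

runpod.serverless.start({"handler": handler})
```

It doesn't fix the allocation lottery, but it turns a hang or OOM into a clean, retryable error.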

Yes, this was my use case and issue as well.