TL;DR — How can I manage GPU memory across concurrent requests so that the GPU's memory is fully utilized?
Context:
1. Suppose I have an RTX 6000 GPU with 96GB of VRAM.
2. I also have a Stable Diffusion model that takes up 7GB of VRAM.
3. I want to fully utilize the remaining GPU memory by serving concurrent requests through a smart memory management system that avoids OOM errors.
4. Each request is simple: it specifies the number of images to generate for a given prompt.
Question: Are there existing solutions to this problem?
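One common pattern for this is an admission controller: each request estimates its VRAM need up front and blocks until that much budget is free, so concurrency is bounded by memory rather than by a fixed worker count. Below is a minimal sketch in Python; the `VRAMBudget` class, the `estimate_mb` heuristic, and the per-image cost figure are all assumptions for illustration, not part of any specific library.

```python
import threading

class VRAMBudget:
    """Admission controller (illustrative sketch): blocks a request until
    its estimated VRAM need fits in the remaining budget, avoiding OOM."""

    def __init__(self, total_mb: int):
        self.free_mb = total_mb
        self._cond = threading.Condition()

    def acquire(self, need_mb: int) -> None:
        # Wait until enough budget is free, then reserve it.
        with self._cond:
            while self.free_mb < need_mb:
                self._cond.wait()
            self.free_mb -= need_mb

    def release(self, need_mb: int) -> None:
        # Return the reservation and wake any waiting requests.
        with self._cond:
            self.free_mb += need_mb
            self._cond.notify_all()

def estimate_mb(num_images: int, per_image_mb: int = 1500) -> int:
    # Hypothetical heuristic: activation memory grows roughly with
    # batch size; the per-image cost must be measured for your model.
    return num_images * per_image_mb

# 96GB total minus ~7GB of model weights leaves the working budget.
budget = VRAMBudget(total_mb=96_000 - 7_000)

def handle_request(num_images: int) -> str:
    need = estimate_mb(num_images)
    budget.acquire(need)
    try:
        # Placeholder for the actual Stable Diffusion pipeline call.
        return f"generated {num_images} images"
    finally:
        budget.release(need)
```

Each request thread calls `handle_request`, so oversized batches simply queue until earlier ones release their reservation; the hard part in practice is making `estimate_mb` accurate for your model and resolution.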
Runpod