TL;DR — How do I manage GPU memory across concurrent requests so that the GPU's memory is fully utilized?
Context
1. Suppose I have an RTX 6000 GPU with 96GB of VRAM.
2. I also have a Stable Diffusion model that takes up 7GB of VRAM.
3. I want to fully utilize the remaining GPU memory by serving concurrent requests through a smart memory management system that avoids OOM errors.
4. Each request is simple: it specifies the number of images to generate for a given prompt.
Question: Are there existing solutions to this problem?
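One common approach (offered here as a minimal sketch, not a definitive answer) is admission control: estimate the VRAM cost of each request and block new requests until enough memory is free, instead of letting them race and OOM. The class name `GpuMemoryBudget` and the per-image cost figure below are illustrative assumptions, not measured values; in practice you would profile your own pipeline to get a real per-image estimate.

```python
import threading

class GpuMemoryBudget:
    """Tracks a fixed VRAM budget; acquire() blocks until enough is free.
    This is a hypothetical helper, not part of any existing library."""

    def __init__(self, total_bytes: int):
        self.free = total_bytes
        self.cond = threading.Condition()

    def acquire(self, nbytes: int) -> None:
        with self.cond:
            # Block the request instead of letting the GPU OOM.
            while self.free < nbytes:
                self.cond.wait()
            self.free -= nbytes

    def release(self, nbytes: int) -> None:
        with self.cond:
            self.free += nbytes
            self.cond.notify_all()

# Assumed numbers from the question: 96GB card, 7GB resident model weights.
# The ~1.5GB-per-image activation cost is a placeholder to profile yourself.
GB = 1024 ** 3
budget = GpuMemoryBudget(96 * GB - 7 * GB)
PER_IMAGE_BYTES = int(1.5 * GB)

def handle_request(n_images: int) -> None:
    cost = n_images * PER_IMAGE_BYTES
    budget.acquire(cost)
    try:
        # Run the diffusion pipeline here for n_images images.
        pass
    finally:
        budget.release(cost)
```

Serving frameworks with continuous batching and queueing (e.g. Ray Serve, or NVIDIA Triton with its scheduler and queue settings) implement more sophisticated versions of this idea, so it is worth checking whether one of them fits before rolling your own.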