•Created by dbtr on 4/6/2025 in #⚡|serverless
Serverless endpoint fails with Out Of Memory despite no changes
For several months I have been using the same endpoint code to generate 512x512 Stable Diffusion 1.5 images with Auto1111 (in other words, quite low requirements). The serverless endpoint was set up with 16 GB (the logs show more memory available, but the configured amount was 16 GB).
The endpoint receives very few requests, which is how I know the worker was booting up from a fresh start in both of my failed test cases.
Practically right after boot, when I try to start inference, I get the following error:
A1111 Response: {'error': 'OutOfMemoryError', 'detail': '', 'body': '', 'errors': 'CUDA out of memory. Tried to allocate 146.00 MiB. GPU 0 has a total capacty of 19.70 GiB of which 10.38 MiB is free. Process 1790219 has 19.49 GiB memory in use. Process 3035077 has 194.00 MiB memory in use. Of the allocated memory 244.00 KiB is allocated by PyTorch, and 1.76 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF'}
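As an aside, the error message points at the `PYTORCH_CUDA_ALLOC_CONF` allocator setting. For reference only, here is a minimal sketch of how it is set (it must happen before `import torch`; the value 128 is purely illustrative). Note that in this case fragmentation seems unlikely to be the problem, since the log shows only KiBs allocated by PyTorch itself:

```python
import os

# Must be set before `import torch`; caps CUDA caching-allocator block
# splits at 128 MiB to reduce fragmentation. The value is illustrative,
# not a recommendation for this workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```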
It says that a certain process is using around 20 GB of memory. This has never failed before, and I find it unlikely that my Stable Diffusion workload by itself uses that much.
Can anyone suggest where to start digging? Is it (at least theoretically) possible that another process on the same machine, not started by me, is using shared GPU memory here?
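One way to start digging (a sketch of my own, not from the original post): at worker boot, before loading any model, log the per-process VRAM breakdown from `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`. Any PID that is not the worker's own points at a neighbouring workload holding the card. The helper name below is hypothetical:

```python
def parse_compute_apps(csv_text: str) -> dict[int, str]:
    """Map PID -> used memory from the CSV output of
    `nvidia-smi --query-compute-apps=pid,used_memory --format=csv`."""
    lines = [ln.strip() for ln in csv_text.strip().splitlines()]
    usage = {}
    for line in lines[1:]:  # skip the CSV header row
        pid_str, mem = (field.strip() for field in line.split(","))
        usage[int(pid_str)] = mem
    return usage
```

Comparing the parsed PIDs against `os.getpid()` (and the A1111 process) would show whether the 19.49 GiB belongs to someone else on the same GPU.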
Thanks!