Pod 100% Mem usage freeze
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... \
Gets stuck on the above when trying to run:
pip install flash-attn==2.8.0.post2 --no-build-isolation
Memory usage and CPU are at 100%, with the following in the system log:
WARN: container is unhealthy: triggered memory limits (OOM)
WARN: very high memory utilization: 57.74GiB / 57.74GiB (100 %)
That's to be expected; building flash-attention from source takes a lot of memory, so you should prefer a prebuilt wheel.
https://github.com/Dao-AILab/flash-attention/releases/tag/v2.8.3
Or change MAX_JOBS as directed by Flash Attention https://github.com/Dao-AILab/flash-attention#installation-and-features
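If you do build from source, that same page suggests capping ninja's parallel compile jobs so the build doesn't exhaust RAM; a minimal sketch (4 is just an example value, tune it to the pod):
MAX_JOBS=4 pip install flash-attn==2.8.0.post2 --no-build-isolation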
I'm trying to install ToonComposer; on their GitHub they say it requires a specific version. Is it possible to use a prebuilt wheel for that too?
Yes, I just linked you the very latest Flash Attention - sorry
I meant to give you the one for the version you selected
So should I just git pull instead then, or...?
There are a lot of options and the names can look kind of scary but they're easily broken into readable parts
You can download the wheel and pip install the wheel itself
Slightly unsure on how to do that, but I can figure it out I am sure
curl -JLO <link>
You can just right click copy link for the right wheel file :p
Right, so then installing the wheel is pip install with what arguments?
None, just
pip install flash_attn-2.8.0.post2+cu12torch2.5cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
(random file name, please make sure you grab the one for your setup)
Damn, how am I supposed to know which one of these is the right one 😄
lol like I said it's a lot of data, just looking at the first one in the list:
flash_attn-2.8.0.post2+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
We can break it up into parts.
Everything before the + is the version, so we can skip up to that point.
cu12torch2.X represents the CUDA and PyTorch version combination. You can get your PyTorch version with python -c "import torch;print(torch.version.__version__)". On my computer this returns 2.7.1, which is just 2.7 here, and that limits the options to just a few.
cxx11abi then TRUE or FALSE is determined by the response of python -c "import torch;print(torch._C._GLIBCXX_USE_CXX11_ABI)". For me, this returns False.
The last bit, cp3xx, depends on your Python version (python -V). For me this returns 3.12.8, which is just 312; this is repeated twice, and all the wheels end in linux_x86_64.whl.
So putting it all together, I would need:
flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312
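To make that concrete, here's a sketch that pieces the filename together from those checks (it assumes a CUDA 12 pod, hence the cu12 part, and the 2.8.0.post2 release; adjust if yours differ):
TORCH_VER=$(python -c "import torch; print('.'.join(torch.__version__.split('.')[:2]))")       # e.g. 2.7
ABI=$(python -c "import torch; print(str(torch._C._GLIBCXX_USE_CXX11_ABI).upper())")           # TRUE or FALSE
PYTAG=$(python -c "import sys; print(f'cp{sys.version_info.major}{sys.version_info.minor}')")  # e.g. cp312
echo "flash_attn-2.8.0.post2+cu12torch${TORCH_VER}cxx11abi${ABI}-${PYTAG}-${PYTAG}-linux_x86_64.whl"
Then right-click the matching file on the release page, curl -JLO that link, and pip install it as above.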
Thanks a lot for the help. So if the RunPod instance has torch 2.8, I need to downgrade, I assume.
As this version of flash-attn doesn't exist for torch 2.8.
You could probably update flash attention depending on what you're doing, but downgrading would be easier if you don't have any data.
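If you do downgrade, a minimal sketch (2.7.1 is just an assumed target; pick whichever 2.7.x the rest of the requirements tolerate):
pip install torch==2.7.1
Then install the matching torch2.7 flash_attn wheel as described above.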
I don't have any data. It's weird that ToonComposer's requirements.txt installs torch 2.8 for me even though they say that specifically this version of flash-attn is needed.
I wonder if they mean this one or newer, or specifically this version.
Just checked for you, it says >= which means this version or newer
In their directions they specifically grab the newest

The earliest flash-attn version with Torch 2.8 builds is the absolute latest release.
OK! Thanks for the help