Runpod•4d ago
disintegral

Pod 100% Mem usage freeze

Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... \

Gets stuck on the above when trying to run:
pip install flash-attn==2.8.0.post2 --no-build-isolation
Memory usage and CPU are at 100%, and the system log shows:
WARN: container is unhealthy: triggered memory limits (OOM)
WARN: very high memory utilization: 57.74GiB / 57.74GiB (100 %)
Dj•4d ago
That's to be expected, building Flash Attention from source takes a lot of memory. You should prefer a prebuilt wheel: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.8.3
Or cap MAX_JOBS as directed in the Flash Attention README: https://github.com/Dao-AILab/flash-attention#installation-and-features
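For example, the README suggests limiting parallel compile jobs on machines with limited RAM; a minimal sketch (4 is an arbitrary job count, tune it to your pod's memory):
# Limit ninja to 4 parallel compile jobs so the build doesn't exhaust RAM
MAX_JOBS=4 pip install flash-attn==2.8.0.post2 --no-build-isolation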
disintegralOP•4d ago
I'm trying to install ToonComposer; on their GitHub they say it requires a specific version. Is it possible to use a prebuilt wheel as well?
Dj•4d ago
Yes, I just linked you the very latest Flash Attention. Sorry, I meant to give you the one for the version you selected.
disintegralOP•4d ago
So should I just git pull instead then, or...?
Dj•4d ago
There are a lot of options and the names can look kind of scary, but they break down into readable parts. You can download the wheel and pip install the wheel file itself.
disintegralOP•4d ago
Slightly unsure how to do that, but I'm sure I can figure it out.
Dj•4d ago
curl -JLO <link>
You can just right-click > Copy Link on the right wheel file :p
disintegralOP•4d ago
Right, so then installing the wheel is pip install with what arguments?
Dj•4d ago
None, just:
pip install flash_attn-2.8.0.post2+cu12torch2.5cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
(random file name, please make sure you grab the one for your setup)
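Putting the two steps together, something like this (the release tag and wheel filename here are just examples; pick the asset that matches your setup from the release page):
# Download a wheel asset from the GitHub release (example filename)
curl -JLO https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.0.post2/flash_attn-2.8.0.post2+cu12torch2.5cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
# Install the local wheel file directly, no compilation needed
pip install flash_attn-2.8.0.post2+cu12torch2.5cxx11abiTRUE-cp312-cp312-linux_x86_64.whl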
disintegralOP•4d ago
Damn, how am I supposed to know which one of these is the right one 😄
Dj•4d ago
lol like I said, it's a lot of data. Just looking at the first one in the list:
flash_attn-2.8.0.post2+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
We can break it up into parts.
Everything before the + is the version, so we can skip up to that point.
cu12torch2.X represents the CUDA and PyTorch version combination. You can get your PyTorch version with python -c "import torch;print(torch.version.__version__)" On my computer this returns 2.7.1, which is just 2.7 here, and that limits the options to just a few.
cxx11abi then TRUE or FALSE is determined by the response of python -c "import torch;print(torch._C._GLIBCXX_USE_CXX11_ABI)" For me, this returns False.
The last bit, cp3xx, depends on your Python version: python -V For me this returns 3.12.8, which is just 312; this is repeated twice, and all the wheels end in linux_x86_64.whl
So putting it all together, I would need: flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
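If it helps, here are the three checks bundled in one place (same commands as above; the example outputs are from my machine):
# PyTorch version -> the torch2.X part of the wheel name
python -c "import torch; print(torch.__version__)"   # e.g. 2.7.1 -> torch2.7
# C++11 ABI flag -> cxx11abiTRUE or cxx11abiFALSE
python -c "import torch; print(torch._C._GLIBCXX_USE_CXX11_ABI)"   # e.g. False
# Python version -> the cp312-cp312 part
python -V   # e.g. Python 3.12.8 -> cp312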
disintegralOP•4d ago
Thanks a lot for the help. So if the Runpod instance has torch 2.8, I need to downgrade, I assume, as this version of flash-attn doesn't exist for torch 2.8.
Dj•4d ago
You could probably update flash attention depending on what you're doing, but downgrading would be easier if you don't have any data.
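If you do downgrade, a minimal sketch (2.7.1 is just an example target; the regular PyPI wheels bundle a CUDA 12 runtime, so on a CUDA 12 pod this is usually enough):
# Pin torch to a 2.7.x release so the prebuilt torch2.7 flash-attn wheels match
pip install "torch==2.7.1"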
disintegralOP•4d ago
I don't have any data. It's weird that ToonComposer's requirements.txt installs torch 2.8 for me even though they say that specifically this version of flash-attn is needed. I wonder if they mean this one or newer, or specifically this version.
Dj•4d ago
Just checked for you, it says >=, which means this version or newer
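For reference, the two kinds of specifier read like this (illustrative lines, not ToonComposer's actual file):
flash-attn>=2.8.0.post2   # this version or anything newer
flash-attn==2.8.0.post2   # exactly this version, nothing else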
Dj•4d ago
In their directions they specifically grab the newest
Dj•4d ago
The earliest flash-attn version with Torch 2.8 builds is the absolute latest release.
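So for torch 2.8 you'd take a wheel from the v2.8.3 release instead, e.g. (the filename below is hypothetical, verify it against the actual asset list on the release page):
# Example torch 2.8 / Python 3.12 asset from v2.8.3 (check the real name first)
curl -JLO https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
pip install flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp312-cp312-linux_x86_64.whl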
disintegralOP•4d ago
OK! Thanks for the help
