Pod 100% Mem usage freeze
Building wheels for collected packages: flash-attn
Building wheel for flash-attn (setup.py) ... \
Gets stuck on the above when trying to run:
pip install flash-attn==2.8.0.post2 --no-build-isolation
Memory usage and CPU are at 100%, with the following in the system log:
WARN: container is unhealthy: triggered memory limits (OOM)
WARN: very high memory utilization: 57.74GiB / 57.74GiB (100 %)
That's to be expected; building flash-attention from source takes a lot of memory, so you should prefer a prebuilt wheel.
https://github.com/Dao-AILab/flash-attention/releases/tag/v2.8.3
Or change MAX_JOBS as directed by Flash Attention https://github.com/Dao-AILab/flash-attention#installation-and-features
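If you do build from source, that same page suggests capping ninja's parallel compile jobs so the build doesn't exhaust RAM; a minimal sketch (4 is just an example value, tune it to the pod):
MAX_JOBS=4 pip install flash-attn==2.8.0.post2 --no-build-isolation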
I'm trying to install ToonComposer; on their GitHub they say it requires a specific version. Is it possible to use a prebuilt wheel for that too?
Yes, I just linked you the very latest Flash Attention - sorry
I meant to give you the one for the version you selected
So should I just git pull instead then, or...?
There are a lot of options and the names can look kind of scary but they're easily broken into readable parts
You can download the wheel and pip install the wheel itself
Slightly unsure on how to do that, but I can figure it out I am sure
curl -JLO <link>
You can just right click copy link for the right wheel file :p
Right, so then installing the wheel is pip install with what arguments?
None, just
pip install flash_attn-2.8.0.post2+cu12torch2.5cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
(random file name, please make sure you grab the one for your setup)
Damn, how am I supposed to know which one of these is the right one 😄
lol like I said it's a lot of data, just looking at the first one in the list:
flash_attn-2.8.0.post2+cu12torch2.4cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
We can break it up into parts.
Everything before the + is the version, so we can skip up to that point.
cu12torch2.X represents the CUDA and PyTorch version combination. You can get your PyTorch version with python -c "import torch;print(torch.version.__version__)". On my computer this returns 2.7.1, which is just 2.7 here, and that limits the options to just a few.
cxx11abi then TRUE or FALSE is determined by the response of python -c "import torch;print(torch._C._GLIBCXX_USE_CXX11_ABI)". For me, this returns False.
The last bit, cp3xx, depends on your Python version (python -V). For me this returns 3.12.8, which is just 312; this is repeated twice, and all the wheels end in linux_x86_64.whl.
So putting it all together, I would need:
flash_attn-2.8.0.post2+cu12torch2.7cxx11abiFALSE-cp312-cp312
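To make that concrete, here's a sketch that pieces the filename together from those checks (it assumes a CUDA 12 pod, hence the cu12 part, and the 2.8.0.post2 release; adjust if yours differ):
TORCH_VER=$(python -c "import torch; print('.'.join(torch.__version__.split('.')[:2]))")       # e.g. 2.7
ABI=$(python -c "import torch; print(str(torch._C._GLIBCXX_USE_CXX11_ABI).upper())")           # TRUE or FALSE
PYTAG=$(python -c "import sys; print(f'cp{sys.version_info.major}{sys.version_info.minor}')")  # e.g. cp312
echo "flash_attn-2.8.0.post2+cu12torch${TORCH_VER}cxx11abi${ABI}-${PYTAG}-${PYTAG}-linux_x86_64.whl"
Then right-click the matching file on the release page, curl -JLO that link, and pip install it as above.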
Thanks a lot for the help. So if the RunPod instance has torch 2.8, I need to downgrade, I assume.
As this version of flash-attn doesn't exist for torch 2.8.
You could probably update flash attention depending on what you're doing, but downgrading would be easier if you don't have any data.
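If you do downgrade, a minimal sketch (2.7.1 is just an assumed target; pick whichever 2.7.x the rest of the requirements tolerate):
pip install torch==2.7.1
Then install the matching torch2.7 flash_attn wheel as described above.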
I don't have any data. It's weird that ToonComposer's requirements.txt installs torch 2.8 for me even though they say that specifically this version of flash-attn is needed.
I wonder if they mean this one or newer, or specifically this version.
Just checked for you, it says >= which means this version or newer
In their directions they specifically grab the newest

The earliest flash-attn version with Torch 2.8 builds is the absolute latest release.
OK! Thanks for the help