Agreed. I ran some speed/memory tests. With gradient checkpointing off:
xformers + memory-efficient attention = xformers without memory-efficient attention
Same speed, same memory usage.
