Agreed. I ran some speed/memory tests. Without gradient checkpointing:
xformers + Memory Efficient Attention = xformers without Memory Efficient Attention
Same speed, same memory usage.