I'm training on an RTX 3090 with gradient checkpointing off and CrossAttention set to sdpa, and I'm getting 1.52 s/it.
