I was able to train at 1.2 it/s with gradient checkpointing off and a few other options tweaked. I think one was the experimental full bf16 training option, plus some other things I don't remember right now.