I am testing AdamW8bit training quality at batch sizes 1, 2, 4, and 6 to find the sweet spot between output quality and training speed.
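
A rough sketch of how a sweep like this could be organized, in case it helps anyone repeat it. Everything here is an assumption for illustration: `sweep_batch_sizes` and `dummy_train` are hypothetical names, and `dummy_train` is only a placeholder that returns a made-up number. In a real test it would be replaced by a full training run with the AdamW8bit optimizer that returns whatever quality metric you trust (for example a validation loss).

```python
import time
from typing import Callable, Dict, List, Tuple


def sweep_batch_sizes(
    train_fn: Callable[[int], float],
    batch_sizes: List[int],
) -> Dict[int, Tuple[float, float]]:
    """Run train_fn once per batch size.

    Returns {batch_size: (quality_metric, wall_clock_seconds)}.
    """
    results: Dict[int, Tuple[float, float]] = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        metric = train_fn(batch_size)  # one complete training run
        elapsed = time.perf_counter() - start
        results[batch_size] = (metric, elapsed)
    return results


def dummy_train(batch_size: int) -> float:
    # Hypothetical stand-in, NOT a real result: a real version would build
    # the model, train it with AdamW8bit at this batch size, and return a
    # quality metric such as validation loss.
    return 1.0 / batch_size


if __name__ == "__main__":
    for bs, (metric, seconds) in sweep_batch_sizes(dummy_train, [1, 2, 4, 6]).items():
        print(f"batch_size={bs} metric={metric:.4f} time={seconds:.2f}s")
```

Recording the metric and the wall-clock time side by side for each batch size makes the quality vs. speed trade-off easy to compare in one table.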