Just be wary of the Facebook D-Adaptation optimizers. They have very specific learning rate settings compared to the others. Be especially careful with Adan D-Adaptation, because it required about 36GB of VRAM in my testing. The others weren't quite as intense, though.