It depends. I use different settings, but usually Adafactor and the extra arguments, along with gradient checkpointing, and sometimes xformers, though not always.