I noticed you are using Adafactor. What is your take on Prodigy? The learning rate adapts automatically with it, right? It might consume more VRAM, but what do you think the result quality would be with Prodigy?

I might be wrong, but because the learning rate adapts, I think we might eliminate the risk of overfitting. Your advice would be helpful.