We all know from Dr. Furkan Gözükara's work that it typically takes 15 images, 3000 steps, and a learning rate of 4e-6 to train a model to capture a face. However, training something as complex as an alternate futuristic reality is an entirely different challenge. The dataset size, steps, and learning rate required for this are much more demanding.

In my case, with a dataset of 500 images for stage 1 and 1000 images for stage 2, I started with a low learning rate (4e-6), fearing overfitting. However, after running into underfitting and wasting valuable resources, I learned an important lesson: starting with a high learning rate (like 4e-5) lets you capture the core of the concept much faster, even if it overfits. After that, you can restart with a lower learning rate to fine-tune the details.

In fact, starting with a low learning rate to avoid overfitting ended up costing me both time and money. It takes longer to realize you're underfitting, and by the time you do, you've already invested significant resources. Allowing the model to overfit initially and then restarting with a lower LR is much more cost-effective and efficient.
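The workflow above can be sketched as two separate runs rather than one. This is only an illustrative outline (the `run_training` function is a hypothetical placeholder, not my actual pipeline); the learning rates and dataset sizes are the ones mentioned above, while the step count is just an example:

```python
# Sketch of the high-LR-first strategy: run 1 locks in the concept
# fast (and is allowed to overfit), run 2 restarts with a 10x lower
# LR to refine details.

def run_training(dataset_size, steps, lr):
    """Placeholder for a real trainer (e.g. a LoRA/DreamBooth script).
    Returns a summary dict instead of actually training."""
    return {"dataset_size": dataset_size, "steps": steps, "lr": lr}

# Run 1: high LR (4e-5) on the stage-1 dataset — overfitting is
# acceptable here; the goal is capturing the core of the concept.
concept_run = run_training(dataset_size=500, steps=3000, lr=4e-5)

# Run 2: restart from scratch with a lower LR (4e-6) on the larger
# stage-2 dataset to fine-tune the details.
detail_run = run_training(dataset_size=1000, steps=3000, lr=4e-6)
```

The key design point is that run 2 is a fresh restart, not a continuation of the overfit checkpoint.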

That said, I’m still open to refining this approach. I tend to favor restarting over continuing to fine-tune because of my perfectionist tendencies, but there may be a balance I’m missing. For now, beginning with a high LR and adjusting downwards has proven to be the most effective strategy for me.