ok
So pick 300 quite good images that are unique and different.
Create a model where you use the original 12 images that you used PLUS the 300 images, WITH filewords.
This should not overtrain the model, and because of the compounded weight it should DECREASE the variation of the loss/average learning rate.
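A rough sketch of how the combined training folder could be put together, in case it helps. It assumes the 300 images are already hand-picked into their own folder, and that "filewords" are read from a .txt caption file with the same name as each image (that is how I understand the [filewords] template to work, but check it against your trainer). The folder names and the build_training_set helper are made up for illustration.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def list_images(folder):
    """Return the image files in a folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS)


def build_training_set(original_dir, extra_dir, out_dir, n_extra=300):
    """Copy the original images plus up to n_extra extra images into out_dir.

    Caption files (same name, .txt) are copied alongside their image.
    File names are kept unchanged, since the filewords may depend on them.
    Returns (number of originals, number of extras, extras without a caption).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    originals = list_images(original_dir)
    extras = list_images(extra_dir)[:n_extra]  # n_extra is only a cap, not a quality filter

    seen = set()
    missing_captions = []
    for is_extra, images in ((False, originals), (True, extras)):
        for img in images:
            if img.name in seen:
                raise ValueError(f"duplicate file name across folders: {img.name}")
            seen.add(img.name)
            shutil.copy2(img, out / img.name)

            caption = img.with_suffix(".txt")
            if caption.exists():
                shutil.copy2(caption, out / caption.name)
            elif is_extra:
                # The 300 extras are meant to be trained WITH filewords.
                missing_captions.append(img.name)

    return len(originals), len(extras), missing_captions
```

Something like build_training_set("originals_12", "extras_300", "train_set") would then give one folder to point the trainer at, and the third return value lists any of the extra images that still need a caption.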