My doubt arises because the text encoder lives in a separate file. If I train it, will Kohya save it as a separate file too? In that case, would I also need to load my custom clip_l for inference? Ideally, the text encoders would be bundled into the same safetensors file, as happens with SDXL.
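One way to check this after a training run would be to list the tensor names in the saved file and see whether any belong to the text encoder. This is only a rough sketch. It reads the safetensors header directly (an 8-byte little-endian length followed by a JSON header), so it needs no extra libraries. The `lora_te` and `lora_unet` prefixes are my assumption about how Kohya-style LoRA tensors are named, and the file name in the usage note is hypothetical. A fully fine-tuned text encoder would likely use different key names, so the prefixes may need adjusting.

```python
import json
import struct
from collections import Counter


def list_tensor_names(path):
    """Read only the JSON header of a .safetensors file and return its tensor names."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def count_by_prefix(names, prefixes=("lora_te", "lora_unet")):
    """Count tensor names per prefix.

    The default prefixes are an assumption about Kohya-style LoRA naming;
    anything that matches none of them is counted under "other".
    """
    counts = Counter()
    for name in names:
        for prefix in prefixes:
            if name.startswith(prefix):
                counts[prefix] += 1
                break
        else:
            counts["other"] += 1
    return dict(counts)
```

For example, `count_by_prefix(list_tensor_names("my_lora.safetensors"))` on the training output: if the result has a nonzero count for the text encoder prefix, the trained text encoder weights are in that same file; if not, they were either not trained or were saved elsewhere.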



