Although BLIP makes the process much faster, in my opinion it is less accurate: it sometimes produces wrong descriptions of the images. This might affect the training, right? I have not tried GPT-4's image description feature yet, but from what I've heard, it is quite accurate. Maybe a comparison between BLIP and GPT-4 is needed.