Yes, you can train CLIP or T5 through captions.
