Thanks, any feedback on my approach? I ask because the Vision API is mostly for images, not videos. I've not tried the other way round though: can image-to-text (GPT-4V) and text-to-image (DALL-E 3) be made consistent with the new feature?