These guys used LLaVA to caption an entire dataset: https://pixart-alpha.github.io/
