What is the best way to extract text from an image? Current AI models only summarise the image.

I tried llava-1.5-7b-hf, but it doesn't help.