Can I train a person and an object in the same training without captions?
