One last piece of advice: the end result depends on the balance and composition of the training dataset even more than on the training process itself. The training set should represent the features you want the model to learn proportionally. The more features you want to train (body type, poses, facial expressions), the more diverse the dataset needs to be and the more training steps it will take to reach the optimal training point.
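A quick way to sanity-check balance before training is to count how often each feature shows up across your captions. This is only a rough sketch, and it assumes each image has a caption made of comma-separated tags; the `tag_shares` helper and the sample captions are made up for illustration, not taken from any particular training tool.

```python
from collections import Counter


def tag_shares(captions):
    """Return, for each tag, the fraction of captions that contain it."""
    counts = Counter()
    for caption in captions:
        # Use a set so a tag repeated inside one caption is counted once.
        tags = {t.strip().lower() for t in caption.split(",") if t.strip()}
        counts.update(tags)
    total = len(captions)
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


# Hypothetical captions, one per training image (assumed tag format).
captions = [
    "standing, smiling",
    "standing, neutral expression",
    "sitting, smiling",
    "standing, smiling",
]

shares = tag_shares(captions)
for tag, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {share:.0%}")
```

If one pose or expression dominates the list, the model will most likely lean toward it, so it can be worth adding images of the underrepresented features before increasing the step count.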