Highly Stylized Music Video
Creative + Technical Goals
• The music is already fully recorded.
• I want to drive an avatar’s full-body and facial performance using video of my real vocal, keyboard, and guitar performances.
• I want to generate multiple stylized versions of myself (alien, cyborg, etc.) using my trained Flux model (see the sketch after this list). I trained it about a year ago, so I may need to retrain or fine-tune against a newer base model.
• I want to control a virtual choir, all singing the same part, using my own performance as the driver.
• All scenes should take place in alien or otherworldly environments.
• I want dynamic, cinematic camera movement throughout.
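For the stylized-versions bullet, here’s roughly the first step I have in mind: a minimal sketch assuming my year-old LoRA still loads against a current FLUX.1-dev checkpoint (the LoRA path, trigger token, and prompts below are all placeholders):

```python
import torch
from diffusers import FluxPipeline

# Base model; FLUX.1-dev weights are gated on Hugging Face.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
# Hypothetical location and filename for my old identity LoRA.
pipe.load_lora_weights("./loras", weight_name="my_identity_lora.safetensors")
pipe.enable_model_cpu_offload()  # keeps VRAM use manageable

# One stylized keyframe per look; a fixed seed per look keeps results reproducible.
for look, seed in [("chrome-plated cyborg", 7), ("pale grey alien", 11)]:
    image = pipe(
        prompt=f"sks person as a {look}, bioluminescent alien landscape, cinematic lighting",
        height=1024,
        width=1024,
        guidance_scale=3.5,
        num_inference_steps=28,
        generator=torch.Generator("cpu").manual_seed(seed),
    ).images[0]
    image.save(f"keyframe_{look.replace(' ', '_')}.png")
```

Each keyframe would then serve as the identity/style anchor for a video pass.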
Context
A year ago I paused this project because video generation was too limited. Now that ControlNet-style conditioning for video exists (e.g., LTX-2), I’m considering picking it back up—though getting these tools running is still a bit of a headache.
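To make the conditioning part concrete, the step I’m picturing first is converting each performance take into per-frame control maps that the video model would then follow. A rough sketch using controlnet_aux (the input filename is a placeholder, and how these maps actually feed into LTX-2 or another model is exactly the part I’m unsure about):

```python
import os
import cv2
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
os.makedirs("pose_maps", exist_ok=True)

cap = cv2.VideoCapture("performance_take.mp4")  # placeholder: my real footage
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Face and hands both matter here: vocals need the face,
    # keys/guitar need the hands.
    pose = detector(Image.fromarray(rgb), include_face=True, include_hand=True)
    pose.save(f"pose_maps/{idx:05d}.png")
    idx += 1
cap.release()
```

One nice property for the choir shots: a single map sequence extracted from my performance could drive every singer, just with different seeds or style prompts.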
I’m a software engineer, so I’m totally comfortable with open-source tools — but I’m also willing to pay for a commercial solution (like Kling) if it meaningfully improves the workflow or final quality.
I’ve attached a short video clip to show the kind of result I was getting (none of it is performance-driven).
My Question
If you were in my position today—with these goals, this hardware, and a trained model of yourself—what would your end-to-end workflow look like?
What tools, steps, and pipeline would you use to pull off a project like this in 2026?