Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a dedicated YouTube channel for the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
I've written Python scripts with ChatGPT that process high-resolution images for training by cutting them into 1024x1024 tiles with a bit of overlap.
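Not the original script, but a minimal sketch of that tiling idea, assuming Pillow; the 128 px overlap and the folder names are illustrative choices, not values from the post:

```python
# Cut high-resolution images into overlapping 1024x1024 tiles for training.
from pathlib import Path
from PIL import Image

TILE = 1024      # tile width/height in pixels
OVERLAP = 128    # overlap between adjacent tiles (assumed value)
STRIDE = TILE - OVERLAP

def _positions(full: int) -> list[int]:
    """Start offsets that cover one dimension, including a final edge-aligned tile."""
    if full <= TILE:
        return [0]
    pos = list(range(0, full - TILE + 1, STRIDE))
    if pos[-1] != full - TILE:
        pos.append(full - TILE)
    return pos

def tile_image(src: Path, dst_dir: Path) -> None:
    """Save overlapping TILE x TILE crops of one image into dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    img = Image.open(src).convert("RGB")
    w, h = img.size
    for top in _positions(h):
        for left in _positions(w):
            crop = img.crop((left, top, left + TILE, top + TILE))
            crop.save(dst_dir / f"{src.stem}_{top}_{left}.png")

if __name__ == "__main__":
    for path in Path("source_images").glob("*.jpg"):  # assumed input folder
        tile_image(path, Path("tiles"))                # assumed output folder
```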
This fine-tuned checkpoint is based on Flux dev de-distilled, so it requires a special ComfyUI workflow and won't work very well ...
Not really. The higher the resolution, the longer it takes to render, obviously, and you get better results but also diminishing returns eventually. This one was resized to 1080p, I believe, and took 5 min to render.
I sent the workflow above with the input video. Play with the prompt and settings, and make sure the skip steps and the normal scheduler steps together equal 20 generation steps; anything under that breaks the generation.
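If I'm reading that step rule right (my interpretation, and the 5/15 split below is only an example, not a recommended setting), the skipped steps and the remaining scheduler steps should sum to 20:

```python
# Illustrative check only: skip steps + scheduler steps must total 20.
total_steps = 20
skip_steps = 5
scheduler_steps = total_steps - skip_steps  # 15

assert skip_steps + scheduler_steps == total_steps  # anything under 20 total breaks the generation
```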
I already tested some top Flux dev de-distilled models and the quality was quite disappointing: most of them are only good for realistic images, and for style they are the same as or even worse than Flux dev, although render time is doubled due to CFG > 1. Are there any de-distilled models you guys can recommend that can maybe also be used for LoRA training and fine-tuning?
What would you guys recommend for Flux checkpoint training in Kohya for a character model? I have 40 images and a 3090 Ti (24 GB VRAM, 64 GB RAM). More than batch size 1? Is current best practice not to caption our training sets and just to use "ohwx woman"? Things change so fast in this world.
@Dr. Furkan Gözükara I'm currently using Flux Q8 on Kaggle; is there something better I can use for multi-purpose work (realism, style, etc.)? I saw something in chat about de-distilled versions?
Does anyone know the easiest way to fine-tune your own VLM for image captioning? I've already got my own dataset, but there doesn't seem to be a straightforward way to actually carry out the fine-tuning process itself... I know there are already some good captioners out there, but I want to fine-tune my own.
@Dr. Furkan Gözükara I've noticed that SwarmUI downloads the t5xxl_enconly.safetensors file every time you try to use Flux, even if you set the t5xxl model location manually. Furthermore, the debug logs mention that the Comfy back-end is typecasting the fp8_e4m3fn model into bf16 every time a Flux-type checkpoint loads from scratch, and that typecasting takes significant CPU time. I looked in the SwarmUI code and it seems to be downloading the model from this repo: https://huggingface.co/mcmonkey/google_t5-v1_1-xxl_encoderonly/ That model is 4.9 GB, which is consistent with SwarmUI's VRAM usage when running Flux inference. My concern is that all this time we have been using an inferior fp8-cast t5xxl model for generations, which according to many reports decreases output quality much more than casting Flux itself to fp8.
I noticed this while researching why swapping fp16 Flux models takes so much time. If my findings are correct, about half of that time is wasted on needless typecasting of t5xxl from fp8 to bf16.
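One way to check what the downloaded encoder actually stores (a sketch, not part of the original discussion; the file path is an assumption, and it only inspects stored dtypes with the safetensors library):

```python
# Inspect the dtypes inside the downloaded T5-XXL encoder file.
# The path is an assumption; point it at wherever SwarmUI saved t5xxl_enconly.safetensors.
from collections import Counter
from safetensors import safe_open

path = "Models/clip/t5xxl_enconly.safetensors"  # assumed location

dtypes = Counter()
total_bytes = 0
with safe_open(path, framework="pt", device="cpu") as f:
    for key in f.keys():
        t = f.get_tensor(key)
        dtypes[str(t.dtype)] += 1
        total_bytes += t.numel() * t.element_size()

print(dtypes)                         # a float8_e4m3fn majority would confirm an fp8 file
print(f"{total_bytes / 1e9:.1f} GB")  # roughly 4.9 GB for fp8 vs ~9.6 GB for fp16/bf16
```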
Notice your VRAM usage on 40+ GB cards while running Flux fp16 inference. It should use at least 33.6 GB of VRAM to fit both Flux and t5xxl + CLIP, but it seems to be using around 30 GB, which is more consistent with the downcast-t5xxl theory.
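Rough back-of-the-envelope arithmetic behind those numbers (a sketch; the parameter counts are approximate assumptions, not measurements, and activation/overhead memory is ignored):

```python
# Weight memory only: parameter count x bytes per weight.
GB = 1e9

flux_params = 12e9     # Flux dev is roughly a 12B-parameter transformer
t5xxl_params = 4.8e9   # T5-XXL encoder-only, roughly 4.8B parameters (assumed)
clip_l_params = 0.12e9 # CLIP-L text encoder, small by comparison

fp16 = 2  # bytes per weight
fp8 = 1

all_fp16 = (flux_params + t5xxl_params + clip_l_params) * fp16 / GB
t5_in_fp8 = (flux_params * fp16 + t5xxl_params * fp8 + clip_l_params * fp16) / GB

print(f"everything fp16: ~{all_fp16:.1f} GB")   # ~33.8 GB, close to the 33.6 GB figure above
print(f"t5xxl in fp8:    ~{t5_in_fp8:.1f} GB")  # ~29.0 GB, close to the ~30 GB observed
```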