Software Engineering Courses (SECourses)•13mo ago

ok thank you Dr.

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: ok thank you Dr.

Furkan Gözükara SECourses•12/9/24, 12:07 PM

I'm going to make a fix.

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 12:10 PM

I am missing the ae.safetensors for FLUX dev. Is this the right one? https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/main/ae.safetensors It says FLUX schnell, but the VAE is the same, yes?

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 12:15 PM

Tried using the model download again, but it starts downloading FLUX dev etc. all over again.

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: I am missing the ae.safetensors for flux dev, is this the right one? https://hug...

Furkan Gözükara SECourses•12/9/24, 12:32 PM

yes

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 12:37 PM

Got an error at the start of training. I haven't gotten this error before; I've done 20 trainings with no problems starting:
"024-12-09 12:23:35 INFO Loading state dict from /home/Ubuntu/Downloads/t5xxl_fp16.safetensors flux_utils.py:330
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/utils.py", line 366, in load_safetensors
    state_dict = load_file(path, device=device)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 849, in <module>
    train(args)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 222, in train
    t5xxl = flux_utils.load_t5xxl(args.t5xxl, weight_dtype, "cpu", args.disable_mmap_load_safetensors)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/flux_utils.py", line 331, in load_t5xxl
    sd = load_safetensors(ckpt_path, device=str(device), disable_mmap=disable_mmap, dtype=dtype)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/utils.py", line 368, in load_safetensors
    state_dict = load_file(path) # prevent device invalid Error
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
"

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 12:38 PM

Found it, I think: the t5xxl file was 9.0 GB, which means it hadn't finished downloading...

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: Got error on start of training, I have not gotten this error before. done 20 tra...

Furkan Gözükara SECourses•12/9/24, 12:40 PM

Fixed the batch preprocessor.

Furkan Gözükara SECourses•12/9/24, 12:40 PM

The model download failed.

Furkan Gözükara SECourses•12/9/24, 12:40 PM

Yes, you found it correctly.
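A minimal sketch for catching this kind of failed download before a run, assuming only the Python standard library and the documented safetensors on-disk layout: an 8-byte little-endian header length, a JSON header whose tensor entries carry `data_offsets`, then the raw tensor data. A file cut off mid-download is shorter than its own header claims, which is consistent with the `MetadataIncompleteBuffer` error above. The function name `check_safetensors_complete` is hypothetical; it is not part of Kohya or the safetensors library.

```python
import json
import os
import struct


def check_safetensors_complete(path):
    """Return (ok, message) after checking the file is as long as its header says.

    Assumed layout: 8-byte little-endian header length N, N bytes of JSON,
    then tensor data. Each tensor entry's "data_offsets" = [begin, end] is
    relative to the start of the data section.
    """
    size = os.path.getsize(path)
    if size < 8:
        return False, f"only {size} bytes on disk; header length is missing"
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        if 8 + header_len > size:
            return False, "truncated inside the JSON header"
        header = json.loads(f.read(header_len))
    # "__metadata__" is optional free-form metadata, not a tensor entry.
    data_end = max(
        (v["data_offsets"][1] for k, v in header.items() if k != "__metadata__"),
        default=0,
    )
    expected = 8 + header_len + data_end
    if size < expected:
        return False, f"truncated: {size} bytes on disk, header expects {expected}"
    return True, f"complete: {size} bytes"
```

For example, `check_safetensors_complete("/home/Ubuntu/Downloads/t5xxl_fp16.safetensors")` would be expected to report a truncated file here. It only compares lengths, so a file that is complete but corrupted would still pass; comparing a hash against the published one is the stricter check.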

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 12:41 PM

Thank you Dr. Great work as always

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 12:55 PM

Can I use the RTX A6000 NVLink option with 2 GPUs and run 2 trainings at once?

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: Can I use the option RTX A6000 NVLink with 2gpus and run 2 trainings at once?

Furkan Gözükara SECourses•12/9/24, 12:59 PM

Yes, for LoRA.

Furkan Gözükara SECourses•12/9/24, 12:59 PM

For fine-tuning I didn't test it; probably OOM (out of memory).

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 1:00 PM

OOM? It's 48 GB each.

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: OOM? it's 48 GB each

Furkan Gözükara SECourses•12/9/24, 1:02 PM

yes

Furkan Gözükara SECourses•12/9/24, 1:02 PM

Fine-tuning needs 80 GB GPUs

Furkan Gözükara SECourses•12/9/24, 1:02 PM

when you do multi-GPU.

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 1:02 PM

I'm talking about two separate trainings

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 1:02 PM

on the same VM.

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 1:02 PM

I already did that with 2 A6000s before, so that works, but I'm unsure what the NVLink means.

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: I'm talking two separate trainings

Furkan Gözükara SECourses•12/9/24, 1:35 PM

Separate works.

Furkan Gözükara SECourses•12/9/24, 1:35 PM

NVLink is used when multiple GPUs work on one task together.

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 1:36 PM

ChatGPT o1-preview gave this answer: "Yes, you can run the two GPUs separately, treating each as its own device. Even though your RTX A6000 GPUs are connected via NVLink, which enables high-bandwidth peer-to-peer communication and memory access between them, they still present themselves as two distinct GPU devices to the system and to frameworks like PyTorch or TensorFlow"

Furkan Gözükara SECourses•12/9/24, 1:37 PM

yep
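A minimal sketch of the "two separate trainings on one VM" setup, assuming each run is started as its own process. Giving each process a different `CUDA_VISIBLE_DEVICES` value means each training only sees one card (as its device 0), and NVLink plays no part. `build_jobs` and `launch` are hypothetical helpers; the commands passed in are whatever you already use to start a Kohya training.

```python
import os
import subprocess


def build_jobs(commands, gpu_ids=None):
    """Pair each command with one GPU, returning (command, env) tuples.

    Each env is a copy of the current environment with CUDA_VISIBLE_DEVICES
    set to a single index, so that process sees only that card.
    """
    gpu_ids = list(range(len(commands))) if gpu_ids is None else list(gpu_ids)
    if len(gpu_ids) != len(commands):
        raise ValueError("need exactly one GPU id per command")
    return [
        (list(cmd), dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu)))
        for cmd, gpu in zip(commands, gpu_ids)
    ]


def launch(jobs):
    """Start all jobs at once, then wait and return their exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [p.wait() for p in procs]
```

Usage might look like `launch(build_jobs([["bash", "train_a.sh"], ["bash", "train_b.sh"]]))`, where the two shell scripts are hypothetical stand-ins for your own launch commands. The variable is set per process rather than inside the training code because the CUDA runtime reads it when the process initializes.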

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 2:54 PM

If I have 1.jpg, 1.txt, 2.jpg, 2.txt, the captions will automatically be read from the .txt files in Kohya, yes?

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/9/24, 7:51 PM

Is it possible to convert an SDXL LoRA to a FLUX LoRA?

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: If I have 1.jpg 1.txt 2.jpg 2.txt, the captions will automatically be read from ...

Furkan Gözükara SECourses•12/9/24, 9:18 PM

yes
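Kohya pairs each image with the caption file that has the same base name, and the caption extension is a setting, so it should be set to `.txt` for this layout. A minimal sketch for checking the pairing before training, assuming a flat folder and the image extensions listed in `IMAGE_EXTS` (both assumptions); `pair_captions` is a hypothetical helper, not Kohya code.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed set of image types


def pair_captions(folder, caption_ext=".txt"):
    """Map each image file name to the text of its same-named caption file.

    Images without a caption file map to None, so they are easy to spot
    before training.
    """
    pairs = {}
    for image in sorted(Path(folder).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skips the caption files themselves and anything else
        caption = image.with_suffix(caption_ext)
        pairs[image.name] = (
            caption.read_text(encoding="utf-8").strip() if caption.exists() else None
        )
    return pairs
```

Any image that maps to `None` has no caption file with the expected extension.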

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: is it possible to convert an sdxl lora to a flux lora?

Furkan Gözükara SECourses•12/9/24, 9:19 PM

nope

Zhigarev•12/10/24, 9:54 AM

Can you make a LoRA from a fine-tuned model? That is, extract "ohwx man" from it to run tests? Is there any comparison in quality?

Replying to Zhigarev: Can you make a LoRA from a fine-tuned model? That is, extract "ohwx man" from it...

Furkan Gözükara SECourses•12/10/24, 10:05 AM

yes

Furkan Gözükara SECourses•12/10/24, 10:05 AM

https://www.patreon.com/posts/how-to-extract-112335162

Patreon: How to Extract LoRA from FLUX Fine Tuning / DreamBooth Training Ful...
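The linked post covers the actual workflow. As a hedged illustration of the general idea only (not the post's script), a LoRA can be extracted for one layer by taking the SVD of the difference between the fine-tuned and base weights and keeping the top `rank` singular directions. Real extraction tools repeat this for every targeted layer and handle details this sketch skips. `extract_lora` is a hypothetical name, and NumPy arrays stand in for the tensors a real checkpoint would hold.

```python
import numpy as np


def extract_lora(w_base, w_tuned, rank):
    """Approximate (w_tuned - w_base) with a rank-`rank` product up @ down.

    Keeps the top singular directions of the weight difference and splits
    the singular values evenly (square roots) between the two factors.
    """
    delta = w_tuned - w_base
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank, :]
    sqrt_s = np.sqrt(s)
    up = u * sqrt_s               # shape (out_features, rank)
    down = sqrt_s[:, None] * vt   # shape (rank, in_features)
    return up, down
```

The rank caps how much of the fine-tune survives: any part of the weight difference outside the kept singular directions is dropped, which is one reason an extracted LoRA can come out weaker than the full fine-tune.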

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/10/24, 11:29 AM

Doing SDXL training on free Kaggle: if your computer goes to sleep, will that stop the training?

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: doing sdxl training on kaggle free, if your computer goes to sleep, will that st...

Furkan Gözükara SECourses•12/10/24, 10:03 PM

yep

Furkan Gözükara SECourses•12/10/24, 10:03 PM

The Kaggle page has to stay open

Furkan Gözükara SECourses•12/10/24, 10:03 PM

and selected

🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭OP•12/10/24, 10:42 PM

Oh wow, it has to be the active tab too? Insane

Replying to 🍭🎀 𝒜𝓋𝒶 𝐹𝓇𝒾𝑔𝑔 🎀🍭: oh wow, has to be active tab too? insane 😄

Furkan Gözükara SECourses•12/10/24, 10:44 PM

I would leave it that way :D

A_R_T•12/11/24, 9:03 AM

Hey guys! Any tips on the fastest way to copy my 46 GB safetensors files from a Massed Compute server to my local drive? I want to use them in ComfyUI.

Replying to A_R_T: hey guys ! ANy tips on fastest way to copy my 46GB safetensor files from masscom...

cyleap•12/11/24, 9:23 AM

I upload to HF (Hugging Face) or any cloud, then download them later.

https://medium.com/@yushantripleseven/uploading-from-jupyter-to-huggingface-ccf352f1e049

macba•12/11/24, 10:07 AM

Hey guys, Doctor… When training a person or character (LoRA or fine-tune), what should we do with training images where our subject is clearly visible, with nice resolution, nice lighting, etc… but… there's another person next to them?

1. Do not use the image for training
2. "Censor" the other person with a colored rectangle/circle using a color that is not in the general palette of the image
3. Crop the image, even if it means a very unusual aspect ratio (e.g. a very narrow and tall picture)

Replying to A_R_T: hey guys ! ANy tips on fastest way to copy my 46GB safetensor files from masscom...

Furkan Gözükara SECourses•12/11/24, 11:39 AM

Hugging Face
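A minimal sketch of the Hugging Face route, assuming `huggingface_hub` is installed on both machines and you are logged in (for example with `huggingface-cli login`). Upload from the server, download locally, then compare SHA-256 hashes on both ends so a truncated transfer (like the t5xxl download earlier in this thread) is caught before ComfyUI loads the file. The helper names and any repo id you pass in are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=64 * 1024 * 1024):
    """Hash a file in chunks so a 46 GB checkpoint never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def upload(path, repo_id, path_in_repo):
    """Run on the server: push one file to a private Hugging Face model repo."""
    from huggingface_hub import HfApi  # lazy import: pip install huggingface_hub

    api = HfApi()
    api.create_repo(repo_id, private=True, exist_ok=True)
    api.upload_file(path_or_fileobj=path, path_in_repo=path_in_repo, repo_id=repo_id)


def download(repo_id, filename, local_dir):
    """Run locally: fetch the file into local_dir and return its path."""
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
```

Usage would be `upload(...)` on the server and `download(...)` on your PC with the same hypothetical repo id (say `"your-username/my-checkpoints"`), then `sha256_of` on both copies; matching hashes mean the file arrived intact.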

Replying to macba: Hey Guys, Doctor… When training for a person or character (LoRA or finetune), wh...

Furkan Gözükara SECourses•12/11/24, 11:39 AM

Don't use it.

Furkan Gözükara SECourses•12/11/24, 11:39 AM

I don't use images with 2 people, unless you are training 2 people at the same time.

Replying to Furkan Gözükara SECourses: dont use

macba•12/11/24, 6:22 PM

Thank you, doc… May I also ask if you or anyone else here has ever resumed training from Hugging Face in Kohya_SS? My training died when the disk ran out of space, but I do have the saved states on HF.

Replying to macba: Thank you doc… May I also ask if you or anyone else here have ever resumed a tra...

Furkan Gözükara SECourses•12/11/24, 8:20 PM

I never tried.
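Since the thread never confirmed this works, treat it as an untested sketch: pull the saved states back down with `huggingface_hub.snapshot_download`, pick the newest state folder, and point Kohya's resume setting (`--resume` in sd-scripts) at that local folder. The folder-naming pattern (a name ending in `-state` with a step or epoch number in it) is an assumption, so check your repo's actual folder names. The repo id and the helper names are hypothetical.

```python
import re
from pathlib import Path


def newest_state_dir(root):
    """Pick the saved-state folder with the highest number in its name.

    Assumes state folders end in "-state" and carry a step or epoch number
    (e.g. "mymodel-000004-state"); a folder with no number counts as 0.
    Returns None if no state folder is found.
    """
    best, best_num = None, -1
    for path in Path(root).rglob("*-state"):
        if not path.is_dir():
            continue
        numbers = re.findall(r"\d+", path.name)
        num = int(numbers[-1]) if numbers else 0
        if num > best_num:
            best, best_num = path, num
    return best


def fetch_states(repo_id, local_dir):
    """Download the whole repo snapshot (including state folders) to local_dir."""
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

If step-based and epoch-based states are mixed in one repo, the numbers are not comparable and the folder should be picked by hand. Resuming also generally assumes the same training settings as the original run.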

macba•12/11/24, 8:44 PM

Anyway, I've been lucky enough to get a few credits on TensorDock, and I've been trying your presets for DreamBooth fine-tuning on 1x and 2x H100s.

macba•12/11/24, 8:46 PM

Basically, you get 1.68 s/it on the 1x system and 2.36 s/it on the 2x system.

macba•12/11/24, 8:46 PM

Correct me if I'm wrong, but this is better-than-linear scaling, right? I mean, it takes far less than 2 × 1.68 s (3.36 s) to do a step.
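One way to check the scaling claim, assuming the per-GPU batch size is the same in both runs (so one 2x step covers twice as many images as one 1x step), is to compare images per second instead of seconds per step. Linear scaling would mean the 2x system still ran at about 1.68 s/it. With the quoted numbers the speedup is 2 × 1.68 / 2.36 ≈ 1.42x, about 71% scaling efficiency. A step taking less than 3.36 s shows the 2x system is faster than one GPU, but not that it scales better than linearly. `scaling` is a hypothetical helper:

```python
def scaling(sec_per_step_1gpu, sec_per_step_ngpu, n_gpus):
    """Return (speedup, efficiency) of an n-GPU run versus a 1-GPU run.

    Assumes the per-GPU batch size is unchanged, so one n-GPU step covers
    n_gpus times as many images as one 1-GPU step.
    """
    images_per_sec_1 = 1 / sec_per_step_1gpu       # in units of one GPU's batch
    images_per_sec_n = n_gpus / sec_per_step_ngpu  # same units, all GPUs together
    speedup = images_per_sec_n / images_per_sec_1
    return speedup, speedup / n_gpus


speedup, efficiency = scaling(1.68, 2.36, 2)
print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")  # speedup 1.42x, efficiency 71%
```

An efficiency of 100% would be linear scaling; anything above it would be the better-than-linear case.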

macba•12/11/24, 8:47 PM

BTW I don't know if it happens to you too, but Kohya_ss ends training with a bunch of errors even when everything went fine… Pretty scary…
