Hello everyone. I am Dr. Furkan Gözükara, PhD Computer Engineer. SECourses is a dedicated YouTube channel for the following topics: Tech, AI, News, Science, Robotics, Singularity, ComfyUI, SwarmUI, ML, Artificial Intelligence, Humanoid Robots, Wan 2.2, FLUX, Krea, Qwen Image, VLMs, Stable Diffusion
Got an error at the start of training; I have not gotten this error before. I have done 20 trainings with no problems starting:

"2024-12-09 12:23:35 INFO Loading state dict from /home/Ubuntu/Downloads/t5xxl_fp16.safetensors    flux_utils.py:330
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/utils.py", line 366, in load_safetensors
    state_dict = load_file(path, device=device)
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 849, in <module>
    train(args)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/flux_train.py", line 222, in train
    t5xxl = flux_utils.load_t5xxl(args.t5xxl, weight_dtype, "cpu", args.disable_mmap_load_safetensors)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/flux_utils.py", line 331, in load_t5xxl
    sd = load_safetensors(ckpt_path, device=str(device), disable_mmap=disable_mmap, dtype=dtype)
  File "/home/Ubuntu/apps/kohya_ss/sd-scripts/library/utils.py", line 368, in load_safetensors
    state_dict = load_file(path)  # prevent device invalid Error
  File "/home/Ubuntu/apps/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py", line 313, in load_file
"
ChatGPT o1-preview gave this answer: "Yes, you can run the two GPUs separately, treating each as its own device. Even though your RTX A6000 GPUs are connected via NVLink, which enables high-bandwidth peer-to-peer communication and memory access between them, they still present themselves as two distinct GPU devices to the system and to frameworks like PyTorch or TensorFlow."