Hey @Furkan Gözükara SECourses. I've been trying training on Runpod, and I went with PyTorch 2.2 as explained in the Runpod installation part of your kohya v16 script. Twice in a row the training has stopped: first it saved epoch 10, then stopped at epoch 20. I started a new training from my saved epoch 10 checkpoint, and this time it proceeded to epoch 50, then I got the same error (weird, because I selected a 400 GB hard disk):
"saving checkpoint: /workspace/final_training_models/model/bodylora_head-000040.safetensors train_util.py:5664
Using memory efficient save file: /workspace/final_training_models/model/bodylora_head-000040.safetensors
Traceback (most recent call last):
File "/workspace/kohya_ss/sd-scripts/flux_train.py", line 849, in <module>
train(args)
[...]
mem_eff_save_file(state_dict, ckpt_path, metadata=sai_metadata)
File "/workspace/kohya_ss/sd-scripts/library/utils.py", line 260, in mem_eff_save_file
v.contiguous().view(torch.uint8).numpy().tofile(f)
OSError: 94371840 requested and 58861824 written
steps: 42%|████████████████████████████████████████████████████▌ | 200/472 [1:47:08<2:25:42, 32.14s/it, avr_loss=0.314]
Traceback (most recent call last):
File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
sys.exit(main())"
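
A partial write like the one in the `OSError` above (fewer bytes written than requested) often points to the volume filling up or hitting a quota, though that is an assumption here and not something the log confirms. A minimal sketch for checking free space on the output folder before a save could look like this; `free_space_report` is a hypothetical helper, not part of kohya_ss or sd-scripts, and the required size is just the byte count from the error message:

```python
import os
import shutil


def free_space_report(path, required_bytes):
    """Return (free_bytes, enough) for the filesystem that holds `path`.

    Hypothetical helper for illustration; not part of kohya_ss.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free, usage.free >= required_bytes


if __name__ == "__main__":
    # Output folder taken from the log; fall back to the current directory
    # if it does not exist on this machine.
    out_dir = "/workspace/final_training_models/model"
    if not os.path.isdir(out_dir):
        out_dir = "."
    # 94371840 is the "requested" byte count from the OSError above.
    free, enough = free_space_report(out_dir, 94371840)
    print(f"free: {free / 1024**3:.2f} GiB, enough for this write: {enough}")
```

Note that this only reports what the operating system sees for that mount; a provider-side quota on the volume may not show up here, so the pod's storage settings are worth checking as well.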


