File "/opt/conda/bin/accelerate", line 8, in <module> sys.exit(main()) File "/opt/conda/lib/py

File "/opt/conda/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 977, in launch_command
multi_gpu_launcher(args)
File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
distrib_run.run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
=====================================================
./sdxl_train.py FAILED
-----------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
-----------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-11-27_18:01:49
host : c4565fefbe4d
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 1053)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 1053
=====================================================
Was this page helpful?