aight ill do one gpu first under rank 3 then wait for ur vid```python with torch.enable_grad(), de
aight ill do one gpu first under rank 3 then wait for ur vid
with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs): # type: ignore[attr-defined]
steps: 0%|▍ | 28/6500 [09:25<36:18:15, 20.19s/it, avr_loss=0.207]