Runpod · 2y ago
2 replies
caseus

Linux kernel version is 5.4.0

Per accelerate:

https://github.com/huggingface/accelerate/blob/85a75d4c3d0deffde2fc8b917d9b1ae1cb580eb2/src/accelerate/utils/other.py#L314C1-L331C1

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


Basically, H100s are currently unusable for me: training models with accelerate hangs.
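
For anyone who wants to check a pod before launching a job, here's a rough sketch of the kind of kernel version check the warning above describes. This is not accelerate's actual code. The helper names `parse_kernel_version` and `kernel_is_too_old` are made up for illustration, and the 5.5.0 minimum is taken from the warning text.

```python
import platform
import re

# Minimum kernel version quoted in the warning (an assumption taken from the message text).
MIN_KERNEL = (5, 5, 0)


def parse_kernel_version(release):
    """Return (major, minor, patch) from a release string like '5.4.0-150-generic'.

    Returns None if the string does not start with a dotted version number.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def kernel_is_too_old(release, minimum=MIN_KERNEL):
    """True if the release string parses to a version below `minimum`."""
    version = parse_kernel_version(release)
    return version is not None and version < minimum


if __name__ == "__main__":
    release = platform.release()  # kernel release string on Linux
    if platform.system() == "Linux" and kernel_is_too_old(release):
        print(f"Kernel {release} is below 5.5.0; multi-GPU training may hang.")
    else:
        print(f"Kernel {release} passes the 5.5.0 check (or this is not Linux).")
```

Running this inside the pod shows whether the host kernel is the one triggering the warning. Since a container shares the host's kernel, upgrading packages inside the pod would not change the result.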