OSError: [Errno 5] Input/output error

My model training stopped in the middle of the night with an I/O error. It appears to be caused by a physical disk problem, and in my testing it occurs randomly. As a result, my pod sat idle for at least 6 hours, and I was billed for that time.

  1. How can I prevent this from happening again?
  2. Can I get a refund for those idle hours?
I can provide the log and my pod number.
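
On the first question, one possible mitigation on the training-script side is to retry a failing read or write a few times when the error is specifically Errno 5, and to let the exception propagate otherwise so the job exits instead of sitting idle. This is only a sketch under assumptions: `run_with_io_retry` and `step_fn` are hypothetical names, not part of any library or of the platform, and retrying does not repair a faulty disk.

```python
import errno
import time


def run_with_io_retry(step_fn, max_retries=3, delay_s=1.0):
    """Call step_fn() and retry only when it raises OSError with errno EIO.

    step_fn is a hypothetical zero-argument callable, for example a function
    that saves a checkpoint or loads one batch from disk.
    """
    attempt = 0
    while True:
        try:
            return step_fn()
        except OSError as exc:
            # Re-raise anything that is not Errno 5, or when retries run out,
            # so the process exits rather than hanging.
            if exc.errno != errno.EIO or attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay_s)
```

Wrapping only the disk-touching calls (checkpoint saves, dataset reads) keeps the retry narrow. If the error persists after the retries, the exception ends the process, and the last successfully saved checkpoint can be used to resume.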
Screenshot_2024-03-04_at_8.11.58_AM.png