Runpod · 7mo ago
hotsnr

Unhealthy machines

We recently noticed that we occasionally get machines with bad performance: worker startup takes a very long time, and runtime performance afterwards is also very poor. We've seen this both with and without FlashBoot. We plan to do two things to address it:
  1. Crash the worker before handing control back to the Runpod library if we detect bad performance.
  2. Remove bad workers through the control plane.
Is the tenant (us) expected to handle machine health issues? What would the Runpod team recommend?
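For context, here is a rough sketch of what we mean by step 1. It is not Runpod's recommended approach, just an illustration: `cpu_probe`, `is_healthy`, `start_or_crash`, and the `MAX_PROBE_SECONDS` threshold are all hypothetical names and values we made up, and a real probe would likely exercise the GPU or disk rather than a plain CPU loop.

```python
import sys
import time

# Hypothetical threshold; it would need tuning from known-good machines.
MAX_PROBE_SECONDS = 2.0


def cpu_probe(iterations: int = 200_000) -> float:
    """Time a fixed CPU-bound loop and return the elapsed seconds."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i
    return time.perf_counter() - start


def is_healthy(probe=cpu_probe, max_seconds: float = MAX_PROBE_SECONDS) -> bool:
    """Return True if the probe finishes within the allowed time."""
    return probe() <= max_seconds


def start_or_crash(start_worker, probe=cpu_probe,
                   max_seconds: float = MAX_PROBE_SECONDS) -> None:
    """Run the probe first; exit non-zero instead of starting on a slow machine."""
    if not is_healthy(probe, max_seconds):
        print("health probe too slow, exiting before serving jobs", file=sys.stderr)
        sys.exit(1)
    # Only reached on a machine that passed the probe.
    start_worker()
```

In an actual worker, `start_worker` would be a small function that hands control to the Runpod library, for example one that calls `runpod.serverless.start({"handler": handler})`. Exiting with a non-zero code before that call is what we mean by "crashing" the worker.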