Unhealthy machines
We have recently noticed that we occasionally get machines with poor performance: worker startup takes much longer than usual, and runtime performance is also severely degraded. We have seen this both with and without Fastboot. We are going to do two things to address it:
- Crash the worker before handing control back to the Runpod library if we detect bad performance.
- Remove unhealthy workers via the control plane.
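The first item could look roughly like the minimal Python sketch below. It assumes the worker is a Python script that eventually calls `runpod.serverless.start` from the Runpod SDK. Several pieces are hypothetical placeholders and are not part of any existing code: the thresholds (`MAX_STARTUP_SECONDS`, `MAX_BENCHMARK_SECONDS`), the CPU-bound benchmark, and the helpers `run_benchmark`, `is_healthy`, `check_or_crash` and `main`. A real check would time whatever workload matters on these machines, such as model loading or a warm-up inference. Its thresholds would be calibrated against timings from known-healthy machines.

```python
import sys
import time

# Hypothetical thresholds: calibrate against timings from known-healthy machines.
MAX_STARTUP_SECONDS = 120.0
MAX_BENCHMARK_SECONDS = 2.0

# Captured at import time, so "startup" here means time since this module loaded
# (container start-up before the Python process began is not included).
PROCESS_START = time.perf_counter()


def run_benchmark() -> float:
    """Time a fixed CPU-bound workload (a stand-in for a real warm-up step)."""
    start = time.perf_counter()
    total = 0
    for i in range(2_000_000):
        total += i * i
    return time.perf_counter() - start


def is_healthy(startup_seconds: float, benchmark_seconds: float) -> bool:
    """Return False if either measurement exceeds its threshold."""
    return (startup_seconds <= MAX_STARTUP_SECONDS
            and benchmark_seconds <= MAX_BENCHMARK_SECONDS)


def check_or_crash(process_start: float) -> None:
    """Exit with a non-zero status if this machine looks unhealthy."""
    startup = time.perf_counter() - process_start
    bench = run_benchmark()
    if not is_healthy(startup, bench):
        print(f"unhealthy machine: startup={startup:.1f}s benchmark={bench:.2f}s",
              file=sys.stderr)
        sys.exit(1)  # crash before the Runpod library ever receives control


def main() -> None:
    # ... hypothetical: load model weights, warm up, etc. ...
    check_or_crash(PROCESS_START)
    # Only after the check passes, hand control to the Runpod library, e.g.:
    # runpod.serverless.start({"handler": handler})
```

With `main()` as the worker's entry point, a slow machine exits before `runpod.serverless.start` is called, so it never starts processing jobs. The worker can only catch slowness it measures itself. The control-plane removal in the second item is still needed for anything this check misses.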