Ensuring Task Routing to Warm Workers for FlashBoot VRAM Persistence
Hi team,
I’m using FlashBoot and my understanding is that the container should stay alive after a job finishes so that the model remains loaded in VRAM, reducing cold start time.
However, after I warm up worker A, the next task often gets scheduled on worker B instead. This defeats the purpose of FlashBoot, because the model preloaded on worker A is never reused.
Question:
Is there any way to prioritize scheduling new tasks onto an already-active FlashBoot worker, or to force tasks onto the worker that already has the model loaded?
This is crucial for minimizing cold starts.
Thanks!

