Ensuring Task Routing to Warm Workers for FlashBoot VRAM Persistence

Hi team, I’m using FlashBoot. My understanding is that the container stays alive after a job finishes, so the model remains loaded in VRAM and cold start time is reduced. However, after worker A has been activated and has the model loaded, the next task is often scheduled on worker B instead. This defeats the purpose of FlashBoot, because the preloaded model on worker A is never reused.

Question: Is there any way to prioritize scheduling new tasks onto an already-active FlashBoot worker, or to force tasks onto the worker that already has the model loaded? This is crucial for minimizing cold starts. Thanks!
1 Reply
Poddy, 2d ago
