Runpod, 2mo ago
gufisha

Serverless

We have a serverless endpoint for LoRA training, and in some cases a job takes more than 24 hours. However, some of our workers shut down at around 23-24 hours. Is there a fix for this? Letting workers run for 32-48 hours is not a problem for us, but because we have to wait for training to complete before uploading files, any worker that shuts down early wastes the money spent on that run, so we need to ensure workers stay alive. Can someone please help with this?