🚨 Inconsistent Execution Time Across Workers for Same Input on L40s (48GB Pro) – Need Help

Hi everyone,

I'm facing a strange issue with my RunPod endpoint set up using latentsync on L40s 48GB Pro with 10 workers. The problem is that the same input request is taking vastly different execution times across different workers.

- Some workers complete the task in 10–15 minutes
- Others take up to 1 hour for the exact same input

This inconsistency is severely impacting performance and reliability. I've ensured that:

- The input is exactly the same
- There are no extra processes or resource-heavy tasks running
- Model/environment is the same across all workers

Has anyone experienced this before? Could it be a hardware-related issue, resource throttling, or something at the container level?

Would really appreciate any insights or help from the community or the RunPod team! Thanks in advance!
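One way to narrow down where the extra time goes is to log per-stage wall-clock timings together with a worker identifier, then compare a fast and a slow worker. The following is only a minimal sketch: `StageTimer` and the stage names are hypothetical, and reading the worker ID from the `RUNPOD_POD_ID` environment variable is an assumption that may not hold for every setup (it falls back to `"unknown"`).

```python
import json
import os
import socket
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations for named stages of one request."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        # Measure elapsed time even if the stage raises an exception.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.durations[name] = time.perf_counter() - start

    def report(self):
        return {
            "host": socket.gethostname(),
            # Assumption: the worker exposes its ID in this variable.
            "worker": os.environ.get("RUNPOD_POD_ID", "unknown"),
            "stages_s": {k: round(v, 3) for k, v in self.durations.items()},
        }


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("load_model"):
        time.sleep(0.01)  # placeholder for model loading
    with timer.stage("inference"):
        time.sleep(0.02)  # placeholder for the actual inference call
    print(json.dumps(timer.report()))
```

If one stage (for example model loading or file download) accounts for most of the difference, that points away from raw GPU speed and toward storage, network, or cold-start behavior on the slow workers.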
3 Replies
Jason • 3w ago
Maybe you can open a support ticket so support can check the machines. But if they look normal, you'd have to monitor the GPU performance more closely to see whether the slowdown comes from the host or from your app.
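A rough sketch of what that monitoring could look like, assuming `nvidia-smi` is available inside the container: sample clocks, temperature, utilization, and power while a job runs, then compare the logs from a fast and a slow worker. The helper names (`parse_sample`, `sample_gpus`, `monitor`) and the sampling interval are illustrative choices, not part of any RunPod tooling.

```python
import subprocess
import time

# Fields passed to nvidia-smi's --query-gpu option.
FIELDS = [
    "timestamp",
    "name",
    "pstate",
    "temperature.gpu",
    "utilization.gpu",
    "clocks.sm",
    "clocks.mem",
    "power.draw",
]


def parse_sample(line):
    """Turn one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def sample_gpus():
    """Query every visible GPU once and return one dict per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            f"--query-gpu={','.join(FIELDS)}",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_sample(l) for l in result.stdout.splitlines() if l.strip()]


def monitor(samples=10, interval_s=5.0):
    """Print a fixed number of samples, sleeping between them."""
    for _ in range(samples):
        for gpu in sample_gpus():
            print(gpu)
        time.sleep(interval_s)
```

Running `monitor()` in a background thread or a second process during a job gives a simple trace. Low SM clocks or a high temperature on the slow workers would suggest a host-side issue, while low GPU utilization with normal clocks would point back at the app (for example waiting on CPU, disk, or network).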
Poddy • 3w ago
@Himanshu Kotkar
Escalated To Zendesk
The thread has been escalated to Zendesk!
Jason • 3w ago
Any endpoint ID?
