I am using serverless endpoint with H100 but I am experiencing high queue time .If you send a single request to runpod enpoint you may get 2 seconds delay time and on same 2nd request you will get queue time of 7 seconds which should not happend.I think they should optimize their queue and worker communication codes ist run: 3 seconds 2nd run: 15.84 seconds
Continue the conversation
Join the Discord to ask follow-up questions and connect with the community
R
Runpod
We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!