Runpod, 2y ago
esho

Faster Whisper Latency is High

I tested a 10-second audio clip and I get a latency of about 1 second on an RTX 4090 after cold start. The default is the base model, and on my own RTX 3090 the latency is about 0.2 s.
11 Replies
esho (OP), 2y ago
A very simple script:

import time
import requests

start = time.time()
response = requests.post(url, json=payload, headers=headers)
print("Time taken: ", time.time() - start)

There is also an "executionTime" field in the response. "executionTime" is about 800 ms.
esho (OP), 2y ago
It's from my PC. I also tested it on GCP.
esho (OP), 2y ago
The executionTime reported in the response is about 800 ms. I think this is also high.
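Editor's sketch, not from the thread: one way to see where the time goes is to split the client-side round trip into the parts the response reports. This assumes the response JSON carries the `executionTime` field mentioned above and, as an assumption, a `delayTime` field for queue wait, both in milliseconds. `split_latency` is a hypothetical helper name.

```python
def split_latency(round_trip_s, response_json):
    """Split a measured round trip into execution, queue delay, and the rest.

    round_trip_s: wall-clock seconds measured around requests.post(...)
    response_json: the parsed response body, e.g. response.json()
    """
    # executionTime is the field reported in the thread (about 800 ms there).
    execution_ms = float(response_json.get("executionTime", 0))
    # delayTime (queue wait) is an assumed field; treated as 0 if absent.
    delay_ms = float(response_json.get("delayTime", 0))
    round_trip_ms = round_trip_s * 1000.0
    # Whatever remains is network transfer plus client and gateway overhead.
    other_ms = max(round_trip_ms - execution_ms - delay_ms, 0.0)
    return {
        "round_trip_ms": round_trip_ms,
        "execution_ms": execution_ms,
        "delay_ms": delay_ms,
        "other_ms": other_ms,
    }
```

With the timing script from earlier in the thread, this could be called as `split_latency(time.time() - start, response.json())`. If `execution_ms` dominates, the gap to the local 0.2 s is inside the worker rather than in the network.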
digigoblin, 2y ago
800ms is pretty quick actually
esho (OP), 2y ago
I am using the default config. I think it should run as fast as it does on my local machine. Although it is called serverless, I am the only one using the server after the cold start, so it should be really fast. I am using an RTX 3090 and an RTX 4090.
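Editor's sketch, not from the thread: to compare local and serverless numbers fairly, it can help to time both the same way, with warm-up calls excluded and the median taken over several runs. `time_call` and `make_local_transcribe` are hypothetical helper names. The local part assumes the `faster-whisper` package with the base model, and `compute_type="float16"` is an assumption rather than the endpoint's known configuration.

```python
import statistics
import time


def time_call(fn, warmup=1, runs=5):
    """Return the median wall-clock latency of fn() in seconds, after warm-up."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time costs such as CUDA initialization
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def make_local_transcribe(audio_path, model_size="base"):
    """Build a zero-argument callable that transcribes audio_path locally."""
    # Imported here so time_call works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # float16 on CUDA is an assumed setting, not taken from the thread.
    model = WhisperModel(model_size, device="cuda", compute_type="float16")

    def run():
        segments, _info = model.transcribe(audio_path)
        # transcribe() yields segments lazily; consume them so decoding is timed.
        return list(segments)

    return run
```

For example, `time_call(make_local_transcribe("sample.wav"))` would give a local baseline (the file name is a placeholder), and the same `time_call` can wrap a function that posts to the endpoint.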
