Faster Whisper Latency is High
Runpod • 2y ago • esho
I tested a 10-second audio clip and got a latency of about 1 second on an RTX 4090 after cold start. That is with the default base model; on my own RTX 3090, the latency is about 0.2 s.
Jason • 4/12/24, 5:37 PM
Hi there, just wondering, how did you benchmark those?
esho (OP) • 4/12/24, 6:50 PM

import time
import requests

start = time.time()
response = requests.post(url, json=payload, headers=headers)
print("Time taken:", time.time() - start)

A very simple script, and there is an "executionTime" field in the response. "executionTime" is about 800 ms.
Jason • 4/12/24, 11:23 PM
Oh, is this from your PC?

Jason • 4/12/24, 11:23 PM
Or is that in your handler code?
esho (OP) • 4/14/24, 2:58 AM
It's from my PC.
esho (OP) • 4/14/24, 2:58 AM
I also tested it on GCP.
Jason • 4/14/24, 3:13 AM
It may be the network latency plus the execution time.
esho (OP) • 4/14/24, 4:11 PM
The executionTime is in the response, about 800 ms. I think this is also high.
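The two numbers in the thread can be separated to see how much of the round trip is network/queue overhead versus GPU work, using the executionTime field the endpoint returns. A minimal sketch; split_latency is a hypothetical helper for illustration, not part of any Runpod API:

```python
def split_latency(total_ms: float, execution_ms: float) -> dict:
    """Split a measured round-trip time into the reported execution time
    and everything else (network transfer, queueing, serialization)."""
    return {
        "execution_ms": execution_ms,
        "overhead_ms": total_ms - execution_ms,
    }

# Numbers from the thread: ~1000 ms round trip, ~800 ms executionTime.
breakdown = split_latency(1000.0, 800.0)
print(breakdown)  # {'execution_ms': 800.0, 'overhead_ms': 200.0}
```

With these figures, only about 200 ms of the observed second is transport and queueing; the rest is the worker itself.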
Jason • 4/14/24, 4:15 PM
Oh.

Jason • 4/14/24, 4:15 PM
What config (inputs) do you use?
Jason • 4/14/24, 4:21 PM
It's pretty average, I think, yeah.

Jason • 4/14/24, 4:21 PM
And what GPU are you using, too?
digigoblin • 4/14/24, 4:32 PM
800 ms is pretty quick, actually.
Jason • 4/14/24, 4:34 PM
Yeah, pretty average, right?

Jason • 4/14/24, 4:35 PM
Depends on what config he's using, too.
esho (OP) • 4/14/24, 6:00 PM
I am using the default config. I think it should run as fast as my local machine.
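The local-machine comparison can be reproduced with a small benchmark against the faster-whisper library directly, which gives a baseline to hold against the serverless executionTime. A sketch, assuming a CUDA GPU is available and that "audio.wav" is a placeholder for your own 10-second clip:

```python
import time

from faster_whisper import WhisperModel

# The default base model, on GPU with fp16, as in the thread.
model = WhisperModel("base", device="cuda", compute_type="float16")

start = time.time()
segments, info = model.transcribe("audio.wav", beam_size=5)
# transcribe() returns a generator; decoding actually runs while iterating.
text = " ".join(segment.text for segment in segments)
print(f"local latency: {time.time() - start:.2f} s")
```

Note that the first call also pays model-load time; timing a second call in the same process is closer to what a warm serverless worker measures.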
esho (OP) • 4/14/24, 6:02 PM
Although it is called serverless, only I am using the server after cold start. This should be really fast.
esho (OP) • 4/14/24, 6:03 PM
I am using an RTX 3090 and an RTX 4090.
Jason • 4/15/24, 3:30 AM
Hmm, yeah, makes sense.
Jason • 4/15/24, 3:30 AM
It will be, if your requests keep coming, I think.
Jason • 4/15/24, 3:39 AM
I don't know yet, but maybe try another, longer audio; maybe it will be faster.
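The suggestion amounts to amortizing the fixed per-request overhead over more audio, which shows up as a lower real-time factor (processing time divided by audio duration). A tiny illustration; real_time_factor is a hypothetical helper, not a library function:

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; lower is faster."""
    return processing_s / audio_s

# ~1 s of total latency for the 10 s clip from the thread:
print(real_time_factor(1.0, 10.0))  # 0.1
# If ~0.2 s of that is fixed overhead, a 60 s clip spreads it much thinner.
```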