Is it possible to record as WAV?

I'm working on a bot that listens to a user and kicks them from the voice chat if they say a banned word. It works, but it's a bit buggy. When a user talks, it saves a .pcm file and sends the filename to a Python endpoint I'm hosting locally. Python converts the file to MP3 via ffmpeg, transcribes the audio with OpenAI's whisper library, and returns whether or not to kick the user based on the transcribed text. My issue is that the ffmpeg step adds a lot of latency. Can I record straight to WAV instead, since (as far as I understand) WAV is more or less PCM with additional headers? If that's not possible, does anyone know how hard it is to add those headers myself? Ultimately I'm looking to reduce latency by getting rid of ffmpeg in the middle. Thanks!
7 Replies
!! S҉҉t̷̀̀a҉̷rs҉ţ̷̧a͜l̷̨͟k̢̡͡e̴r
Reading the docs for Whisper, you don't need to do the conversion yourself, since Whisper handles that for you. I'm not sure if that will affect your latency issue though
versaceplugOP•3mo ago
I actually used Python's wave module to convert PCM to WAV, which Whisper can handle. It's much less intensive than spawning subprocesses for ffmpeg to do the conversion lol
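For anyone landing here later, a minimal sketch of the PCM-to-WAV step described above, using only the standard library. The function name `pcm_to_wav_bytes` is made up for illustration, and the defaults (48 kHz, stereo, signed 16-bit) are an assumption about how the bot decodes voice audio; they are not stated in this thread, so match them to whatever your recorder actually writes.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes,
                     sample_rate: int = 48000,   # assumed, not confirmed in the thread
                     channels: int = 2,          # assumed stereo
                     sample_width: int = 2) -> bytes:  # 2 bytes = 16-bit samples
    """Wrap raw PCM samples in a WAV (RIFF) container without touching the audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        # The samples are copied as-is; only the header is new.
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_file_to_wav_file(pcm_path: str, wav_path: str, **fmt) -> None:
    """Read a headerless .pcm file and write a .wav file next to it."""
    with open(pcm_path, "rb") as src:
        data = src.read()
    with open(wav_path, "wb") as dst:
        dst.write(pcm_to_wav_bytes(data, **fmt))
```

Since the header is just a small fixed prefix in front of the same samples, this should cost far less than spawning a subprocess. If the sample rate or channel count in the header is wrong, the file will still open but play at the wrong speed or pitch.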
!! S҉҉t̷̀̀a҉̷rs҉ţ̷̧a͜l̷̨͟k̢̡͡e̴r
No, like, if you read the docs for Whisper, you can pass any audio format that ffmpeg can handle. Whisper uses ffmpeg internally for any conversions that need to be done.
versaceplugOP•3mo ago
Yeah, I had tried passing the straight-up PCM file / data but was getting some sort of error. I'm not at my computer now but can check again when I’m looking. I should read the docs more though, ur right. Could be overcomplicating some of this.
!! S҉҉t̷̀̀a҉̷rs҉ţ̷̧a͜l̷̨͟k̢̡͡e̴r
I'm assuming you've benchmarked what's actually taking time to process? Just trying to make certain that what I would assume is the most intensive part (the massive AI model) isn't actually the thing slowing you down. It especially depends on where you're running it; Linux and Windows are kinda shit about exposing their GPU properly 😆
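One rough way to check where the time goes is to wrap each stage of the endpoint in a timer and compare. This is only a sketch: `timed` and `slowest_stage` are hypothetical helper names, and the stage labels in the usage note stand in for whatever the real endpoint does.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed stages still show up.
        timings[label] = time.perf_counter() - start


def slowest_stage(timings: dict) -> str:
    """Return the label of the stage that took the longest."""
    return max(timings, key=timings.get)
```

In the endpoint this could look like `with timed("convert", timings): ...` around the conversion and `with timed("transcribe", timings): ...` around the Whisper call, then logging `timings` and `slowest_stage(timings)` per request. If the transcribe stage dominates, removing ffmpeg will not help much.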
!! S҉҉t̷̀̀a҉̷rs҉ţ̷̧a͜l̷̨͟k̢̡͡e̴r
It was actually a discussion lol, the docs are mid: https://github.com/openai/whisper/discussions/799
