Hi, are websockets available for the new Nova 3 ASR model?
4 Replies
Hey @Tim -- following up on this. Yes, the Nova 3 model supports websockets. It's a little involved right now: you call the `run` method with the option `websocket: true` and it returns a websocket. But to make it work, you need to send the raw audio bytes over the socket. There are more details there than I'm familiar with, so I'd recommend waiting for the official docs.
Oh, there's actually a pretty complete example here: https://github.com/cloudflare/realtime-examples/tree/main/ai-tts-stt that includes using Nova 3 with websockets for streaming speech-to-text. E.g. here's where the websocket connection is initialized: https://github.com/cloudflare/realtime-examples/blob/main/ai-tts-stt/src/stt-adapter.ts#L634-L648
I use Deepgram directly, but if it's available via Cloudflare, it means potentially lower latency between my worker and Nova-3 -- we send audio as raw bytes. This is good.
Oh, the first option looks like it's connecting via a URL and a fetch call instead of `env.AI.run`.
But I get the idea. If websockets are already supported via `env.AI.run` and it returns a websocket instance, I can play with that.
Yep! If you already have raw bytes that's great, you've probably already solved most of the problems.
I think one thing to be aware of: you'll probably need to send the encoding + sample rate as params. If you're doing that via the binding it should be something like:
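A rough sketch of what that might look like -- the `websocket: true` flag is from the earlier message, but the `encoding`/`sample_rate` names are assumptions borrowed from Deepgram's live-streaming API, and the model id is a guess; check the linked example for the real values:

```typescript
// Hypothetical options for a raw-PCM stream. Parameter names are
// assumptions based on Deepgram's live-streaming API and may differ
// in the Workers AI binding.
const sttOptions = {
  websocket: true,
  encoding: "linear16", // raw 16-bit little-endian PCM
  sample_rate: 16000,   // Hz; must match the audio bytes you send
};

// Inside a Worker this would look roughly like (untested):
// const ws = await env.AI.run("@cf/deepgram/nova-3", sttOptions);
// ws.send(pcmChunk); // raw audio bytes over the socket
```

The key point is that the model can't infer the format of raw bytes, so the encoding and sample rate have to be declared up front and match what you actually send.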
I'd love to hear how you get on! Especially with the low-latency realtime part. That's one of the things our team is most excited about.
We're running nova in a bunch of different regions, so yes I would expect you to see much lower latency.