Software Mansion•5w ago

Echo implementing membrane openai example in phoenix liveview

Hi there! I tried to implement the openai realtime example available here, trying to integrate it into a liveview example app. As for integrating membrane in liveview I followed this example https://github.com/membraneframework/membrane_demo/tree/master/webrtc_live_view In my implementation, what happens is that the audio is captured and is sent to openAI, because I can receive the response. But the problem is that together with the response I receive also all the audio sent as input from mediaCapture (i.e. Echo). Do you know what could cause this issue? this is the liveview implementation: pento/lib/pento_web/live/call.ex at main · tomashco/pento · GitHub this is the pipeline that gets called: https://github.com/tomashco/pento/blob/main/lib/pento/call/pipeline.ex this is the integration with the openai endpoint: pento/lib/openai_endpoint.ex at main · tomashco/pento · GitHub this is the integration of the websocket: pento/lib/openai_websocket.ex at main · tomashco/pento · GitHub I’m pretty sure the problem lays in the pipeline, but I don’t understand what is wrong. thank you to everyone can help!

16 Replies

Mateusz Front•5w ago

Hi @tomashco, briefly looking at the pipeline and liveview, they seem fine (except that you have both audio and video set to false in the capture https://github.com/tomashco/pento/blob/main/lib/pento_web/live/call.ex#L25)

GitHub

pento/lib/pento_web/live/call.ex at main · tomashco/pento

Contribute to tomashco/pento development by creating an account on GitHub.

Mateusz Front•5w ago

what may happen is that the output from openAI is captured by your mic and sent again, but as I understand it's not the case you can try separating the input and output and opening them on different devices, to make sure that something else isn't causing the echo

tomashcoOP•5w ago

yes having both audio and video false in the capture is a typo, this is the actual configuration:

socket
        |> Membrane.WebRTC.Live.Capture.attach(
          id: "mediaCapture",
          signaling: ingress_signaling,
          video?: false,
          audio?: true,
          preview?: true
        )

socket
        |> Membrane.WebRTC.Live.Capture.attach(
          id: "mediaCapture",
          signaling: ingress_signaling,
          video?: false,
          audio?: true,
          preview?: true
        )

For the test I just did, I removed the audioPlayer in the render:

    <%!-- <Membrane.WebRTC.Live.Player.live_render socket={@socket} player_id="audioPlayer" /> --%>

    <%!-- <Membrane.WebRTC.Live.Player.live_render socket={@socket} player_id="audioPlayer" /> --%>

and I can still hear the echo. So I think is the mediaCapture that returns back the audio?

Mateusz Front•5w ago

Now you have preview? set to true, so it may be it 😉

tomashcoOP•5w ago

unfortunately also disabling preview is the same. Should I try to check the implementation of Membrane.WebRTC.Live.Capture.live_render component?

Mateusz Front•5w ago

Please do, maybe disabling the preview doesn't work 🤔

tomashcoOP•5w ago

Probably i found the error, in capture.js it starts playing by default. by commenting this line I don't hear the echo anymore and it's now working as expected!

Mateusz Front•5w ago

great! we’ll look into it, cc @Feliks

tomashcoOP•5w ago

may I ask you if you have any example showing STT and TTS made using membrane?

Mateusz Front•4w ago

Here's an example of STT with Boombox: https://github.com/membraneframework/boombox/blob/master/examples.livemd#read-speech-audio-from-mp4-chunk-by-chunk-generate-transcription

GitHub

boombox/examples.livemd at master · membraneframework/boombox

Boombox is a simple streaming tool built on top of Membrane - membraneframework/boombox

Mateusz Front•4w ago

you can wrap it into a membrane element too

tomashcoOP•4w ago

super, I'll try that thank you!

Mateusz Front•4w ago

Regarding TTS, I don't know any good OSS model

samrat•4w ago

For TTS, you can check out Kokoro. I think there are better models now but maybe a good starting point: https://github.com/samrat/kokoro/blob/main/kokoro_player.livemd

GitHub

kokoro/kokoro_player.livemd at main · samrat/kokoro

Elixir bindings to Kokoro-82M text-to-speech model - samrat/kokoro

Feliks•4w ago

@tomashco what browser have you used? Do you have any demo that I could use to reproduce your issue, without a need to have a OPENAI_API_KEY? I have created a PR that is supposed to fix your issue, but when I run our demo I cannot hear any echo, so it is hard for me to test it https://github.com/membraneframework/membrane_webrtc_plugin/pull/31

GitHub

Fix bug with echo from preview by FelonEkonom · Pull Request #31 ...

tomashcoOP•4w ago

hi @Feliks I just pushed a commit in which I just comment out the live audioPlayer and the part of the pipeline which calls openai. It is true that now the pipeline returns back what it hears, but I don't mount the audioPlayer component in page so it shouldn't be a problem https://github.com/tomashco/pento/commit/c24df8c17825bb405c23ee57cbf65f36c35fe2ae I also retested uncommenting the line // this.el.play(); and I can hear the echo. While I stop hearing it the moment I comment again the line I'm using Firefox 142.0.1 (aarch64) but I've the same problem on Chrome Version 139.0.7258.155 (Official Build) (arm64) I'm using a Macbook air M3 let me know if I can be of any help! thank you!

GitHub

chore: comment openai pipeline part to test echo · tomashco/pento@...

Gaming

Programming

Echo implementing membrane openai example in phoenix liveview

Did you find this page helpful?