Assignment Problem
I am working with two third-party services: Pyannote AI for Speaker Diarization and Whisper for Speech-To-Text.
The reason I am using Pyannote is that Whisper does not support Speaker Diarization out of the box (as far as I know).
So what I need to do is match the speakers from Pyannote's diarization response with the segments from Whisper's transcription response. Since both responses provide time parameters (start and stop), I am thinking of relying on them to do the matching.
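A minimal sketch of that time-based matching, assuming both responses have first been converted to simple `(start, end, label)` tuples (the exact field names in the Pastebin responses are not reproduced here, so the shapes below are hypothetical). Each Whisper segment gets the Pyannote speaker whose turns overlap it for the longest total time, which is one common heuristic rather than an official Pyannote or Whisper feature.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified shapes (assumed, not the real API formats):
#   diarization turn:   (start_seconds, end_seconds, speaker_label)
#   transcript segment: (start_seconds, end_seconds, text)
Turn = Tuple[float, float, str]
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Segment, Optional[str]]]:
    """Label each segment with the speaker whose turns overlap it the most in total."""
    result = []
    for seg_start, seg_end, text in segments:
        # Sum overlap per speaker: one speaker may have several short turns inside a segment.
        totals: Dict[str, float] = {}
        for turn_start, turn_end, speaker in turns:
            ov = overlap(seg_start, seg_end, turn_start, turn_end)
            if ov > 0:
                totals[speaker] = totals.get(speaker, 0.0) + ov
        # None when no diarization turn overlaps the segment (e.g. a gap in the diarization).
        best = max(totals, key=totals.get) if totals else None
        result.append(((seg_start, seg_end, text), best))
    return result


if __name__ == "__main__":
    # Made-up example data, not taken from the Pastebin responses.
    turns = [(0.0, 4.2, "SPEAKER_00"), (4.2, 9.0, "SPEAKER_01"), (9.0, 12.5, "SPEAKER_00")]
    segments = [(0.0, 3.8, "Hello, how are you?"), (3.9, 8.7, "Fine, thanks."), (8.5, 12.0, "Great.")]
    for (start, end, text), speaker in assign_speakers(segments, turns):
        print(f"[{start:.1f}-{end:.1f}] {speaker}: {text}")
```

A segment that straddles a speaker change still receives a single label here; if that happens often, assigning speakers per word (using word-level timestamps, if available) would be finer-grained. The nested loop is O(segments × turns), which should be fine for typical recording lengths.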
Here are some example responses (real case): https://pastebin.com/JDjTQf9s
Do any of you have experience with similar "assignment" problems, or any suggestions on the best way to approach this sort of problem?