Help with Whisper.net (AI Voice detection)
I'm trying to make a subtitle generator using Whisper and ffmpeg, but I noticed that when nobody is talking in the video, the subtitle shows up too early. Is there a way to fix this using Whisper or anything else? Thanks. Watch the video to see what I mean, even though it's not in English: https://streamable.com/tkzeah
27 Replies
Here is the code :
I don't understand, what's wrong?
yes
but this isn't my video, I got it from a friend
💀
bro there is nothing wrong, but if you want I'll remove it...
and also if you know something about the issue, please help me
"if she makes a mistake she owes you a blowjob"
I don't see anything wrong with it. Personally I just take it as a joke, so
My friend said it was for a tiktok
so uh, do you know how to fix it?
dude I swear I'm not doing anything illegal or bad
this isn't my video
it's just that you don't understand French
the video is just a quiz...
with subtitles that show too early..
please.
bro
I swear there is nothing wrong or anything sexual
The actual technique is not to depend only on what Whisper provides. The correct timeline should be built from several layers of data collected through other approaches, and human review/editing is still required if you want high-quality results in the end.
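To make the "layers of data" idea concrete, here is a minimal sketch in Python (the same logic ports directly to C# for Whisper.net). It assumes you already have speech intervals from some voice-activity detector, and it pushes each subtitle's start forward to the first moment speech is actually detected inside that segment. The function name `snap_to_speech`, the tuple layout, and the `min_duration` guard are all hypothetical choices for illustration, not part of Whisper or Whisper.net.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]   # (start_sec, end_sec, text), hypothetical layout
Interval = Tuple[float, float]       # (start_sec, end_sec) of detected speech


def snap_to_speech(segments: List[Segment],
                   speech: List[Interval],
                   min_duration: float = 0.3) -> List[Segment]:
    """Delay each segment's start to the first detected speech onset inside it."""
    adjusted = []
    for start, end, text in segments:
        # Speech intervals that overlap this subtitle segment.
        overlaps = [(s, e) for s, e in speech if s < end and e > start]
        if overlaps:
            onset = min(s for s, _ in overlaps)
            new_start = max(start, onset)
            # Only move the start if the subtitle stays visible long enough.
            if end - new_start >= min_duration:
                start = new_start
        # Segments with no detected speech are left untouched for human review.
        adjusted.append((start, end, text))
    return adjusted


if __name__ == "__main__":
    subs = [(0.0, 4.0, "Bonjour"), (5.0, 6.0, "Question")]
    print(snap_to_speech(subs, [(2.5, 3.8)]))
```

Only the start time is moved here, since the reported problem is subtitles appearing too early; segments the detector cannot confirm are kept as-is so they can be checked by hand.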
I see. But are there still any ways to make the results better?
to show the subtitles on time and not too early
Yes, AI/algorithms can give better results. My team has an in-house enterprise solution developed in this field, but unfortunately even with that level of detail AI/algorithms can fail in edge cases.
I have a question: do you think it may be because of the model? The model I use is around 800 MB. If I get a larger one, will it be more precise and stop putting subtitles too early?
It would be a trade-off when you compare a small local model to cloud-based commercial services (and the modern models behind them). However, the actual raw material varies, and none of them is perfect right now at handling all cases (pauses with noise, music, etc.), so you need to expect a certain amount of human editing, as I mentioned earlier. We are also developing our own editing tools to further minimize human error and effort during the process.
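Since ffmpeg is already part of the pipeline, one inexpensive source of pause information is its `silencedetect` audio filter, run for example as `ffmpeg -i input.mp4 -af silencedetect=noise=-30dB:d=0.5 -f null -`, which logs `silence_start` and `silence_end` markers to stderr. The sketch below parses such a log and inverts the silences into speech intervals. The helper names, the threshold values, and the sample log text are assumptions for illustration, and as noted above, background music or noise can easily defeat a simple loudness threshold.

```python
import re
from typing import List, Tuple

Interval = Tuple[float, float]

# Patterns for the markers that ffmpeg's silencedetect filter writes to stderr.
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log: str, duration: float) -> List[Interval]:
    """Extract (start, end) silence intervals from silencedetect log text."""
    # A start slightly below zero is clamped to the beginning of the file.
    starts = [max(0.0, float(v)) for v in SILENCE_START.findall(log)]
    ends = [float(v) for v in SILENCE_END.findall(log)]
    if len(starts) > len(ends):
        # The file ended during a silence, so close it at the total duration.
        ends.append(duration)
    return list(zip(starts, ends))


def speech_intervals(silences: List[Interval], duration: float) -> List[Interval]:
    """Return the gaps between silences, i.e. where sound was detected."""
    speech, cursor = [], 0.0
    for start, end in sorted(silences):
        if start > cursor:
            speech.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        speech.append((cursor, duration))
    return speech


if __name__ == "__main__":
    # Illustrative log text, not captured from a real run.
    sample = (
        "[silencedetect @ 0x1] silence_start: 0\n"
        "[silencedetect @ 0x1] silence_end: 2.5 | silence_duration: 2.5\n"
        "[silencedetect @ 0x1] silence_start: 3.8\n"
        "[silencedetect @ 0x1] silence_end: 6 | silence_duration: 2.2\n"
    )
    print(speech_intervals(parse_silences(sample, 8.0), 8.0))
```

The resulting intervals mark where any sound rises above the threshold, not strictly where someone is speaking, so they work best as one signal among several rather than a replacement for review.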
I have a question about your video editor app. Are you going to develop it using ffmpeg or another library?
It's a commercial product for our internal use right now, so I'm not able to share many more details. You can use whatever technique is feasible, as there are just too many options.
I'm just asking because I'm curious how popular editing apps like CapCut were built
It would be very hard to do such a thing from scratch without a library