I recently found out about vosk-api and decided to give it a go. I'm running one of their example projects on a test audio file, but it keeps giving very inaccurate results.
The audio I attached, a recording of my friend yapping, gave this result:
{ "text" : "shut up why are all the all the tall or close the shop for hours"}
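For reference, this is roughly the loop I'm running. It's a minimal sketch that assumes the Python bindings (`pip install vosk`) and is modeled on the Python example in the vosk-api repo, so it may differ from the exact example project. `check_vosk_wav` is my own helper (not part of Vosk) that repeats the example's input check: mono, 16-bit, uncompressed PCM WAV. `model_path` is a placeholder for wherever the unpacked model directory lives.

```python
import json
import wave


def check_vosk_wav(source):
    """Return a list of format problems for a WAV path or file object.

    Mirrors the check in the Vosk Python example, which expects
    mono, 16-bit, uncompressed PCM audio. Empty list means OK."""
    problems = []
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, got {wf.getcomptype()}")
    return problems


def transcribe(path, model_path):
    """Transcribe a WAV file with Vosk and return the recognized text.

    model_path is a placeholder for an unpacked Vosk model directory."""
    # Imported here so check_vosk_wav() can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    problems = check_vosk_wav(path)
    if problems:
        raise ValueError("; ".join(problems))

    model = Model(model_path)
    texts = []
    with wave.open(path, "rb") as wf:
        # The recognizer is told the file's own sample rate.
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance is finalized.
            if rec.AcceptWaveform(data):
                texts.append(json.loads(rec.Result())["text"])
        # Flush whatever is left in the decoder.
        texts.append(json.loads(rec.FinalResult())["text"])
    return " ".join(t for t in texts if t)
```

Running `check_vosk_wav("friend.wav")` first (file name hypothetical) at least rules out a stereo or non-16-bit recording before blaming the model.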