I asked this on the Cloudflare Community forum but got no response.
I’m thinking about using the Workers AI MeloTTS model for text-to-audio generation, but some quick testing revealed a number of things that don’t look right:
Nothing besides English works. Spanish and French don’t generate at all and return an error. Chinese, Japanese, and Korean return audio, but it’s gibberish.
Usage of this model has a cost assigned, but I don’t see that cost returned anywhere in the calls: not in the response, and not in the AI Gateway logs.
Shouldn’t there be a speed parameter?
The Cloudflare docs say the model can return one of two things: “string” (“The generated audio in MP3 format, base64-encoded”) or “binary” (“The generated audio in MP3 format”). I see it returning a base64 string; how do I get the binary version? Also, the returned audio appears to be WAV, not MP3.
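
To make the last point reproducible, here is a minimal sketch of how the base64 string can be decoded into raw bytes on the client side, and how the container can be guessed from its leading bytes. The function names are hypothetical helpers, not part of any Cloudflare API, and the check only looks at well-known signatures (a RIFF/WAVE header, an ID3 tag, or an MPEG frame sync), so treat the result as a hint rather than proof.

```python
import base64


def decode_audio(b64_audio: str) -> bytes:
    """Decode the base64 string from the response into raw audio bytes."""
    return base64.b64decode(b64_audio)


def sniff_audio_format(data: bytes) -> str:
    """Guess the audio container from its first bytes (hypothetical helper)."""
    # WAV files start with "RIFF", a 4-byte size, then "WAVE".
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3 files often start with an ID3v2 tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise an MPEG audio frame begins with 11 set sync bits.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

Passing the string from the response through `decode_audio` and then `sniff_audio_format` shows which of the two formats actually came back; the decoded bytes can also be written to a file with the matching extension.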