Patterns for streaming AI responses to client without leaking API keys
Services like Fal.ai or OpenRouter usually support streaming for big AI models (which is awesome) - but it seems like a fair amount of effort and hand-rolling is still required to make streaming work without exposing API keys to the client. This is especially annoying with serverless backends like Next.js on Vercel.
To get LLM streaming working without client-side API keys, I've been using a pattern where one API route starts the stream and another route handles polling for updates.
So I'll have an endpoint called something like `startStream` which does something like:
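(Roughly like this - a simplified sketch, not my exact code; the Upstash-style Redis client, file paths, and field names are just illustrative.)

```ts
// app/api/start-stream/route.ts (illustrative path)
import { randomUUID } from "crypto";
import { NextResponse } from "next/server";
import { Redis } from "@upstash/redis";
import { processStreamInBackground } from "./process"; // sketched further down

const redis = Redis.fromEnv();

export async function POST(req: Request) {
  const { prompt } = await req.json();
  const streamId = randomUUID();

  // Seed the record so the polling endpoint has something to read immediately.
  await redis.hset(`stream:${streamId}`, { status: "pending", text: "" });

  // Fire-and-forget: the provider call runs with the server-side API key and
  // writes chunks to Redis as they arrive. (On Vercel this usually needs
  // waitUntil() or a queue so the work isn't cut off when the response returns.)
  processStreamInBackground(streamId, prompt).catch(async (err) => {
    await redis.hset(`stream:${streamId}`, { status: "error", error: String(err) });
  });

  // Only the opaque streamId goes back to the client - never the API key.
  return NextResponse.json({ streamId });
}
```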
The `processStreamInBackground` function stores streaming chunks in Redis (keyed by `streamId`), while the client concurrently polls a separate `getStreamData` endpoint that reads from Redis and returns the partial data plus a completion status.
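For concreteness, here's roughly what those two pieces look like (again a simplified sketch: the OpenRouter request shape, the raw chunk handling, and the Redis key/field names are placeholders for whatever provider and client you're actually using):

```ts
// app/api/start-stream/process.ts (illustrative) - reads the provider's stream
// server-side and mirrors partial output into Redis under stream:<id>.
import { Redis } from "@upstash/redis";
import { NextResponse } from "next/server";

const redis = Redis.fromEnv();

export async function processStreamInBackground(streamId: string, prompt: string) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`, // never leaves the server
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-4o-mini", // whatever model you're calling
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let text = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Real code would parse the SSE "data:" lines; appending raw chunks keeps
    // the sketch short.
    text += decoder.decode(value, { stream: true });
    await redis.hset(`stream:${streamId}`, { status: "streaming", text });
  }
  await redis.hset(`stream:${streamId}`, { status: "done", text });
}

// app/api/get-stream-data/route.ts (illustrative) - the endpoint the client polls.
export async function GET(req: Request) {
  const streamId = new URL(req.url).searchParams.get("streamId");
  const data = await redis.hgetall<{ status: string; text: string }>(`stream:${streamId}`);
  return NextResponse.json({
    text: data?.text ?? "",
    status: data?.status ?? "unknown",
    done: data?.status === "done" || data?.status === "error",
  });
}
```

The client then just loops on `getStreamData` every few hundred ms, rendering the partial text until `done` comes back true.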
This works reasonably well, but it feels like a common enough problem that there should be a more "off-the-shelf" solution. Am I overcomplicating this? How do y'all handle streaming without exposing API keys?
Thanks for your help 🙏