Patterns for streaming AI responses to the client without leaking API keys

When using services like Fal.ai or OpenRouter for big AI models, these providers usually implement streaming (which is awesome) - but it still seems to take a fair amount of hand-rolling to make streaming work without exposing API keys to the client, especially with serverless backends like Next.js + Vercel.

To get LLM streaming working without client-side API keys, I've been using a pattern where one API route starts the stream and another route handles polling for updates. So I'll have an endpoint called something like startStream that does something like:
import { v4 as uuidv4 } from 'uuid';
// waitUntil comes from @vercel/functions; it keeps background work running after the response is sent
import { waitUntil } from '@vercel/functions';

const streamId = uuidv4();
// Start background streaming (fire and forget)
const backgroundPromise = processStreamInBackground(streamId, input);
// On Vercel, ensure the background task continues after the response
waitUntil(backgroundPromise);
return { streamId };
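processStreamInBackground is where the actual provider call happens, so the API key never leaves the server. Roughly, it does the following (just a sketch - I happen to use @upstash/redis and the OpenAI SDK pointed at OpenRouter here, but the clients, env var names, key scheme, and model are all placeholders for whatever you use):

import OpenAI from 'openai';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

async function processStreamInBackground(streamId: string, input: string) {
  const client = new OpenAI({
    baseURL: 'https://openrouter.ai/api/v1',
    apiKey: process.env.OPENROUTER_API_KEY, // stays server-side
  });
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4o-mini', // placeholder model
    messages: [{ role: 'user', content: input }],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? '';
    // Append each chunk to a Redis string keyed by streamId
    if (delta) await redis.append(`stream:${streamId}`, delta);
  }
  // Flag completion so the polling endpoint knows when to stop, and let the keys expire
  await redis.set(`stream:${streamId}:done`, '1', { ex: 600 });
  await redis.expire(`stream:${streamId}`, 600);
}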
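getStreamData is then just a thin read from Redis using the same keys (again a sketch, assuming a Next.js app-router handler and the same @upstash/redis client as above):

// app/api/getStreamData/route.ts
import { NextResponse } from 'next/server';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

export async function GET(request: Request) {
  const streamId = new URL(request.url).searchParams.get('streamId');
  if (!streamId) {
    return NextResponse.json({ error: 'streamId required' }, { status: 400 });
  }
  const [text, done] = await Promise.all([
    redis.get<string>(`stream:${streamId}`),
    redis.get<string>(`stream:${streamId}:done`),
  ]);
  // Return whatever has accumulated so far plus a completion flag
  return NextResponse.json({ text: text ?? '', done: done === '1' });
}

And the client just polls that route until done is true, something like:

async function pollStream(streamId: string, onUpdate: (text: string) => void) {
  while (true) {
    const res = await fetch(`/api/getStreamData?streamId=${streamId}`);
    const { text, done } = await res.json();
    onUpdate(text);
    if (done) break;
    await new Promise((r) => setTimeout(r, 500)); // poll interval is arbitrary
  }
}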
So processStreamInBackground stores the streaming chunks in Redis (keyed by streamId), while the client concurrently polls the separate getStreamData endpoint, which reads from Redis and returns the partial data plus a completion status. This works reasonably well, but it feels like a common enough problem that there'd be a more "off-the-shelf" solution. Am I overcomplicating this? How do y'all handle streaming without exposing API keys? Thanks for your help 🙏