Patterns for streaming AI responses to the client without leaking API keys

When using services like Fal.ai or OpenRouter for big AI models, these providers usually implement streaming (which is awesome) - but it still seems to take a fair amount of hand-rolling to make streaming work without exposing API keys to the client, especially with serverless backends like Next.js + Vercel.

To get LLM streaming working without client-side API keys, I've been using a pattern where one API route starts the stream and another route handles polling for updates. So I'll have an endpoint called something like startStream that does something like:
import { v4 as uuidv4 } from 'uuid';
// waitUntil comes from @vercel/functions; it keeps background work running after the response is sent
import { waitUntil } from '@vercel/functions';

const streamId = uuidv4();
// Start background streaming (fire and forget)
const backgroundPromise = processStreamInBackground(streamId, input);
// On Vercel, ensure the background task continues after the response
waitUntil(backgroundPromise);
return { streamId };
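processStreamInBackground is where the actual provider call happens, so the API key never leaves the server. Roughly, it does the following (just a sketch - I happen to use @upstash/redis and the OpenAI SDK pointed at OpenRouter here, but the clients, env var names, key scheme, and model are all placeholders for whatever you use):

import OpenAI from 'openai';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

async function processStreamInBackground(streamId: string, input: string) {
  const client = new OpenAI({
    baseURL: 'https://openrouter.ai/api/v1',
    apiKey: process.env.OPENROUTER_API_KEY, // stays server-side
  });
  const stream = await client.chat.completions.create({
    model: 'openai/gpt-4o-mini', // placeholder model
    messages: [{ role: 'user', content: input }],
    stream: true,
  });
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content ?? '';
    // Append each chunk to a Redis string keyed by streamId
    if (delta) await redis.append(`stream:${streamId}`, delta);
  }
  // Flag completion so the polling endpoint knows when to stop, and let the keys expire
  await redis.set(`stream:${streamId}:done`, '1', { ex: 600 });
  await redis.expire(`stream:${streamId}`, 600);
}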
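getStreamData is then just a thin read from Redis using the same keys (again a sketch, assuming a Next.js app-router handler and the same @upstash/redis client as above):

// app/api/getStreamData/route.ts
import { NextResponse } from 'next/server';
import { Redis } from '@upstash/redis';

const redis = Redis.fromEnv();

export async function GET(request: Request) {
  const streamId = new URL(request.url).searchParams.get('streamId');
  if (!streamId) {
    return NextResponse.json({ error: 'streamId required' }, { status: 400 });
  }
  const [text, done] = await Promise.all([
    redis.get<string>(`stream:${streamId}`),
    redis.get<string>(`stream:${streamId}:done`),
  ]);
  // Return whatever has accumulated so far plus a completion flag
  return NextResponse.json({ text: text ?? '', done: done === '1' });
}

And the client just polls that route until done is true, something like:

async function pollStream(streamId: string, onUpdate: (text: string) => void) {
  while (true) {
    const res = await fetch(`/api/getStreamData?streamId=${streamId}`);
    const { text, done } = await res.json();
    onUpdate(text);
    if (done) break;
    await new Promise((r) => setTimeout(r, 500)); // poll interval is arbitrary
  }
}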
So processStreamInBackground stores the streaming chunks in Redis (keyed by streamId), while the client concurrently polls the separate getStreamData endpoint, which reads from Redis and returns the partial data plus a completion status. This works reasonably well, but it feels like a common enough problem that there'd be a more "off-the-shelf" solution. Am I overcomplicating this? How do y'all handle streaming without exposing API keys? Thanks for your help 🙏