Streaming + Tool Loop + Output Formats

i'm thinking through a revamp of how streaming works so that it can fully support custom format parsing and also stream the full tool loop. i think the chunk interface might need to look like:

interface GenerateResponseChunk<T> {
  index: number; // index of the message in the tool loop, starts at 0 and increments. this used to be the candidate index...
  role: "model" | "tool"; // it will always be one of these two
  content: Part[];
  get text(): string;
  get output(): T; // use the format parser to construct the output for this chunk
}

so you might end up with chunks like:

{index: 0, role: "model", content: [{text: "I'm going to call a"}]}
{index: 0, role: "model", content: [{text: " tool"}, {toolRequest: {...}}]}
{index: 1, role: "tool", content: [{toolResponse: {...}}]}
{index: 2, role: "model", content: [{text: "Here's my final answer, based"}]}
{index: 2, role: "model", content: [{text: " on the tool's response"}]}
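
to show how a consumer could use `index`, here's a rough sketch that folds a list of chunks back into one message per step of the tool loop. the `Part`, `Chunk` and `Message` shapes and the `collectMessages` helper are simplified stand-ins i made up for illustration, not the real types:

```typescript
// Hypothetical, simplified shapes; the real Part type has more variants.
type Part =
  | { text: string }
  | { toolRequest: unknown }
  | { toolResponse: unknown };

interface Chunk {
  index: number; // position of the message in the tool loop
  role: "model" | "tool";
  content: Part[];
}

interface Message {
  role: "model" | "tool";
  content: Part[];
}

// Fold chunks into one message per index, merging adjacent text parts.
function collectMessages(chunks: Chunk[]): Message[] {
  const messages: Message[] = [];
  for (const chunk of chunks) {
    let message = messages[chunk.index];
    if (message === undefined) {
      // The first chunk seen for an index starts a new message.
      message = { role: chunk.role, content: [] };
      messages[chunk.index] = message;
    }
    for (const part of chunk.content) {
      const last = message.content[message.content.length - 1];
      if ("text" in part && last !== undefined && "text" in last) {
        // Streamed text continues the previous text part.
        last.text += part.text;
      } else {
        // Copy the part so the input chunks are never mutated.
        message.content.push({ ...part });
      }
    }
  }
  return messages;
}
```

with the five chunks above this would give three messages: a model message (text plus the tool request), a tool message, and the final model message with its two text chunks joined.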

does this make sense? i'm also thinking that for buffered chunking (only emitting complete array items) the stream will always emit 1:1 for each received chunk, with an empty array of new values for any chunk that doesn't complete a record. it was really hard to reason about a stream that has a different number of chunks depending on the format.
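
a minimal sketch of that 1:1 behavior, assuming the model is streaming a JSON array as plain text. `completeItems` and `bufferedArrayChunks` are hypothetical names, and this version only treats an item as complete once its trailing comma or the closing bracket has arrived:

```typescript
// Return every top-level item of a (possibly truncated) JSON array that is
// already complete, i.e. followed by a comma or the closing bracket.
function completeItems(partial: string): unknown[] {
  const items: unknown[] = [];
  let depth = 0; // 1 means directly inside the outer array
  let inString = false;
  let escaped = false;
  let itemStart = -1;

  const flush = (end: number): void => {
    const raw = partial.slice(itemStart, end).trim();
    if (raw.length > 0) items.push(JSON.parse(raw));
  };

  for (let i = 0; i < partial.length; i++) {
    const ch = partial[i];
    if (inString) {
      // Skip string contents so commas and brackets inside them are ignored.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "[" || ch === "{") {
      depth++;
      if (depth === 1) itemStart = i + 1;
    } else if (ch === "]" || ch === "}") {
      if (depth === 1) flush(i); // the outer array just closed
      depth--;
    } else if (ch === "," && depth === 1) {
      flush(i);
      itemStart = i + 1;
    }
  }
  return items;
}

// Emit exactly one output per received text chunk: the items completed by
// that chunk, or an empty array when it completed none.
function* bufferedArrayChunks(textChunks: Iterable<string>): Generator<unknown[]> {
  let accumulated = "";
  let emitted = 0;
  for (const text of textChunks) {
    accumulated += text;
    // Re-scanning from the start is quadratic but keeps the sketch short.
    const items = completeItems(accumulated);
    yield items.slice(emitted);
    emitted = items.length;
  }
}
```

feeding it four text chunks always yields four outputs, some of them empty, so the chunk count no longer depends on the format.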
