military-pink · 12mo ago

Hi again!

For simple user messages, the Gemini model requests a tool right away and only starts generating text after receiving the tool's response. In that case, the concatenated streaming chunks match the final complete response.

However, here is a more complex example: I asked the AI to write a short story and then answer a question that required calling a tool. Watching the streaming chunks, I noticed that the model first starts generating the short story and calls the tool at the same time. After the tool's response arrives, I observe one of two outcomes:

1) The streaming chunks that follow (after the story) and the final complete response contain only the answer to the question that required the tool, so the story is missing.
2) The streaming chunks that follow (after the story) and the final complete response contain a completely different story along with the answer to the tool-related question.

In my web app, I want to show users the streaming text along with the tool requests in between (so they can see what's happening). Right now, as soon as the complete response arrives, I replace the model's message assembled from the streaming chunks with this final response. However, this approach is unreliable, because the final response may not contain the text that was generated before the tool's response arrived. In other words, the concatenation of all streamed text chunks does not match the final complete response.

What would you suggest? Does Firebase Genkit support generating text and calling tools in parallel in cases like this? Could this be an issue with the Gemini 1.5 Pro model itself? Or do I need to handle something differently in my code?

Thank you in advance! I truly appreciate your work—Genkit has been incredibly helpful for building AI applications.