Mastra · 2mo ago
csallen

Multi-modal tool results (reading images from tool result message)

I'm using a tool to let the LLM generate images. The problem I'm running into is that the LLM cannot "see" the image from the tool result. (I've tried with both OpenAI and Anthropic models.) For example, if I tell it to generate an image, it calls my tool and does so, but when I then ask a question about the image, it cannot actually see the image and answer accurately. This seems important, because tool calling appears to be the number one recommended way to have LLMs generate images in a chat. I think these messages need to be structured in a specific way for the LLM to be able to see an image generated by a tool. I know that when you send an image in a user message, Mastra structures the message in the right format for the LLM to see it, but I can't find any guidance in the Mastra documentation on how to do this when the image comes from a tool result, so I'm not sure how my tool response is supposed to look. The only clue I have is this: https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling#multi-modal-tool-results
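For reference, the linked AI SDK section describes an optional hook on the tool definition that maps the raw result into content parts the model can actually see. Here is a minimal sketch against AI SDK 5's tool() API (in v4 the hook was called experimental_toToolResultContent with a slightly different shape; generateImageBase64 is a hypothetical helper, and whether Mastra's createTool passes this hook through is exactly what's unclear in this thread):

```ts
import { tool } from 'ai';
import { z } from 'zod';

// Hypothetical image generator that returns base64-encoded PNG data.
declare function generateImageBase64(prompt: string): Promise<{ data: string }>;

export const generateImage = tool({
  description: 'Generate an image from a text prompt',
  inputSchema: z.object({ prompt: z.string() }),
  execute: async ({ prompt }) => generateImageBase64(prompt),
  // Map the raw tool result to content parts the model can "see".
  // Without a mapping like this, most providers only receive the
  // tool result serialized as JSON text.
  toModelOutput: (result) => ({
    type: 'content',
    value: [{ type: 'media', data: result.data, mediaType: 'image/png' }],
  }),
});
```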
2 Replies
Mastra Triager · 2mo ago
GitHub: [DISCORD:1433505827947286658] Multi-modal tool results (reading images from tool result message) — issue created from this Discord post: https://discord.com/channels/1309558646228779139/1433505827947286658
_roamin_ · 2mo ago
Hi @csallen! What does the tool result look like? You might be able to use an output processor to convert the tool result into a message the LLM can process.
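To illustrate the suggestion, here is a minimal sketch of the conversion step itself, independent of any particular processor hook (the exact Mastra Processor interface isn't shown, ImageToolResult is a hypothetical result shape, and the message types assume AI SDK 5): take the tool's base64 output and append a user message with an image part, which is the format the LLM is already known to see.

```ts
import type { ModelMessage } from 'ai';

// Hypothetical shape of the image tool's raw result.
interface ImageToolResult {
  data: string; // base64-encoded PNG
}

// Append a user message that carries the generated image as an image
// part, so the next model call can actually see it.
export function exposeImageToModel(
  messages: ModelMessage[],
  result: ImageToolResult,
): ModelMessage[] {
  return [
    ...messages,
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Image generated by the tool:' },
        // `mediaType` is the AI SDK 5 field name (v4 used `mimeType`).
        { type: 'image', image: result.data, mediaType: 'image/png' },
      ],
    },
  ];
}
```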
