Prompt caching with dynamic context
Where is the best place to supply fully dynamic context while still taking full advantage of prompt caching (particularly with OpenAI)?
OpenAI recommends placing dynamic content "at the end" of your prompt.
Here's an approximation of our current setup:
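Roughly like the sketch below (the identifiers, model, and runtimeContext lookup are illustrative placeholders, not our exact code):

```ts
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

// Large, unchanging instructions we'd like OpenAI to cache.
const BASE_STATIC_SYSTEM_PROMPT = `...several thousand tokens of static instructions...`;

export const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  // Per-request dynamic context (user info, timestamps, etc.) is appended to
  // the static prompt, so the final system prompt differs on every call.
  instructions: async ({ runtimeContext }) => {
    const dynamicContext = runtimeContext?.get("userContext"); // illustrative key
    return `${BASE_STATIC_SYSTEM_PROMPT}\n\nCurrent context:\n${JSON.stringify(dynamicContext ?? {})}`;
  },
  tools: {
    // ...tool definitions...
  },
});
```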
When I test the above, every new message caches 0 tokens (verified via the stream's onFinish callback).
When I remove the dynamic part and pass only BASE_STATIC_SYSTEM_PROMPT, effectively the full token count gets cached, minus the last few tokens.
What I believe is happening: at some point OpenAI performs the cache check against the concatenation of SYSTEM_PROMPT + TOOLS_AS_STRING + MESSAGES_ARR, and since our system prompt is dynamic, it's causing a cache miss every time.
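Conceptually, that would look something like the sketch below (an assumed, simplified picture of prefix matching, not OpenAI internals):

```ts
// Caching reuses tokens only up to the first position where two requests
// diverge, so a per-request dynamic system prompt means the tools and
// messages that follow it can never be reused from cache.
const STATIC_PROMPT = "BASE_STATIC_SYSTEM_PROMPT ".repeat(200);
const buildPrefix = (dynamicContext: string) =>
  `${STATIC_PROMPT}\n${dynamicContext}\nTOOLS_AS_STRING\nMESSAGES_ARR`;

const requestA = buildPrefix("user=alice ts=1700000000");
const requestB = buildPrefix("user=bob   ts=1700000060");

let shared = 0;
while (
  shared < Math.min(requestA.length, requestB.length) &&
  requestA[shared] === requestB[shared]
) {
  shared++;
}
// The shared prefix ends exactly where the dynamic context starts.
console.log(`shared prefix: ${shared} of ${requestA.length} chars`);
```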
Is there a better place to supply this dynamic data that wouldn't cause a cache miss? As a system message in the messages array?
Is there any way to always keep this information in context (e.g. not have it dropped when it falls outside of the lastMessages window), while still optimizing for prompt caching?
📝 Created GitHub issue: https://github.com/mastra-ai/mastra/issues/10381
🔍 If you're experiencing an error, please provide a minimal reproducible example whenever possible to help us resolve it quickly.
🙏 Thank you for helping us improve Mastra!
Hey @ldp !
We call getInstructions on the agent before passing it to the model.
You can return a SystemMessage | SystemMessage[] as instructions as well! There you could add providerOptions such as promptCacheKey and promptCacheRetention.
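Roughly, that could look something like this sketch (the runtimeContext key, cache key, and retention value are illustrative, and exact option support depends on your @ai-sdk/openai version):

```ts
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

const BASE_STATIC_SYSTEM_PROMPT = `...several thousand tokens of static instructions...`;

export const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  // Return an array of system messages instead of one big string: the static
  // prompt carries the cache-related providerOptions, while the dynamic
  // context lives in its own small system message after it.
  instructions: async ({ runtimeContext }) => [
    {
      role: "system" as const,
      content: BASE_STATIC_SYSTEM_PROMPT,
      providerOptions: {
        openai: {
          promptCacheKey: "assistant-static-prompt", // illustrative key
          promptCacheRetention: "24h",               // illustrative value
        },
      },
    },
    {
      role: "system" as const,
      content: `Current context:\n${JSON.stringify(
        runtimeContext?.get("userContext") ?? {}, // illustrative lookup
      )}`,
    },
  ],
});
```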
Ty so much abhi! I’ll try this out this weekend
Hey @ldp ! Just wondering if you had the chance to test the solution Abhi mentioned above? Let us know if you're running into any issue setting it up 😉
hey @roamin
yep! i commented on the linked github issue here
abhi's approach seemed to cache way more tokens (not sure i totally get why it works though 😆 )
feel free to close / mark resolved (not sure if you have that on discord)
Hey Joe! Thanks for testing it out! It works because when you're using providerOptions, you're explicitly telling OpenAI to cache those instructions. If you only provide plain text, then it's OpenAI that decides what gets cached 😉