Prompt caching with dynamic context
Where is the best place to supply fully dynamic context while still taking full advantage of prompt caching (particularly with OpenAI)?
OpenAI recommends placing dynamic content "at the end" of your prompt.
Here's an approximation of our current setup:
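Roughly like the sketch below (the identifiers, model, and runtimeContext lookup are illustrative placeholders, not our exact code):

```ts
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

// Large, unchanging instructions we'd like OpenAI to cache.
const BASE_STATIC_SYSTEM_PROMPT = `...several thousand tokens of static instructions...`;

export const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  // Per-request dynamic context (user info, timestamps, etc.) is appended to
  // the static prompt, so the final system prompt differs on every call.
  instructions: async ({ runtimeContext }) => {
    const dynamicContext = runtimeContext?.get("userContext"); // illustrative key
    return `${BASE_STATIC_SYSTEM_PROMPT}\n\nCurrent context:\n${JSON.stringify(dynamicContext ?? {})}`;
  },
  tools: {
    // ...tool definitions...
  },
});
```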
When I test the above, every new message caches 0 tokens (verified via the stream's onFinish callback).
When I remove the dynamic part and pass only BASE_STATIC_SYSTEM_PROMPT, effectively the full token count gets cached, minus the last few tokens.
What I believe is happening: at some point OpenAI performs the cache check against the concatenation of SYSTEM_PROMPT + TOOLS_AS_STRING + MESSAGES_ARR, and since our system prompt is dynamic, it's causing a cache miss every time.
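Conceptually, that would look something like the sketch below (an assumed, simplified picture of prefix matching, not OpenAI internals):

```ts
// Caching reuses tokens only up to the first position where two requests
// diverge, so a per-request dynamic system prompt means the tools and
// messages that follow it can never be reused from cache.
const STATIC_PROMPT = "BASE_STATIC_SYSTEM_PROMPT ".repeat(200);
const buildPrefix = (dynamicContext: string) =>
  `${STATIC_PROMPT}\n${dynamicContext}\nTOOLS_AS_STRING\nMESSAGES_ARR`;

const requestA = buildPrefix("user=alice ts=1700000000");
const requestB = buildPrefix("user=bob   ts=1700000060");

let shared = 0;
while (
  shared < Math.min(requestA.length, requestB.length) &&
  requestA[shared] === requestB[shared]
) {
  shared++;
}
// The shared prefix ends exactly where the dynamic context starts.
console.log(`shared prefix: ${shared} of ${requestA.length} chars`);
```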
Is there a better place to supply this dynamic data that wouldn't cause a cache miss? As a system message in the messages array?
Is there any way to always keep this information in context (e.g. not have it dropped when it falls outside of the lastMessages window), while still optimizing for prompt caching?
📝 Created GitHub issue: https://github.com/mastra-ai/mastra/issues/10381
🔍 If you're experiencing an error, please provide a minimal reproducible example whenever possible to help us resolve it quickly.
🙏 Thank you for helping us improve Mastra!
Hey @ldp !
We call getInstructions on the agent before passing it to the model.
You can return a SystemMessage | SystemMessage[] as instructions as well! There you could add providerOptions such as promptCacheKey and promptCacheRetention.
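Roughly, that could look something like this sketch (the runtimeContext key, cache key, and retention value are illustrative, and exact option support depends on your @ai-sdk/openai version):

```ts
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

const BASE_STATIC_SYSTEM_PROMPT = `...several thousand tokens of static instructions...`;

export const agent = new Agent({
  name: "assistant",
  model: openai("gpt-4o"),
  // Return an array of system messages instead of one big string: the static
  // prompt carries the cache-related providerOptions, while the dynamic
  // context lives in its own small system message after it.
  instructions: async ({ runtimeContext }) => [
    {
      role: "system" as const,
      content: BASE_STATIC_SYSTEM_PROMPT,
      providerOptions: {
        openai: {
          promptCacheKey: "assistant-static-prompt", // illustrative key
          promptCacheRetention: "24h",               // illustrative value
        },
      },
    },
    {
      role: "system" as const,
      content: `Current context:\n${JSON.stringify(
        runtimeContext?.get("userContext") ?? {}, // illustrative lookup
      )}`,
    },
  ],
});
```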
Ty so much abhi! I’ll try this out this weekend
Hey @ldp ! Just wondering if you had the chance to test the solution Abhi mentioned above? Let us know if you're running into any issue setting it up 😉
hey @roamin
yep! i commented on the linked github issue here
abhi's approach seemed to cache way more tokens (not sure i totally get why it works though 😆 )
feel free to close / mark resolved (not sure if you have that on discord)
Hey Joe! Thanks for testing it out! It works because when you're using providerOptions, you're explicitly telling OpenAI to cache those instructions. If you only provide plain text, then it's OpenAI that decides what gets cached 😉