Completion-style instead of Instruct-style responses
I'm using the default hello-world request on serverless:

```json
{
  "input": {
    "prompt": "Hello World"
  }
}
```

but I'm getting a completion-style response instead of an instruct-style response, despite using a Llama Instruct model. To clarify, the response I'm getting is: "! Welcome to my blog about London: the Great City!...". How do I change the prompt format to get instruct-style responses? Where can I find the syntax?
This page says to check my worker's documentation — where do I find that? https://docs.runpod.io/serverless/endpoints/send-requests
I just want to know the syntax for submitting a conversation, like:

```json
"messages": [
  {"role": "user", "content": "Give me a short introduction to large language models."}
]
```
The syntax is exactly as you say.
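To make that concrete, here's a minimal sketch of a full request body using the `messages` format instead of a raw `prompt`. The `sampling_params` keys are assumptions based on vLLM's `SamplingParams` and may not all be required:

```python
import json

# Sketch of a serverless request body for a worker-vllm endpoint,
# using the chat-style "messages" input instead of a raw "prompt".
# "sampling_params" values here are illustrative assumptions.
payload = {
    "input": {
        "messages": [
            {
                "role": "user",
                "content": "Give me a short introduction to large language models.",
            }
        ],
        "sampling_params": {"max_tokens": 200, "temperature": 0.7},
    }
}

# This is the JSON you would POST to the endpoint's /run or /runsync route.
body = json.dumps(payload, indent=2)
print(body)
```

You'd send `body` with your usual HTTP client, with your RunPod API key in the `Authorization` header.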
The vLLM worker documentation is here: https://github.com/runpod-workers/worker-vllm
GitHub - runpod-workers/worker-vllm: The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
@3WaD Could you please give me an example of how to use the CUSTOM_CHAT_TEMPLATE? The above doesn't work with Llama 3.1 8B Instruct, and I don't know how to use a different chat template.
Never mind, I just realised how to do it. Ignore me!
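For anyone finding this thread later, here's a rough sketch of the idea: worker-vllm reads CUSTOM_CHAT_TEMPLATE as an endpoint environment variable containing a Jinja2 chat template. The template below is purely illustrative, not the official Llama 3.1 template — check the model card for the real one:

```shell
# Hedged sketch: set a custom Jinja2 chat template via the endpoint's
# environment variables. The template string here is a made-up example
# showing the shape, NOT the actual Llama 3.1 Instruct template.
export CUSTOM_CHAT_TEMPLATE='{% for message in messages %}<|{{ message.role }}|>{{ message.content }}{% endfor %}'
```

With this set, the worker should apply your template to the `messages` array instead of the tokenizer's default.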