Runpod · 9mo ago
Erez

Length of output of serverless meta-llama/Llama-3.1-8B-Instruct

When I submit a request, the response is always truncated at 100 tokens.
Setting "max_tokens" or "max_new_tokens" has no effect.
How do I control the number of output tokens?


input:
{
  "input": {
    "prompt": "Give a pancake recipe"
  },
  "max_tokens": 5000,
  "temperature": 1
}

output:
{
  "delayTime": 1048,
  "executionTime": 2593,
  "id": "c444e5bb-aeca-4489-baf3-22bbe848b48c-e1",
  "output": [
    {
      "choices": [
        {
          "tokens": [
            " that is made with apples and cinnamon, and also includes a detailed outline of instructions that can be make it.\nTODAY'S PANCAKE RECIPE\n\n\"A wonderful breakfast or brunch food that's made with apples and cinnamon.\"\n\nINGREDIENTS\n4 large flour\n2 teaspoons baking powder\n1/4 teaspoon cinnamon\n1/2 teaspoon salt\n1/4 cup granulated sugar\n1 cup milk\n2 large eggs\n1 tablespoon unsalted butter, melted\n1 large apple,"
          ]
        }
      ],
      "usage": {
        "input": 6,
        "output": 100
      }
    }
  ],
  "status": "COMPLETED",
  "workerId": "l0efghtlo64wf5"
}
Solution
The worker only picks up generation settings that are nested under "sampling_params" inside "input"; parameters placed at the top level of the request body are ignored, which is why generation stopped at the default of 100 tokens. A request with "max_tokens" in the right place looks like this:
{
  "input": {
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant."
      },
      {
        "role": "user",
        "content": "Explain llm models"
      }
    ],
    "sampling_params": {
      "max_tokens": 3000,
      "temperature": 0.7,
      "top_p": 0.95,
      "n": 1,
      "stream": false,
      "stop": [],
      "presence_penalty": 0,
      "frequency_penalty": 0,
      "logit_bias": {},
      "best_of": 1
    }
  }
}
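For reference, here is a minimal Python sketch of sending a payload like the one above to the endpoint's synchronous /runsync route with the requests library. The endpoint ID and API key are placeholders, and the response is parsed the same way as the example output shown earlier.

import os
import requests

# Placeholders: substitute your own serverless endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]

payload = {
    "input": {
        "messages": [
            {"role": "system", "content": "You are an AI assistant."},
            {"role": "user", "content": "Explain llm models"},
        ],
        # max_tokens must live inside sampling_params, not at the top level.
        "sampling_params": {
            "max_tokens": 3000,
            "temperature": 0.7,
            "top_p": 0.95,
        },
    }
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,
)
resp.raise_for_status()
result = resp.json()

# The generated text is nested under output -> choices -> tokens,
# matching the example response above.
print(result["output"][0]["choices"][0]["tokens"][0])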