Serverless Endpoint Streaming
I'm currently working with llama.cpp for my inference and have set up my handler.py file following this guide:
https://docs.runpod.io/docs/handler-generator
My input and handler file look like this:
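Roughly this pattern, simplified; the model path, prompt key, and generation parameters are placeholders for my actual values:

```python
import runpod
from llama_cpp import Llama

# Load the model once at worker start, not per request.
# Model path is a placeholder.
llm = Llama(model_path="/models/model.gguf")

def handler(job):
    """Generator handler: each yielded item becomes one stream chunk."""
    prompt = job["input"]["prompt"]

    # llama-cpp-python returns an iterator of chunks when stream=True.
    for chunk in llm(prompt, max_tokens=256, stream=True):
        yield chunk["choices"][0]["text"]

runpod.serverless.start({
    "handler": handler,
    # Also aggregate the yielded chunks into the final /run output.
    "return_aggregate_stream": True,
})
```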
My problem is that whenever I test this in the Requests tab on the dashboard, it keeps saying the stream is empty.
https://github.com/runpod-workers/worker-vllm
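For reference, my understanding from the RunPod docs is that outside the dashboard you read partial output by polling the endpoint's /stream route, roughly like this (endpoint ID, API key, and the exact response shape are assumptions on my part):

```python
import time
import requests

API_KEY = "..."        # placeholder
ENDPOINT_ID = "..."    # placeholder

base = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
headers = {"Authorization": f"Bearer {API_KEY}"}

# Submit the job asynchronously and get back a job ID.
job = requests.post(f"{base}/run", headers=headers,
                    json={"input": {"prompt": "Hello"}}).json()

# Poll /stream for chunks as the generator handler yields them.
while True:
    r = requests.get(f"{base}/stream/{job['id']}", headers=headers).json()
    for chunk in r.get("stream", []):
        print(chunk["output"], end="", flush=True)
    if r.get("status") in ("COMPLETED", "FAILED"):
        break
    time.sleep(1)
```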
