How to Run Text Generation Inference on Serverless?

Hello, newbie here. I want to run Hugging Face's Text Generation Inference on serverless. I'm using this repo: https://github.com/runpod-workers/worker-tgi. I built my own Docker image following the README and deployed it on RunPod serverless, but when I hit my API I get this error:
{
"delayTime": 100308,
"error": "handler: module 'runpod.serverless.modules' has no attribute 'rp_metrics' \ntraceback: Traceback (most recent call last):\n File \"/opt/conda/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\", line 194, in run_job_generator\n async for output_partial in job_output:\n File \"/handler.py\", line 75, in handler_streaming\n runpod.serverless.modules.rp_metrics.metrics_collector.update_stream_aggregate(\nAttributeError: module 'runpod.serverless.modules' has no attribute 'rp_metrics'\n",
"executionTime": 376,
"id": "d5ff5d8d-acf5-40a3-8ffb-1ee5ce48f8d3-e1",
"status": "FAILED"
}
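A hedged note on the error itself: this AttributeError usually means the handler code and the installed runpod SDK disagree on version, since `rp_metrics` is an internal module that not every SDK release exposes. A minimal diagnostic sketch (assuming nothing about the image beyond a working Python) to check what the container actually has:

```python
# Hedged diagnostic sketch: report the installed runpod SDK version and
# whether the internal rp_metrics module the handler expects is present.
import importlib.util

spec = importlib.util.find_spec("runpod")
if spec is None:
    print("runpod SDK is not installed")
else:
    import importlib.metadata
    print("runpod SDK version:", importlib.metadata.version("runpod"))
    import runpod.serverless.modules as modules
    print("rp_metrics available:", hasattr(modules, "rp_metrics"))
```

If the version printed here is newer than what the worker repo was written against, pinning the runpod package in the image to the version the repo's requirements file names is the usual fix.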
Can anyone help me?
8 Replies
ashleyk (3mo ago)
GitHub
GitHub - runpod-workers/worker-vllm: The RunPod worker template for...
The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm
Oryza sativa (3mo ago)
Does it also support Text Generation Inference?
ashleyk (3mo ago)
Yes
Oryza sativa (3mo ago)
Hello, sorry for the late response. I tried the prebuilt Docker image from this repo; the config looks like this, but I still get no response after hitting my API.
(two screenshots of the endpoint configuration, no description provided)
ashleyk (3mo ago)
What response do you get when calling your endpoint? @Alpay Ariyak may be able to advise.
Alpay Ariyak (3mo ago)
@Oryza sativa Can you share the worker logs?
Oryza sativa (3mo ago)
I'm sorry, I already got the response. I think it was because I hit the endpoint while it was still in the initializing status and not yet ready. I finally got the response with this. Thank you @ashleyk
-d '{"input": {"prompt": "What is Deeplearning?", "sampling_params": {"max_tokens": 100, "n": 1, "presence_penalty": 0.2, "frequency_penalty": 0.7, "temperature": 0.3}}}'
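For anyone following along, the `-d` fragment above is only the request body. A sketch of the full call (the endpoint id and API key are placeholders; `/runsync` is RunPod's blocking route, which waits for the result instead of returning a job id):

```shell
# Hedged sketch of a complete request to a RunPod serverless endpoint.
# Replace YOUR_ENDPOINT_ID and YOUR_API_KEY with your own values.
curl -s -X POST "https://api.runpod.ai/v2/YOUR_ENDPOINT_ID/runsync" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "What is Deeplearning?", "sampling_params": {"max_tokens": 100, "n": 1, "presence_penalty": 0.2, "frequency_penalty": 0.7, "temperature": 0.3}}}'
```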
Oryza sativa (3mo ago)
But I'm just curious: this is using vLLM, right? So does RunPod now support using TGI for deploying models on serverless? https://github.com/huggingface/text-generation-inference