How to Run Text Generation Inference on Serverless?

Hello newbie here, I want to run text generation inference by huggingface on serverless. I use this repo https://github.com/runpod-workers/worker-tgi, I build my own docker image according the readme and deploy on runpod serverless. But when i hit my API I get this error:

{
  "delayTime": 100308,
  "error": "handler: module 'runpod.serverless.modules' has no attribute 'rp_metrics' \ntraceback: Traceback (most recent call last):\n  File \"/opt/conda/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\", line 194, in run_job_generator\n    async for output_partial in job_output:\n  File \"/handler.py\", line 75, in handler_streaming\n    runpod.serverless.modules.rp_metrics.metrics_collector.update_stream_aggregate(\nAttributeError: module 'runpod.serverless.modules' has no attribute 'rp_metrics'\n",
  "executionTime": 376,
  "id": "d5ff5d8d-acf5-40a3-8ffb-1ee5ce48f8d3-e1",
  "status": "FAILED"
}

{
  "delayTime": 100308,
  "error": "handler: module 'runpod.serverless.modules' has no attribute 'rp_metrics' \ntraceback: Traceback (most recent call last):\n  File \"/opt/conda/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\", line 194, in run_job_generator\n    async for output_partial in job_output:\n  File \"/handler.py\", line 75, in handler_streaming\n    runpod.serverless.modules.rp_metrics.metrics_collector.update_stream_aggregate(\nAttributeError: module 'runpod.serverless.modules' has no attribute 'rp_metrics'\n",
  "executionTime": 376,
  "id": "d5ff5d8d-acf5-40a3-8ffb-1ee5ce48f8d3-e1",
  "status": "FAILED"
}

{
  "delayTime": 100308,
  "error": "handler: module 'runpod.serverless.modules' has no attribute 'rp_metrics' \ntraceback: Traceback (most recent call last):\n  File \"/opt/conda/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\", line 194, in run_job_generator\n    async for output_partial in job_output:\n  File \"/handler.py\", line 75, in handler_streaming\n    runpod.serverless.modules.rp_metrics.metrics_collector.update_stream_aggregate(\nAttributeError: module 'runpod.serverless.modules' has no attribute 'rp_metrics'\n",
  "executionTime": 376,
  "id": "d5ff5d8d-acf5-40a3-8ffb-1ee5ce48f8d3-e1",
  "status": "FAILED"
}

{
  "delayTime": 100308,
  "error": "handler: module 'runpod.serverless.modules' has no attribute 'rp_metrics' \ntraceback: Traceback (most recent call last):\n  File \"/opt/conda/lib/python3.10/site-packages/runpod/serverless/modules/rp_job.py\", line 194, in run_job_generator\n    async for output_partial in job_output:\n  File \"/handler.py\", line 75, in handler_streaming\n    runpod.serverless.modules.rp_metrics.metrics_collector.update_stream_aggregate(\nAttributeError: module 'runpod.serverless.modules' has no attribute 'rp_metrics'\n",
  "executionTime": 376,
  "id": "d5ff5d8d-acf5-40a3-8ffb-1ee5ce48f8d3-e1",
  "status": "FAILED"
}

can anyone help me?

ashley•3/25/24, 8:50 AM

Most people use this one:
https://github.com/runpod-workers/worker-vllm

GitHub

GitHub - runpod-workers/worker-vllm: The RunPod worker template for...

The RunPod worker template for serving our large language model endpoints. Powered by vLLM. - runpod-workers/worker-vllm

Aashley Most people use this one: https://github.com/runpod-workers/worker-vllm

Oryza sativaOP•3/25/24, 9:51 AM

is it also support text generation inference?

OOryza sativa is it also support text generation inference?

ashley•3/25/24, 9:52 AM

Yes

Aashley Most people use this one: https://github.com/runpod-workers/worker-vllm

Oryza sativaOP•3/26/24, 3:55 AM

hello sorry for late response, I tried to use prebuild docker from this repo the config looks like this. but still no response after hit my api

OOryza sativa hello sorry for late response, I tried to use prebuild docker from this repo the...

ashley•3/26/24, 12:17 PM

What response do you get when calling your endpoint?

ashley•3/26/24, 12:17 PM

@Alpay Ariyak may be able to advise.

Alpay Ariyak•3/26/24, 5:38 PM

@Oryza sativa Can you share the worker logs

Oryza sativaOP•3/27/24, 2:45 AM

I am sorry, already get the response, I think becuse I hit the endpoint but my endpoint still on initializing status, isn ot ready yet. and finally get the response with this. thank you @ashleyk

-d '{"input": {"prompt": "What is Deeplearning?", "sampling_params": {"max_tokens": 100, "n": 1, "presence_penalty": 0.2, "frequency_penalty": 0.7, "temperature": 0.3}}} '

-d '{"input": {"prompt": "What is Deeplearning?", "sampling_params": {"max_tokens": 100, "n": 1, "presence_penalty": 0.2, "frequency_penalty": 0.7, "temperature": 0.3}}} '

Oryza sativaOP•3/27/24, 2:46 AM

but i just curious, it is using vllm right? so is runpod now support using TGI for deploying model in serverless?

https://github.com/huggingface/text-generation-inference

GitHub

GitHub - huggingface/text-generation-inference: Large Language Mode...

Large Language Model Text Generation Inference. Contribute to huggingface/text-generation-inference development by creating an account on GitHub.

How to Run Text Generation Inference on Serverless?

Similar Threads

Similar Threads

Similar Threads