LLM inference on a serverless solution
Hi, I need some suggestions on serving an LLM on serverless. I have a couple of questions:
1. Is there any guide or example project I can follow so that I can run inference efficiently on RunPod serverless?
2. Is it recommended to use frameworks like TGI or vLLM with RunPod? If so, why? I'd like maximum control over the inference code, so I haven't tried any of those frameworks yet (see the rough sketch of what I have in mind below).
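For context, this is roughly the kind of plain-transformers handler I'm thinking of, a minimal sketch using the runpod Python SDK handler pattern; the model ID and generation parameters are just placeholders:

```python
import runpod
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model

# Load the model once at worker start so warm requests skip the load.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def handler(job):
    """Serverless handler: reads the job input, returns the generated text."""
    prompt = job["input"]["prompt"]
    max_new_tokens = job["input"].get("max_new_tokens", 256)

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return {"generated_text": text}

# Register the handler with the RunPod serverless worker.
runpod.serverless.start({"handler": handler})
```

I'm wondering whether something like this is enough, or whether TGI/vLLM buy me things (continuous batching, paged KV cache, etc.) that are hard to replicate by hand.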
Thanks!

