GGUF Text Model Deployment on Serverless with Streaming Response
I am trying to deploy a GGUF text model (https://huggingface.co/bartowski/cognitivecomputations_Dolphin-Mistral-24B-Venice-Edition-GGUF) on serverless.
I tried using llama.cpp, but it isn't working as expected: generation is very slow, and a new worker is spun up for each request instead of a warm one being reused.
How should I deploy this effectively? Thanks in advance.
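For reference, here is a minimal sketch of the pattern I'm aiming for, assuming a RunPod-style serverless worker with llama-cpp-python; the model path, quant file, and parameters are illustrative, not what I actually have running:

```python
# Minimal sketch: load the model once per worker and stream tokens back.
# Assumptions: runpod SDK, llama-cpp-python, GGUF file baked into the image.
import runpod
from llama_cpp import Llama

# Load the model at module import, NOT inside the handler, so a warm
# worker reuses it across requests instead of reloading it every time.
llm = Llama(
    model_path="/models/Dolphin-Mistral-24B-Venice-Edition-Q4_K_M.gguf",  # illustrative path/quant
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to GPU; 0 would silently fall back to slow CPU inference
)

def handler(job):
    prompt = job["input"]["prompt"]
    # A generator handler lets the platform stream tokens as they are produced.
    for chunk in llm(prompt, max_tokens=512, stream=True):
        yield chunk["choices"][0]["text"]

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```

My understanding is that if the model loads inside the handler (or the worker scales to zero between requests), every request pays the full multi-GB load time for a 24B model, which would explain both the slowness and the per-request worker behavior I'm seeing. Is this the right pattern, or is there a better approach?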