GGUF Text Model Deployment on Serverless with Streaming Response
I am trying to deploy a GGUF text model (https://huggingface.co/bartowski/cognitivecomputations_Dolphin-Mistral-24B-Venice-Edition-GGUF) on serverless.
I tried using llama.cpp, but it isn't working as expected: generation is very slow, and a new worker is spun up for each request instead of a warm one being reused.
How should I deploy this effectively? Thanks in advance.
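For reference, here is a minimal sketch of the pattern I'm aiming for, assuming a RunPod-style serverless worker with llama-cpp-python; the model path, quant file, and parameters are illustrative, not what I actually have running:

```python
# Minimal sketch: load the model once per worker and stream tokens back.
# Assumptions: runpod SDK, llama-cpp-python, GGUF file baked into the image.
import runpod
from llama_cpp import Llama

# Load the model at module import, NOT inside the handler, so a warm
# worker reuses it across requests instead of reloading it every time.
llm = Llama(
    model_path="/models/Dolphin-Mistral-24B-Venice-Edition-Q4_K_M.gguf",  # illustrative path/quant
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to GPU; 0 would silently fall back to slow CPU inference
)

def handler(job):
    prompt = job["input"]["prompt"]
    # A generator handler lets the platform stream tokens as they are produced.
    for chunk in llm(prompt, max_tokens=512, stream=True):
        yield chunk["choices"][0]["text"]

runpod.serverless.start({"handler": handler, "return_aggregate_stream": True})
```

My understanding is that if the model loads inside the handler (or the worker scales to zero between requests), every request pays the full multi-GB load time for a 24B model, which would explain both the slowness and the per-request worker behavior I'm seeing. Is this the right pattern, or is there a better approach?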