llama.cpp serverless endpoint

https://github.com/ggerganov/llama.cpp
llama.cpp is, afaik, the only setup that supports LLaVA-1.6 quantized, which is why I use it. On some workers the Docker image runs fine; on others it crashes with an "Illegal instruction" error: https://github.com/ggerganov/llama.cpp/issues/537. Has anyone already run into this? Is there a better fix than building multiple binaries for the different instruction sets and stuffing them all into one image so it works everywhere? (I already tried building with LLAMA_NATIVE=0.) Appreciate any insights, thanks!
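Roughly what I mean by stuffing multiple binaries into one image, in case it helps anyone else (paths and binary names here are just for illustration):
```sh
#!/bin/sh
# entrypoint.sh -- pick the llama.cpp server build that matches the host CPU.
# Assumes three builds baked into the image (hypothetical layout):
#   /app/avx512/server  (AVX-512)
#   /app/avx2/server    (AVX2 + FMA + F16C)
#   /app/avx/server     (plain AVX, most portable)
cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo)

case "$cpu_flags" in
  *avx512f*) exec /app/avx512/server "$@" ;;
  *avx2*)    exec /app/avx2/server "$@" ;;
  *)         exec /app/avx/server "$@" ;;
esac
```
My guess is that LLAMA_NATIVE=0 only drops -march=native, and (if I remember the CMake options correctly) LLAMA_AVX2 / LLAMA_FMA / LLAMA_F16C still default to ON, so a truly portable build would also need -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_F16C=OFF — haven't confirmed that yet though.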
3 Replies
Solution
ashleyk · 4mo ago
I don't know why you would want to use llama.cpp; it's more for offloading onto CPU than for GPU. You can look at using this instead: https://github.com/ashleykleynhans/runpod-worker-llava
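Once it's deployed, calling it is just RunPod's standard serverless API, something like this (the input fields below are only illustrative; check the worker's README for the actual schema):
```sh
# Hypothetical request against a deployed runpod-worker-llava endpoint.
curl -s "https://api.runpod.ai/v2/${ENDPOINT_ID}/runsync" \
  -H "Authorization: Bearer ${RUNPOD_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"input": {"image": "https://example.com/photo.jpg", "prompt": "Describe this image."}}'
```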
ashleyk · 4mo ago
If you really want to use llama.cpp, keep using GitHub; it's clearly an issue with llama.cpp, so I don't know why you're posting it here. It is 100% not a RunPod issue. You even posted a link to an issue on their GitHub, so it really makes no sense whatsoever to create a post for support on RunPod. RunPod is an infrastructure provider, not here to help you with bugs in the applications you're running; GitHub should be used for that. Imagine AWS etc. trying to help every user with every bug in every application they want to run, it's simply not feasible. That's what GitHub issues are there for.
pazanchick · 4mo ago
ye, was just wondering if anyone had experience with this already, thx. llama.cpp supports quantized models; haotian-liu/LLaVA does not yet, afaik. 34B unquantized is just too big.
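(back of the envelope: 34B parameters at fp16 is about 34e9 × 2 bytes ≈ 68 GB of weights alone, while a 4-bit quant at roughly 4.5 bits per parameter lands around 19 GB)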