RunPod•4mo ago
B1llstar

Trying to deploy Llava-Mistral using a simple Docker image, receiving both success & error messages

I am using a simple Docker script to deploy Llava-Mistral. In the system logs, it creates the container successfully. In the container logs, I get the following:
2024-02-05T01:52:10.452447184Z [FATAL tini (7)] exec docker failed: No such file or directory
Script:
# Use an official Ubuntu as a base image
FROM nvidia/cuda:11.8.0-base-ubuntu20.04

# Set noninteractive environment variable to avoid prompts during package installations
ENV DEBIAN_FRONTEND=noninteractive

# Update and install git-lfs, cmake, and other required packages
RUN apt-get update && \
apt-get install -y git-lfs python3 python3-pip cmake g++ gcc

# Install additional dependencies for server mode
RUN CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python[server]

# Create a directory for the llava files
WORKDIR /llava

# Download specific files from the repository
ADD https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/llava-v1.6-mistral-7b.Q4_K_M.gguf /llava/
ADD https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/mmproj-model-f16.gguf /llava/

# Run the server with specified parameters
CMD python3 -m llama_cpp.server --model /llava/llava-v1.6-mistral-7b.Q4_K_M.gguf --clip_model_path /llava/mmproj-model-f16.gguf --port 8081 --host 0.0.0.0 --n_gpu_layers -1 --use_mlock false
The system logs also spam me with "start container". I made sure to use absolute paths to be certain everything is pointed at the right spot, and I tested this in Docker Desktop, where it worked flawlessly. My question is: what am I doing wrong here? Why am I unable to get a connection to the endpoint? I'd also like to know what a typical request looks like to an exposed port on the HTTPS /run endpoint. Reverse proxies typically don't use ports, so I'd like to know what the norm is for that.
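For reference, serverless endpoints are not called on an exposed port at all; requests go through the RunPod API gateway. A minimal sketch of a /run request is below; the endpoint ID, API key, and input payload are placeholders, not values from this thread, and the input shape depends entirely on the handler.
import requests

# Placeholder values -- replace with your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

# Serverless requests go through the RunPod API gateway, so no port appears in the URL.
url = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"
payload = {"input": {"prompt": "Describe this image."}}  # shape depends on your handler

# /run is asynchronous: the response contains a job id that is polled via /status/<job_id>.
response = requests.post(url, json=payload, headers={"Authorization": f"Bearer {API_KEY}"})
print(response.json())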
22 Replies
ashleyk
ashleyk•4mo ago
You need to use sleep infinity to keep your container alive. I also have a LLaVA template that you can use that is working.
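(As a minimal sketch of that suggestion, assuming the Dockerfile from the question and a pod rather than a serverless worker: the final CMD could temporarily be swapped for a long-running command while debugging.)
# Debugging sketch for a pod: keep the container alive instead of letting it exit.
CMD ["sleep", "infinity"]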
B1llstar
B1llstar•4mo ago
yes, that'd be awesome. any tips on getting mistral in particular working? was i on the right track with my container or not?
ashleyk
ashleyk•4mo ago
Mistral 7B is the default model in my green template
B1llstar
B1llstar•4mo ago
so where is the template? regular or instruct?
ashleyk
ashleyk•4mo ago
"LLaVA 1.6" under the "Communtiy" section of "Explore".
B1llstar
B1llstar•4mo ago
i'm trying to implement this on serverless though, this is the serverless support section
ashleyk
ashleyk•4mo ago
I don't see a RunPod handler in your Dockerfile
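For context, a RunPod serverless worker runs a Python handler file started with the runpod SDK. A minimal sketch is below; the "prompt" field and echoed result are placeholders, not from this thread, and the model call would go where the comment indicates.
# handler.py -- minimal RunPod serverless handler sketch.
# Assumes the runpod package is installed (pip install runpod).
import runpod


def handler(job):
    # job["input"] is whatever JSON the caller sent under "input" in the /run request.
    job_input = job["input"]
    prompt = job_input.get("prompt", "")

    # Placeholder: call your model here (e.g. via llama-cpp-python) and return its output.
    return {"echo": prompt}


# Start the serverless worker; RunPod calls handler once per job.
runpod.serverless.start({"handler": handler})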
B1llstar
B1llstar•4mo ago
i'm new to runpod, can you show me what i need to change?
B1llstar
B1llstar•4mo ago
well you just told me what was missing, so how about just telling me directly so i don't have to sift through all that
ashleyk
ashleyk•4mo ago
It is not my job to hold your hand and do everything for you. I told you your handler was missing; use your brain and follow the resources I sent you, otherwise I will gladly help you for $100 per hour of my time.
B1llstar
B1llstar•4mo ago
i asked for a courtesy and you respond with sass? you said in the article yourself that you aren't an expert at implementing llava. your time is not worth $100 per hour
ashleyk
ashleyk•4mo ago
Then struggle with it yourself
B1llstar
B1llstar•4mo ago
you are a childish man
ashleyk
ashleyk•4mo ago
Nope, I told you what to do but you are too lazy and expect everyone to do everything for you. That is not how life works. I offered to help for my hourly rate and then you insult me, when I am one of the most experienced people on RunPod. YOU are childish and a complete fucking idiot.
B1llstar
B1llstar•4mo ago
i don't know a single noteworthy person who yells their credentials when somebody upsets them. imagine going into a help section and calling someone the r slur
Madiator2011
Madiator2011•4mo ago
@ashleyk Let's chill and let me handle this. No need to start another argument 🙂 @B1llstar have you tried to put
python3 -m llama_cpp.server --model /llava/llava-v1.6-mistral-7b.Q4_K_M.gguf --clip_model_path /llava/mmproj-model-f16.gguf --port 8081 --host 0.0.0.0 --n_gpu_layers -1 --use_mlock false
as the Docker command? Also, do you use network/volume storage? I would also change the way you store the models in the image:
# Use an official NVIDIA CUDA image as a base
FROM nvidia/cuda:11.8.0-base-ubuntu20.04

# Set noninteractive environment variable
ENV DEBIAN_FRONTEND=noninteractive

# Update, install necessary packages, and clean up in a single RUN to reduce image size
RUN apt-get update && \
apt-get install -y git-lfs python3 python3-pip cmake g++ gcc && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# Install additional Python dependencies
RUN CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python[server]

# Set work directory
WORKDIR /llava

# If direct ADD does not work due to authentication or redirection issues, replace with:
RUN apt-get update && apt-get install -y curl && \
curl -o llava-v1.6-mistral-7b.Q4_K_M.gguf https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/llava-v1.6-mistral-7b.Q4_K_M.gguf && \
curl -o mmproj-model-f16.gguf https://huggingface.co/cjpais/llava-1.6-mistral-7b-gguf/resolve/main/mmproj-model-f16.gguf

# Command to run the server
CMD ["python3", "-m", "llama_cpp.server", "--model", "/llava/llava-v1.6-mistral-7b.Q4_K_M.gguf", "--clip_model_path", "/llava/mmproj-model-f16.gguf", "--port", "8081", "--host", "0.0.0.0", "--n_gpu_layers", "-1", "--use_mlock", "false"]
though note this will work on pods; for serverless you need to have a handler file that will process job requests
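As a rough sketch of what that could mean for the Dockerfile above, assuming a handler.py along the lines sketched earlier in the thread, the serverless image would install the runpod SDK and run the handler instead of the bare server; the file name and path here are hypothetical.
# Hypothetical serverless additions (not from this thread):
RUN pip install runpod
COPY handler.py /llava/handler.py
CMD ["python3", "-u", "/llava/handler.py"]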
B1llstar
B1llstar•4mo ago
nice, i will look at this today. thank you for the level-headed response
Madiator2011
Madiator2011•4mo ago
Though like ashleyk said, have a look at the links he sent, they are good examples of how to start with serverless
B1llstar
B1llstar•4mo ago
i don't know if that guy represents you but it's probably not a good idea to have someone yelling obscenities like that
Madiator2011
Madiator2011•4mo ago
ashleyk is a person who creates many templates and he is always willing to help. Though don't expect us to be ChatGPT and give you a working solution just because you want one.
B1llstar
B1llstar•4mo ago
i didn't quite understand that second sentence, but i think i understand what you're getting at? i honestly mainly asked for direct help because i figured the fix was a single line of code or something i was missing in the docker file lol. i'll be looking into the handler today though. i did glance at the articles and they were well-written. i can "separate the artist from their work"