Runpod•5mo ago

[Beginner] How to run an Unsloth Llama 3.1 8B fine-tune in a pod?

I just fine-tuned my first model with Unsloth (Llama 3.1 8B) and I'm unsure how to host it on Runpod for inference. Can anyone point me in the right direction or tell me where to read up on it?
25 Replies
Unknown User•5mo ago
Message Not Public
Catality•5mo ago
I'm wondering about that too. I've got Phi-4 fine-tuned on Hugging Face and want to use it with vLLM serverless. I did set it up, but it spits out unrelated garbage.
Unknown User•5mo ago
Message Not Public
TFOP•4mo ago
Tried both and neither works properly, tbh, it's tough. I also spent the past two days trying with Gemma 27B (fp16) and nothing seems to work for it. text-generation-webui on a pod seemed to work in the UI, but the API never returns anything.
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
In the serverless one, the request stays in the queue forever and never returns anything. I'm trying at the moment with the main (non-fine-tuned) version of Gemma 27B abliterated. It's weird for sure: sometimes text-generation-webui loads it and responds, other times it says model type "gemma3" is not supported by transformers. The API never gives a response either way, though.
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
I'll do it one more time and send them to you
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
Here, this is what it says. This is the model: https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
So it's not something that I can fix, right? If so, I'll have to use the pod instead, I assume.
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
Will report back shortly with what it shows me. I'm supposed to edit the container start command to point at the model I actually want to use, right?
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
This is what happened trying to use the "vllm:latest" template. I didn't modify anything in it and just left the defaults. This was the container start command:
--host 0.0.0.0 --port 8000 --model meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.95 --api-key sk-IrR7Bwxtin0haWagUnPrBgq5PurnUz86 --max-model-len 8128
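For readers following along, here is an annotated breakdown of those default flags. The annotations are an editorial addition based on the vLLM docs, not part of the original message:

--host 0.0.0.0                                   # listen on all interfaces inside the pod
--port 8000                                      # the HTTP port you expose and connect to
--model meta-llama/Meta-Llama-3.1-8B-Instruct    # Hugging Face repo to serve; this is where your own fine-tuned repo would go
--dtype bfloat16                                 # precision to load the weights in
--enforce-eager                                  # eager mode (no CUDA graphs); saves memory at some speed cost
--gpu-memory-utilization 0.95                    # fraction of the GPU's VRAM vLLM may claim
--api-key sk-...                                 # a secret you choose; clients must send it as a Bearer token (NOT your Hugging Face token)
--max-model-len 8128                             # maximum context length the server will accept

Note that the gated meta-llama repo (and any private fine-tune) also needs your hf_... token supplied separately, via the HF_TOKEN environment variable, as discussed below.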
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
But that key starts with hf_..., right? I tried putting mine in and replacing that sk-IrR7... one, but it hits the same error again anyway. Ah, as HF_TOKEN?
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
Let's go, for the first time I got a response back from the API 💀 Ty for the help.
Unknown User•4mo ago
Message Not Public
ZGENMEDIA•4mo ago
I wish there was a walkthrough to see what you guys are doing.
TFOP•4mo ago
What worked in the end went like this:
- Make a new pod
- Choose the "vllm:latest" template
- Go to the container start command
- Replace the existing model URL with the one you want to use (from Hugging Face)
- In the same window, scroll down to the environment variables
- There will be one called "HF_TOKEN"; in the field to the right of it, put your Hugging Face access token
- Start the pod
- Click the "Connect" button and open the HTTPS port to get your API URL
- Ask some AI to give you a test command for the command line (or wherever) so you can check that it works; include the API key that was specified in the container start command (a test-call sketch follows below)
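For reference, a minimal test call against vLLM's OpenAI-compatible endpoint might look like the sketch below. The pod URL, API key, and model name are placeholders: use the URL Runpod shows under "Connect", the value you passed to --api-key, and the repo you passed to --model.

curl https://<your-pod-id>-8000.proxy.runpod.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your --api-key value>" \
  -d '{"model": "<your --model repo>", "messages": [{"role": "user", "content": "Say hello"}]}'

If the pod is up and the model has finished loading, this should return a JSON chat completion rather than hanging.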
