Runpod•5mo ago

[Beginner] How to run an Unsloth Llama 3.1 8B fine-tune in a pod?

I just fine-tuned my first model with Unsloth (Llama 3.1 8B) and I'm unsure how to host it on Runpod for inference. Can anyone point me in the right direction or tell me where to read up on it?
25 Replies
Unknown User•5mo ago
Message Not Public
Catality•5mo ago
I'm wondering about that too. I've got Phi-4 fine-tuned on Hugging Face and want to use it with vLLM serverless. I did set it up, but it spits out unrelated garbage.
Unknown User•5mo ago
Message Not Public
TFOP•4mo ago
Tried both and neither works properly, tbh, it's tough. I also spent the past two days trying with Gemma 27B (fp16) and nothing seems to work for it. text-generation-webui on a pod seemed to work in the UI, but the API never returns anything.
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
In the serverless one, the request stays in the queue forever and never returns anything. I'm trying at the moment with the main (non-fine-tuned) version of Gemma 27B abliterated. It's weird for sure: sometimes text-generation-webui loads it and responds, other times it says model type "gemma3" is not supported by transformers. The API never gives a response either way, though.
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
I'll do it one more time and send them to you
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
Here, this is what it says. This is the model: https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
So it's not something that I can fix, right? If so, I'll have to use the pod instead, I assume.
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
Will report back shortly with what it shows me. I'm supposed to edit the container start command to point at the model I actually want to use, right?
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
This is what happened trying to use the "vllm:latest" template. I didn't modify anything in it and just left the defaults. This was the container start command:
--host 0.0.0.0 --port 8000 --model meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.95 --api-key sk-IrR7Bwxtin0haWagUnPrBgq5PurnUz86 --max-model-len 8128
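For readers following along, here is an annotated breakdown of those default flags. The annotations are an editorial addition based on the vLLM docs, not part of the original message:

--host 0.0.0.0                                   # listen on all interfaces inside the pod
--port 8000                                      # the HTTP port you expose and connect to
--model meta-llama/Meta-Llama-3.1-8B-Instruct    # Hugging Face repo to serve; this is where your own fine-tuned repo would go
--dtype bfloat16                                 # precision to load the weights in
--enforce-eager                                  # eager mode (no CUDA graphs); saves memory at some speed cost
--gpu-memory-utilization 0.95                    # fraction of the GPU's VRAM vLLM may claim
--api-key sk-...                                 # a secret you choose; clients must send it as a Bearer token (NOT your Hugging Face token)
--max-model-len 8128                             # maximum context length the server will accept

Note that the gated meta-llama repo (and any private fine-tune) also needs your hf_... token supplied separately, via the HF_TOKEN environment variable, as discussed below.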
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
But that key starts with hf_..., right? I tried putting mine in and replacing that sk-IrR7... one, but it hits the same error again anyway. Ah, as HF_TOKEN?
Unknown User•4mo ago
Message Not Public
TFOP•4mo ago
Let's go, for the first time I got a response back from the API 💀 Ty for the help.
Unknown User•4mo ago
Message Not Public
ZGENMEDIA•4mo ago
I wish there was a walkthrough to see what you guys are doing.
TFOP•4mo ago
What worked in the end went like this:
- Make a new pod
- Choose the "vllm:latest" template
- Go to the container start command
- Replace the existing model URL with the one you want to use (from Hugging Face)
- In the same window, scroll down to the environment variables
- There will be one called "HF_TOKEN"; in the field to the right of it, put your Hugging Face access token
- Start the pod
- Click the "Connect" button and open the HTTPS port to get your API URL
- Ask some AI to give you a test command for the command line (or wherever) so you can check that it works; include the API key that was specified in the container start command (a test-call sketch follows below)
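For reference, a minimal test call against vLLM's OpenAI-compatible endpoint might look like the sketch below. The pod URL, API key, and model name are placeholders: use the URL Runpod shows under "Connect", the value you passed to --api-key, and the repo you passed to --model.

curl https://<your-pod-id>-8000.proxy.runpod.net/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your --api-key value>" \
  -d '{"model": "<your --model repo>", "messages": [{"role": "user", "content": "Say hello"}]}'

If the pod is up and the model has finished loading, this should return a JSON chat completion rather than hanging.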
