Runpod•7mo ago

[Beginner] How to run unsloth llama3.1 8b finetune in a pod?

I just finetuned my first model with unsloth, llama3.1 8b, and unsure how to host it on runpod for inference. Can anyone point me in the right direction on how to do it or where to read up on it?

TTF I just finetuned my first model with unsloth, llama3.1 8b, and unsure how to hos...

Jason•6/21/25, 9:58 AM

You have it on huggingface already?

Jason•6/21/25, 9:58 AM

want to use vllm serverless or pod?

Catality•6/22/25, 12:51 PM

Im wondering about that too. Got Phi-4 fine tuned on hugging face and wanna use vllm server less. I did it set it up, but it spits out unrelated garbage

Jason•6/22/25, 12:54 PM

Try the normal phi4 too if it spits out unrelated things

JJason want to use vllm serverless or pod?

TFOP•6/25/25, 1:47 PM

Tried both and neither work properly tbh, its tough

TFOP•6/25/25, 1:48 PM

I tried also over the past 2 days with gemma 27b (fp16) and nothing seems to work for it. Textgenerationwebui on a pod seemed to work in the UI but the API never returns anything

TTF Tried both and neither work properly tbh, its tough

Jason•6/25/25, 1:48 PM

What's wrong??

JJason Try the normal phi4 too if it spits out unrelated things

Jason•6/25/25, 1:48 PM

Have you tried this too

JJason What's wrong??

TFOP•6/25/25, 1:49 PM

In the serverless one the request stays in queue forever and never returns anything

JJason Have you tried this too

TFOP•6/25/25, 1:49 PM

I'm trying atm with the main (non finetuned) version of gemma 27b abliterated

TFOP•6/25/25, 1:50 PM

Its weird for sure, sometimes in the textgenerationwebui it loads and responds, other times it says model type "gemma3" is not supported by transformers. The API never gives a response either way though

Jason•6/25/25, 1:53 PM

Did you check the logs

TTF In the serverless one the request stays in queue forever and never returns anyth...

Jason•6/25/25, 1:53 PM

This* logs

TFOP•6/25/25, 1:56 PM

I'll do it one more time and send them to you

Jason•6/25/25, 1:57 PM

Okay thanks

Jason•6/25/25, 1:58 PM

I'll maybe try some models later on vllm serverless

TFOP•6/25/25, 1:59 PM

logs-vLLM_-fb.txt3.8KB

JJason Okay thanks

TFOP•6/25/25, 1:59 PM

Here this is what it says

TFOP•6/25/25, 1:59 PM

This is the model https://huggingface.co/mlabonne/gemma-3-27b-it-abliterated

Jason•6/25/25, 2:16 PM

Okay that vllm template has the transformer version without that architecture, so needs runpod to update

JJason Okay that vllm template has the transformer version without that architecture, s...

TFOP•6/25/25, 2:18 PM

So not something that I can fix, right? If so I'll have to use the pod instead I asssume

Jason•6/25/25, 2:18 PM

Yes you can use pod, vllm's docker image

JJason Yes you can use pod, vllm's docker image

Jason•6/25/25, 2:18 PM

If the vllm already supports it

TFOP•6/25/25, 2:18 PM

Will report back shortly with what it shows me

TFOP•6/25/25, 2:20 PM

I'm supposed to edit the container start command to the model I actually want to use, right?

Jason•6/25/25, 2:20 PM

Yes

Jason•6/25/25, 2:21 PM

It'll download it to the pod storage

Jason•6/25/25, 2:21 PM

Container storage most likely

TFOP•6/25/25, 2:28 PM

logs.txt6.62KB

TFOP•6/25/25, 2:28 PM

This is what happened trying to use "vllm:latest" template, I didn't even modify anything in it and just left the defaults

TFOP•6/25/25, 2:29 PM

This was the container start command:
--host 0.0.0.0 --port 8000 --model meta-llama/Meta-Llama-3.1-8B-Instruct --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.95 --api-key sk-IrR7Bwxtin0haWagUnPrBgq5PurnUz86 --max-model-len 8128

Jason•6/25/25, 2:31 PM

You need hf api key set in the env

JJason You need hf api key set in the env

TFOP•6/25/25, 2:37 PM

But that key starts with hf_... right? I tried putting mine and replacing that sk-IrR7... one but it has the same error again anyway

TFOP•6/25/25, 2:38 PM

Ah as HF_TOKEN?

Jason•6/25/25, 2:38 PM

No, set it as env variable, look at how to set hf key in env variable, it's not api key

Jason•6/25/25, 2:38 PM

Yes hf token

TFOP•6/25/25, 2:52 PM

Lets go for the first time I got a response back from the API

JJason Yes hf token

TFOP•6/25/25, 2:52 PM

Ty for the help

Jason•6/25/25, 3:22 PM

haha finally, no problem too! if you want help just ask here with the details like logs, error

TTF Lets go for the first time I got a response back from the API 💀

ZGENMEDIA•6/25/25, 4:32 PM

I wish there was a walk through to see what you guys are doingi.

ZZGENMEDIA I wish there was a walk through to see what you guys are doingi.

TFOP•6/25/25, 4:37 PM

What worked in the end went like this:

-Make new pod
-Choose "vllm:latest" template
-Go to container start command
-Replace existing model url with the one you want to use (from huggingface)
-On the same window scroll down to environment variables
-There will be one called "HF_TOKEN", in the field to the right of this, put your hugginface access key
-Start the pod
-Click on "connect" button and open the https port thing to get your api url
-Ask some AI to give you a test command for command-line or elsewhere so you can test if it works (include also the api key that was specified in the container start command)

[Beginner] How to run unsloth llama3.1 8b finetune in a pod?

Similar Threads

Similar Threads

Similar Threads