Llama 70B loaded on the vLLM serverless template can't answer a simple question like "what is your name"
I am loading it with 1 worker and 2 80GB GPUs,
but the model just can't perform at all; it gives gibberish answers to simple prompts like "what is your name".
24 Replies
Unknown User•2y ago
Message Not Public
I am just setting the dtype to bfloat16; the rest I leave blank/default.
When I load it with the web UI, I get completely different responses.
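For reference, here's roughly what those settings correspond to if you were loading it with vLLM directly. Just a sketch — the model id and prompt are placeholders on my end, not from the thread:
```python
# Minimal sketch of the described setup: bfloat16, split across two 80GB GPUs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-hf",  # assumed checkpoint, adjust to the one you deployed
    dtype="bfloat16",                    # matches the dtype set in the template
    tensor_parallel_size=2,              # shard across the 2 GPUs
)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is your name?"], params)
print(outputs[0].outputs[0].text)
```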
Unknown User•2y ago
Message Not Public
Is it Llama Instruct? I think I was told there was a difference between Llama 70B and the Instruct version.
Instruct is more like an actual chat model: it responds and answers,
while the base Llama 70B is more of a raw text-completion thing. I had also gotten gibberish answers from it in the past,
which made me move to just using OpenLLM.
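For context on the difference: the chat/instruct checkpoints expect the prompt to be wrapped in their chat template, while the base model just continues whatever text it's given. A rough sketch (the chat model id here is my assumption, not from the thread):
```python
# Sketch: wrapping a question in the template a Llama-2 *chat* checkpoint expects.
# A base 70B model has no such template, so raw questions tend to get
# "completed" rather than answered.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-chat-hf")  # assumed id
messages = [{"role": "user", "content": "What is your name?"}]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)  # includes the [INST] ... [/INST] wrapper the chat model was trained on
```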
Unknown User•2y ago
Message Not Public
Oh
Lol 👁️
Unknown User•2y ago
Message Not Public
Haha, maybe I'm wrong and you're supposed to use the chat model.
Unknown User•2y ago
Message Not Public
Oof, I don't remember. Let me see if I can find my old post where I also asked about gibberish coming out of vLLM.
Unknown User•2y ago
Message Not Public
It's just another framework for running LLM models easily. I prefer it to RunPod's vLLM solution, which I just don't like; for some reason I could never get vLLM to work as nicely or easily as OpenLLM.
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless
https://github.com/bentoml/OpenLLM
And I could actually get OpenLLM to work, versus Ollama, which requires a whole background server, etc.,
and I could never get Ollama to preload models properly.
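The usual workaround for preloading is to bake the weights into the image at build time so the worker doesn't have to download them on every cold start. A minimal sketch, assuming a model id and cache path of my own choosing:
```python
# Sketch: run this during the Docker image build so the serverless worker
# boots with the weights already on disk instead of pulling tens of GB.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="mistralai/Mistral-7B-Instruct-v0.1",  # assumed model id
    local_dir="/models/mistral-7b-instruct",       # assumed path baked into the image
)
```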
Unknown User•2y ago
Message Not Public
https://discord.com/channels/912829806415085598/1208252068238860329/1209740324004429844
Oh oops, my previous question was about Mistral being dumb 😅
Yeah! It's pretty good. I have the Docker images up for Mistral 7B, and obviously the repo. I didn't realize how big 70B models are xD. I left it building on Depot and ended up with stupidly large images, well above 100GB lmao.
Which is basically unusable.
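The size makes sense if you do the math: roughly 70B parameters at 2 bytes each in fp16/bf16 is on the order of 140GB of weights before anything else even goes into the image:
```python
# Back-of-the-envelope size of a 70B model's weights in 16-bit precision.
params = 70e9        # parameter count
bytes_per_param = 2  # fp16 / bf16
print(f"{params * bytes_per_param / 1e9:.0f} GB")  # ~140 GB
```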
Unknown User•2y ago
Message Not Public
Thankfully Depot gave me free caches 🙏
Unknown User•2y ago
Message Not Public
xD I don't wanna wait an hour for a single serverless worker to load 😂
What are subs?
Oh yeah, Depot usually costs money
Unknown User•2y ago
Message Not Public
But they gave me a sponsored account,
so I use it for free lol
Unknown User•2y ago
Message Not Public
I'm using the instruct version. It just feels like it's been quantized 10x over, like the model is very stupid.
Unknown User•2y ago
Message Not Public