How do I run Qwen3 235B Q5_K_M using vLLM?
Hi,
I was wondering if there was a simple way to run Qwen3 235B Q5_K_M using vLLM on RunPod.
I have two main issues:
1) the Qwen3 235B GGUF repo contains multiple quantizations (e.g., Q6_K, Q5_K_M, Q5_0), and I don't know how to download just the one I want (see the download sketch below)
2) my understanding from vLLM's documentation is that I have to combine the sharded GGUF files into a single file before serving them (see the merge-and-serve sketch below)
I'm new to vLLM and appreciate the help!
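For issue 1, here is roughly what I was planning to try. huggingface_hub's snapshot_download accepts an allow_patterns filter, so it should be possible to pull only the Q5_K_M shards. The repo id and local path below are placeholders for whichever Qwen3 235B GGUF repo you use, and some repos keep each quantization in its own subfolder, so the glob may need adjusting:

```python
from pathlib import Path
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual Qwen3 235B GGUF repo you use.
REPO_ID = "unsloth/Qwen3-235B-A22B-GGUF"
LOCAL_DIR = Path("models/qwen3-235b-q5_k_m")

# allow_patterns filters by filename glob, so only the Q5_K_M shards are
# downloaded (Q6_K, Q5_0, etc. are skipped). If the repo nests quantizations
# in subfolders, a pattern like "*Q5_K_M/*" may be needed instead.
snapshot_download(
    repo_id=REPO_ID,
    allow_patterns=["*Q5_K_M*"],
    local_dir=LOCAL_DIR,
)
```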
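For issue 2, a minimal sketch of what I understand the merge-and-serve flow to be: llama.cpp ships a gguf-split tool (named llama-gguf-split in recent builds, gguf-split in older ones) whose --merge mode combines shards into one file, and vLLM's GGUF support (which the docs mark as experimental) can then load the merged file, with the tokenizer taken from the original base repo. The shard-name glob and the Qwen/Qwen3-235B-A22B tokenizer id are my assumptions, and I'm not sure vLLM's GGUF loader supports Qwen3's MoE architecture at all, which is partly why I'm asking:

```python
import subprocess
from pathlib import Path

from vllm import LLM

LOCAL_DIR = Path("models/qwen3-235b-q5_k_m")
MERGED = LOCAL_DIR / "qwen3-235b-q5_k_m.gguf"

# Find the first shard; GGUF shards are conventionally named
# *-00001-of-0000N.gguf, but check the actual filenames in the repo.
first_shard = sorted(LOCAL_DIR.rglob("*Q5_K_M*-00001-of-*.gguf"))[0]

# Merge the shards with llama.cpp's gguf-split tool ("llama-gguf-split" in
# recent builds, "gguf-split" in older ones); it must be on PATH.
subprocess.run(
    ["llama-gguf-split", "--merge", str(first_shard), str(MERGED)],
    check=True,
)

# Load the merged file with vLLM. GGUF support is experimental, and the docs
# recommend pointing the tokenizer at the original (non-GGUF) base model.
llm = LLM(model=str(MERGED), tokenizer="Qwen/Qwen3-235B-A22B")
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```

The CLI equivalent of the last step would be `vllm serve ./qwen3-235b-q5_k_m.gguf --tokenizer Qwen/Qwen3-235B-A22B`, if that's easier on RunPod.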
This is not vLLM, @Nicholas, and there is some initial startup cost of around 5-6 seconds to spin up the Text Gen Web UI server:
https://github.com/justinwlin/Oobabooga-Text-Gen-Qwen2.5-7B-Instruct-Runpod
But I was working on something similar recently, so it might be a decent reference.
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless
I have not tried this with OpenLLM, and I wonder whether that model is supported. If so, this repo might be helpful too.