How do I run Qwen3 235B Q5_K_M using vLLM?
Hi,
I was wondering if there was a simple way to run Qwen3 235B Q5_K_M using vLLM on RunPod.
I have two main issues:
1) the Qwen3 235B GGUF repo contains multiple quantizations (e.g., Q6_K, Q5_K_M, Q5_0), and I don't know how to download just the one I want (see the download sketch below)
2) my understanding from vLLM's documentation is that I have to combine the sharded GGUF files into a single file before serving them (see the merge-and-serve sketch below)
I'm new to vLLM and appreciate the help!
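For issue 1, here is roughly what I was planning to try. huggingface_hub's snapshot_download accepts an allow_patterns filter, so it should be possible to pull only the Q5_K_M shards. The repo id and local path below are placeholders for whichever Qwen3 235B GGUF repo you use, and some repos keep each quantization in its own subfolder, so the glob may need adjusting:

```python
from pathlib import Path
from huggingface_hub import snapshot_download

# Placeholder repo id -- substitute the actual Qwen3 235B GGUF repo you use.
REPO_ID = "unsloth/Qwen3-235B-A22B-GGUF"
LOCAL_DIR = Path("models/qwen3-235b-q5_k_m")

# allow_patterns filters by filename glob, so only the Q5_K_M shards are
# downloaded (Q6_K, Q5_0, etc. are skipped). If the repo nests quantizations
# in subfolders, a pattern like "*Q5_K_M/*" may be needed instead.
snapshot_download(
    repo_id=REPO_ID,
    allow_patterns=["*Q5_K_M*"],
    local_dir=LOCAL_DIR,
)
```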
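For issue 2, a minimal sketch of what I understand the merge-and-serve flow to be: llama.cpp ships a gguf-split tool (named llama-gguf-split in recent builds, gguf-split in older ones) whose --merge mode combines shards into one file, and vLLM's GGUF support (which the docs mark as experimental) can then load the merged file, with the tokenizer taken from the original base repo. The shard-name glob and the Qwen/Qwen3-235B-A22B tokenizer id are my assumptions, and I'm not sure vLLM's GGUF loader supports Qwen3's MoE architecture at all, which is partly why I'm asking:

```python
import subprocess
from pathlib import Path

from vllm import LLM

LOCAL_DIR = Path("models/qwen3-235b-q5_k_m")
MERGED = LOCAL_DIR / "qwen3-235b-q5_k_m.gguf"

# Find the first shard; GGUF shards are conventionally named
# *-00001-of-0000N.gguf, but check the actual filenames in the repo.
first_shard = sorted(LOCAL_DIR.rglob("*Q5_K_M*-00001-of-*.gguf"))[0]

# Merge the shards with llama.cpp's gguf-split tool ("llama-gguf-split" in
# recent builds, "gguf-split" in older ones); it must be on PATH.
subprocess.run(
    ["llama-gguf-split", "--merge", str(first_shard), str(MERGED)],
    check=True,
)

# Load the merged file with vLLM. GGUF support is experimental, and the docs
# recommend pointing the tokenizer at the original (non-GGUF) base model.
llm = LLM(model=str(MERGED), tokenizer="Qwen/Qwen3-235B-A22B")
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```

The CLI equivalent of the last step would be `vllm serve ./qwen3-235b-q5_k_m.gguf --tokenizer Qwen/Qwen3-235B-A22B`, if that's easier on RunPod.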
This is not vLLM, @Nicholas, and there is some initial startup cost of around 5-6 seconds to spin up the Text Gen Web UI server:
https://github.com/justinwlin/Oobabooga-Text-Gen-Qwen2.5-7B-Instruct-Runpod
But I was working on something similar recently, so it might be a decent reference.
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless
I have not tried this with OpenLLM, and I wonder whether that model is supported. If so, this repo might be helpful too.