I’m trying to disable reasoning in serverless vLLM with GPT-OSS-20B for streaming use cases. I don’t want any reasoning content in the responses, and I don’t need this feature at all.
I’ve tried using environment variables, but without success.
I also tried forking the RunPod vLLM repository and modifying src/handler.py and src/engine.py, but that didn’t work either.
I'm stuck. Has anyone managed to disable reasoning in serverless mode? Maybe there's a git repo I could look at? Thank you in advance.
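While waiting for a server-side switch, one workaround is to filter the reasoning out on the client. This is a minimal sketch, assuming the model's reasoning arrives in a `reasoning_content` field of each streamed delta (as vLLM's OpenAI-compatible endpoint emits when a reasoning parser is active) while the visible answer arrives in `content`; the chunk shapes below are mocked for illustration:

```python
# Client-side workaround: drop reasoning deltas from a vLLM OpenAI-compatible
# streaming response. Assumes reasoning text arrives in a separate
# `reasoning_content` field of each chunk's delta, distinct from `content`.

def visible_text(chunk: dict) -> str:
    """Return only the user-facing text from one parsed stream chunk,
    silently skipping any reasoning_content."""
    parts = []
    for choice in chunk.get("choices", []):
        delta = choice.get("delta", {})
        # delta may carry "reasoning_content" (ignored) and/or "content" (kept).
        text = delta.get("content")
        if text:
            parts.append(text)
    return "".join(parts)

# Mocked chunks in the shape an OpenAI-compatible stream produces:
chunks = [
    {"choices": [{"delta": {"reasoning_content": "Let me think..."}}]},
    {"choices": [{"delta": {"content": "Hello"}}]},
    {"choices": [{"delta": {"content": ", world"}}]},
]
answer = "".join(visible_text(c) for c in chunks)
print(answer)  # reasoning deltas are dropped, only answer tokens remain
```

This doesn't stop the GPU from generating the reasoning tokens (you still pay for them), it only keeps them out of what your users see, so a true server-side disable is still the better fix if one exists.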