As for this argument, that might be the case, but vLLM calls into C++ for inference (as does any major framework) for most of the work, excluding the actual HTTP serving. llama.cpp might be faster on the HTTP side, but vLLM is still very much trusted by large organizations for deploying models, as can be seen on their GitHub page, where they list the organizations that support it: https://github.com/vllm-project/vllm
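To make the point concrete, here's a minimal sketch of vLLM's Python-level offline API (assuming vLLM is installed and the example model `facebook/opt-125m` is just a placeholder): the Python layer is a thin frontend, and the attention/sampling work it dispatches to runs in native C++/CUDA kernels.

```python
# Minimal sketch: the Python API is only the frontend; the heavy lifting
# happens in vLLM's native kernels, not in this Python code.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder model; use whatever you actually deploy
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Explain what a KV cache is."], params)
for out in outputs:
    print(out.outputs[0].text)
```

The same split applies to the OpenAI-compatible server: the HTTP layer is Python, but everything performance-critical below it is compiled code.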