How to monitor the LLM inference speed (generation token/s) with vLLM serverless endpoint?
I've gotten started with a vLLM deployment, and configuring it with my application was straightforward and worked fine.
My main concern is how to monitor the inference speed on the dashboard or on the "Metrics" tab. Currently I have to dig through the logs manually to find the average token generation speed that vLLM prints.
Any neat solution to this??
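In the meantime, here's a rough client-side workaround I've been using: time a streamed completion and estimate tokens/s from the chunk stream. This is a minimal sketch, assuming the RunPod vLLM worker's OpenAI-compatible route at `https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1`; `ENDPOINT_ID`, `MODEL_NAME`, and the `RUNPOD_API_KEY` env var are placeholders for your own deployment.

```python
# Rough client-side throughput check against a serverless vLLM endpoint.
# Placeholders: ENDPOINT_ID, MODEL_NAME, RUNPOD_API_KEY -- adjust to your setup.
import os
import time
from openai import OpenAI

ENDPOINT_ID = "your-endpoint-id"   # placeholder
MODEL_NAME = "your-model-name"     # placeholder, e.g. the HF repo you deployed

client = OpenAI(
    api_key=os.environ["RUNPOD_API_KEY"],
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
)

stream = client.chat.completions.create(
    model=MODEL_NAME,
    messages=[{"role": "user", "content": "Write a short paragraph about GPUs."}],
    max_tokens=256,
    stream=True,
)

first_chunk_at = None
chunks = 0
for chunk in stream:
    delta = chunk.choices[0].delta.content if chunk.choices else None
    if delta:
        if first_chunk_at is None:
            # Start timing at the first generated chunk, which excludes
            # queueing, cold start, and prompt prefill time.
            first_chunk_at = time.perf_counter()
        chunks += 1

if first_chunk_at is not None and chunks > 1:
    elapsed = time.perf_counter() - first_chunk_at
    # Each streamed chunk is roughly one token for vLLM, so this is an
    # estimate; re-tokenize the concatenated text if you need an exact count.
    print(f"~{(chunks - 1) / elapsed:.1f} tokens/s over {chunks} streamed chunks")
```

Because the clock starts at the first chunk, this approximates pure generation speed rather than end-to-end latency.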
5 Replies
Unknown User•10mo ago
Message Not Public
Oh yeah, I thought RunPod had built-in support for this. Thanks
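For what it's worth, vLLM's OpenAI-compatible server also exposes Prometheus-format counters on its `/metrics` route. Whether that route is reachable through a serverless endpoint depends on how the worker is set up, so the sketch below assumes you can hit the vLLM server directly (the URL is a placeholder), and it simply filters for token/throughput-related lines rather than assuming specific metric names.

```python
# Minimal sketch: scrape vLLM's Prometheus metrics and print lines related
# to token counts / throughput. METRICS_URL is a placeholder; it assumes the
# vLLM OpenAI server's /metrics route is reachable from where this runs.
import requests

METRICS_URL = "http://localhost:8000/metrics"  # placeholder

text = requests.get(METRICS_URL, timeout=10).text
for line in text.splitlines():
    # vLLM prefixes its metrics with "vllm:".
    if line.startswith("vllm:") and ("token" in line or "throughput" in line):
        print(line)
```

If `/metrics` is reachable in your deployment, pointing a Prometheus scrape at it and graphing the counters (e.g. in Grafana) would avoid reading the logs by hand.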
Unknown User•10mo ago
Message Not Public
Absolutely, but I found Discord (and nerdylive's support) quicker 😉
Unknown User•10mo ago
Message Not Public