How to monitor LLM inference speed (generation tokens/s) with a vLLM serverless endpoint on Runpod?
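One common approach is to measure throughput client-side: stream the response and divide the number of generated tokens by the generation time. Below is a minimal sketch under several assumptions not stated in the question: that the worker exposes vLLM's OpenAI-compatible API, that the base URL follows the `https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1` pattern, and that the model name shown is a placeholder for whatever the worker actually serves. Counting streamed chunks only approximates the token count; for exact numbers you could instead read the `usage` field of a non-streaming response, and vLLM's own worker logs also periodically report average generation throughput.

```python
"""Sketch: estimate generation tokens/s by timing a streamed completion.

Assumptions (hypothetical, adjust to your setup):
  - RUNPOD_ENDPOINT_ID / RUNPOD_API_KEY are set in the environment
  - the endpoint serves vLLM's OpenAI-compatible API under /openai/v1
  - the model name is a placeholder
"""
import os
import time

from openai import OpenAI  # pip install openai

ENDPOINT_ID = os.environ["RUNPOD_ENDPOINT_ID"]
API_KEY = os.environ["RUNPOD_API_KEY"]

client = OpenAI(
    base_url=f"https://api.runpod.ai/v2/{ENDPOINT_ID}/openai/v1",
    api_key=API_KEY,
)

start = time.perf_counter()
first_token_at = None
chunk_count = 0

# Stream the completion; each content chunk is roughly one token,
# so chunk_count / generation_time approximates generation tokens/s.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    max_tokens=256,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunk_count += 1

end = time.perf_counter()
gen_time = end - (first_token_at or start)
print(f"time to first token: {(first_token_at or end) - start:.2f} s")
print(f"~{chunk_count} tokens in {gen_time:.2f} s "
      f"-> ~{chunk_count / max(gen_time, 1e-9):.1f} tokens/s")
```

Note that this measures end-to-end throughput as seen by the client, so it includes network latency and any serverless cold-start delay before the first token; the time-to-first-token line helps separate that startup cost from steady-state generation speed.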