J. · 2y ago

vLLM Memory Error / Runpod Error?

https://pastebin.com/vjSgS4up

Error initializing vLLM engine: The model's max seq len (32768) is larger than the maximum number of tokens that can be stored in KV cache (24144). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine.
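For context, these are the two engine parameters the error points at. A minimal sketch of tuning them with vLLM's Python API (the model name and values here are just placeholders, not what my endpoint actually uses):

```python
from vllm import LLM

# Either cap the context length so the KV cache fits, or give vLLM a
# larger slice of VRAM to allocate the cache from (example values only).
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
    max_model_len=16384,           # down from the 32768 in the error
    gpu_memory_utilization=0.95,   # up from vLLM's 0.90 default
)
```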


I got this error when I tried to start my vLLM Mistral serverless endpoint. It ended up fixing itself once I increased the GPU to a 24GB GPU Pro, which made me guess the GPU just wasn't big enough (even though it was the CPU that was showing 100% usage).

But the problem I still have is: how do I stop it from erroring out and retrying infinitely if this happens again? Is it possible to catch this somehow, either in Runpod or in vLLM?
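To show what I mean by "catch it", here's a rough sketch, assuming you control the engine-init code and that vLLM surfaces this as a ValueError (I haven't verified the exact exception type):

```python
import sys
from vllm import LLM

try:
    llm = LLM(
        model="mistralai/Mistral-7B-Instruct-v0.2",  # placeholder model
        max_model_len=32768,
        gpu_memory_utilization=0.90,
    )
except ValueError as err:
    # vLLM appears to raise ValueError when the KV cache can't hold
    # max_model_len tokens; fail fast instead of letting the worker
    # restart and hit the same error over and over.
    print(f"Engine init failed: {err}", file=sys.stderr)
    sys.exit(1)
```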

(The pastebin shows it worked eventually, because that log is from my second request after I upgraded the GPU; before that it just kept retrying until I manually killed it.)