""Uncaught exception | <class 'torch.OutOfMemoryError'>; CUDA out of memory. Tried to allocate 3.50 GiB. GPU 0 has a total capacity of 44.45 GiB of which 1.42 GiB is free"
Basically I used the model casperhansen/deepseek-r1-distill-qwen-32b-awq with vLLM and RunPod serverless, except I lowered the model max length to 11000. I didn't modify any other settings.
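For reference, here is a minimal sketch of what roughly equivalent settings look like through vLLM's Python API (not the exact RunPod worker config; everything except the model name and max length is assumed to be at vLLM defaults):

```python
# Sketch of the setup described above via vLLM's Python API.
# Only the model and max_model_len come from the post; the other
# values are vLLM defaults, not confirmed RunPod worker settings.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/deepseek-r1-distill-qwen-32b-awq",
    quantization="awq",           # AWQ-quantized checkpoint
    max_model_len=11000,          # lowered context length, as in the post
    gpu_memory_utilization=0.90,  # vLLM default fraction of VRAM it may claim
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```

max_model_len and gpu_memory_utilization are the two knobs that usually determine whether the weights plus KV cache fit in VRAM, so they are the most likely place this OOM comes from.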