Runpod•4mo ago

P-State issue

Machine 1: Driver Version 550.127.05, CUDA Version 12.4, Power Limit 450 W, P-State P2
Machine 2: Driver Version 565.57.01, CUDA Version 12.7, Power Limit 450 W, P-State P0

We have two serverless machines. The first one is very slow, while the second one performs at the expected speed for a standard 4090. I'm wondering: what does P-state mean? Does it affect the GPU's performance?
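For context, NVIDIA documents P-states as performance states ranging from P0 (maximum performance) to P15 (minimum). One way to compare workers is to log the same fields quoted above from inside the job. The following is a minimal sketch, assuming `nvidia-smi` is on the worker's PATH; the helper names `parse_gpu_rows` and `query_gpus` are made up for illustration. It only reports the state and does not show that P2 is the cause of the slowdown.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["name", "pstate", "driver_version", "power.limit"]


def parse_gpu_rows(csv_text):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) != len(QUERY_FIELDS):
            continue  # skip lines that do not match the requested fields
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Run nvidia-smi once and return the parsed rows (needs an NVIDIA driver)."""
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_rows(completed.stdout)


# Hypothetical usage inside a worker, ideally while the workload is running,
# since an idle GPU normally drops to a lower-power state:
#     for gpu in query_gpus():
#         print(gpu["name"], gpu["pstate"], gpu["driver_version"])
```

Sampling while the GPU is busy matters here, because a reading taken at idle says little about the state the card reaches under load.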

10 Replies

Unknown User•4mo ago

Message Not Public

zongheng1619OP•4mo ago

What can cause this? Is there a way to keep it at P0 all the time? And can I delete this endpoint if I notice the performance is bad? Is there an API that can do this, besides going to the dashboard and deleting that node?

Unknown User•4mo ago

Message Not Public

zongheng1619OP•4mo ago

Can you point me to it? I only see how to stop/cancel the current job; I don't see a way to delete the endpoint directly.

Unknown User•4mo ago

Message Not Public

zongheng1619OP•4mo ago

Btw, these are a start-time graph and an average frame time graph. Lower is better in both cases. You can easily see that the machine that starts quickly also has a lower frame time (higher fps). They are running exactly the same task. The start-up time is mainly Python loading the ML models.

Unknown User•4mo ago

Message Not Public

zongheng1619OP•4mo ago

It's the machine that can load my ML models quickly. They are loading the ML models onto the GPU. These are TensorRT (.trt) engine files being loaded from disk into GPU memory. In the worst case, loading can take up to 2 minutes, and it leaves some extremely bad frame times (200 ms).
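A worker could flag itself as slow by timing the engine load and tracking frame times, so that bad machines show up in logs rather than only in graphs. This is a rough sketch, not a Runpod feature: the names `timed` and `is_slow_worker` are hypothetical, and the thresholds are assumptions taken from the worst-case numbers mentioned in this thread (about 2 minutes of loading, 200 ms frames).

```python
import time
from statistics import mean

# Assumed thresholds, based on the worst case described in the thread.
MAX_LOAD_SECONDS = 120.0
MAX_AVG_FRAME_MS = 200.0


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def is_slow_worker(
    load_seconds,
    frame_times_ms,
    max_load_seconds=MAX_LOAD_SECONDS,
    max_avg_frame_ms=MAX_AVG_FRAME_MS,
):
    """Return True if loading or the average frame time reaches its threshold."""
    if load_seconds >= max_load_seconds:
        return True
    # With no frames recorded yet, only the load time can flag the worker.
    return bool(frame_times_ms) and mean(frame_times_ms) >= max_avg_frame_ms


# Hypothetical usage, where load_engines() stands in for your own loader:
#     engines, load_seconds = timed(load_engines)
#     if is_slow_worker(load_seconds, recent_frame_times_ms):
#         print("slow worker detected")
```

In practice the thresholds would be set well below the worst case, for example from the numbers seen on the machine that performs as expected.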

Unknown User•4mo ago

Message Not Public
