P-State issue

Machine 1: Driver Version 550.127.05, CUDA Version 12.4, Power Limit 450 W, P-State P2
Machine 2: Driver Version 565.57.01, CUDA Version 12.7, Power Limit 450 W, P-State P0

We have two serverless machines. The first one is very slow, while the second one performs at the expected speed for a standard 4090. What does P-state mean? Does it affect the GPU's performance?
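For anyone comparing workers the same way: nvidia-smi describes the P-state as the GPU's current performance state, from P0 (maximum performance) to P12 (minimum performance). The reading changes with load, so it is most useful when sampled while the workload is actually running. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH inside the worker; the helper names `parse_gpu_csv` and `query_gpus` are made up for illustration, and the query fields are the ones listed by `nvidia-smi --help-query-gpu`.

```python
import shutil
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = [
    "index", "name", "driver_version", "pstate",
    "clocks.sm", "clocks.mem", "power.draw", "power.limit",
]


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        # Skip lines that do not have exactly one value per field.
        if len(values) == len(FIELDS):
            rows.append(dict(zip(FIELDS, values)))
    return rows


def query_gpus():
    """Return per-GPU status, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)
```

Calling `query_gpus()` at worker start-up and again during inference, and logging `pstate` next to `clocks.sm` and `clocks.mem`, gives a direct way to compare a slow worker against a fast one.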
10 Replies
zongheng1619 (OP), 2mo ago
What can cause this? Is there a way to keep it at P0 all the time? And can I delete this endpoint if I notice the performance is bad? Is there an API that can do this, besides going to the dashboard and deleting that node?
zongheng1619 (OP), 2mo ago
Can you point me to it? I only see how to stop/cancel the current job; I don't see a way to delete the endpoint directly.
zongheng1619 (OP), 2mo ago
[Image attachment, no description]
zongheng1619 (OP), 2mo ago
By the way, the image above is a start-up time graph and an average frame time graph. Lower is better in both cases. You can easily see that the quick-start machine also has a lower frame time (higher FPS). Both machines are running exactly the same task. The start-up time is mainly Python loading the ML models.
zongheng1619 (OP), 2mo ago
It's the machine that can load my ML models quickly. They are loading the ML models into the GPU: these are TensorRT (.trt) engine files being loaded from disk into GPU memory. In the worst case, loading can take up to 2 minutes, and it leaves some extremely bad frame times (200 ms).
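One way to narrow down where the 2 minutes go is to time the disk read separately from the GPU-side step. The sketch below is only an illustration under assumptions: `timed`, `read_engine_bytes`, and `report_disk_read` are hypothetical helper names, and it covers just the read from disk. The deserialization step could be timed the same way, for example by wrapping the TensorRT Python call `Runtime.deserialize_cuda_engine` in `timed`.

```python
import time
from pathlib import Path


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def read_engine_bytes(path):
    """Read a serialized engine file from disk; no GPU work happens here."""
    return Path(path).read_bytes()


def report_disk_read(path):
    """Time the disk read of one engine file and print its throughput."""
    data, seconds = timed(read_engine_bytes, path)
    megabytes = len(data) / 1e6
    rate = megabytes / seconds if seconds > 0 else float("inf")
    print(f"{path}: {megabytes:.1f} MB read in {seconds:.3f} s ({rate:.0f} MB/s)")
    return data, seconds

# Assumption: with TensorRT installed, the next stage could be timed as
#   engine, seconds = timed(runtime.deserialize_cuda_engine, data)
# where `runtime` is a trt.Runtime instance created by the caller.
```

If the slow worker already spends most of the time in the disk read, the bottleneck is probably storage rather than the GPU's P-state. Note that a second read of the same file may be served from the page cache and look much faster than the first.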
