Fixing CPU bottlenecks on the EPYC 9354
Tips first: redeploying can sort out the box and the hypervisor issue for a short while. Hardware concerns, noisy neighbors; there's plenty out there to read up on for getting out of a runaway or sluggish pod. Thanks to RunPod for handling the infrastructure so seamlessly. I even wrote up a whole patch (2 days' rate) when the answer was right in front of me: just redeploy (15-20 min).
Still, what can we discuss about fixing this specific chip, which always seems to be the squeaky wheel? I don't know whether the rest of us users can pitch in to help while crediting the team. We're a varied bunch 😎.
It's a hiccup I wouldn't want to raise, given the steady fixes and workarounds already applied automatically (I assume by painstakingly playing whack-a-mole), but any guidance would be stellar.
1 Reply
my thought was that CPU wasn't much of a concern because, from what I know, RunPod sets a CPU and memory quota with Docker
so we get guaranteed CPU time
noisy neighbors on the network were more of a concern for me
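If anyone wants to check the quota point from inside a pod, here is a minimal sketch. It assumes the container runs under cgroup v2, where the kernel exposes the CPU limit in /sys/fs/cgroup/cpu.max and throttling counters in /sys/fs/cgroup/cpu.stat. I have not confirmed that RunPod pods are set up this way, and the function names are just made up for illustration.

```python
from pathlib import Path

# Assumption: cgroup v2 unified hierarchy mounted at the usual location.
CGROUP = Path("/sys/fs/cgroup")


def parse_cpu_max(text):
    """Return the CPU limit in cores from cpu.max, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cpu_stat(text):
    """Parse the 'key value' lines of cpu.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_ratio(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    cpu_max = CGROUP / "cpu.max"
    cpu_stat = CGROUP / "cpu.stat"
    if cpu_max.exists() and cpu_stat.exists():
        limit = parse_cpu_max(cpu_max.read_text())
        stats = parse_cpu_stat(cpu_stat.read_text())
        print("CPU limit (cores):", "unlimited" if limit is None else limit)
        print("Throttled ratio:", round(throttled_ratio(stats), 3))
    else:
        print("cgroup v2 CPU files not found; this sketch does not apply here")
```

A high throttled ratio would suggest the pod is hitting its own quota. A low ratio on a sluggish pod would point elsewhere, such as contention on the host, which these counters do not show.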