JM
RunPod
Created by bghira on 5/8/2025 in #⛅|pods-clusters
very slow 5090 pod
To add to Hailong's point: if you expect access to an entire 9655 CPU, note that there are 2 of those CPUs per 8-GPU system, so 4 GPUs have to be rented. Hope this helps.
77 replies
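The CPU-share arithmetic in the reply above can be sketched in a few lines. This is only an illustration: it assumes CPU resources are shared out in proportion to the number of GPUs rented, and the function name and defaults (`gpus_needed_for_full_cpus`, 2 CPUs and 8 GPUs per system) are hypothetical, taken from the numbers quoted in the message rather than from any RunPod API.

```python
import math


def gpus_needed_for_full_cpus(cpus_wanted: int,
                              cpus_per_system: int = 2,
                              gpus_per_system: int = 8) -> int:
    """Return how many GPUs to rent to cover `cpus_wanted` whole CPUs.

    Assumption (not confirmed by the source): each rented GPU comes with
    an equal slice of the host CPUs, i.e. cpus_per_system / gpus_per_system.
    """
    if cpus_wanted < 1 or cpus_per_system < 1 or gpus_per_system < 1:
        raise ValueError("all arguments must be positive integers")
    # Each CPU is shared by gpus_per_system / cpus_per_system GPUs.
    return math.ceil(cpus_wanted * gpus_per_system / cpus_per_system)


# With 2 CPUs per 8-GPU system, one full CPU corresponds to 4 GPUs.
print(gpus_needed_for_full_cpus(1))
```

Under the same assumption, asking for both CPUs means renting the whole 8-GPU system.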
RunPod
Created by bghira on 5/8/2025 in #⛅|pods-clusters
very slow 5090 pod
The AMD 9555 is actually much faster and more powerful than the reference architecture Nvidia suggests for the B200 DGX pods. Personally, I am a big fan of the 9575F, or the 9655 and 9755 for CPU-heavy workloads. That said, a dual AMD 9555 configuration is a serious system, and the latest generation available for HPC purposes.
77 replies
RunPod
Created by Jas on 5/21/2024 in #⛅|pods-clusters
"The port is not up yet"
Nope, not even a single GPU card! Nice milestone^^ 🔥
94 replies
RunPod
Created by Jas on 5/21/2024 in #⛅|pods-clusters
"The port is not up yet"
@digigoblin
- Correction: I said we are moving towards CUDA 12.1+ for all GPUs; it's not fully done yet.
- At the moment, we are completely done sunsetting CUDA 11.8 and older.
- Currently working on sunsetting CUDA 12.0 too; it will take a couple of weeks to finish 🙂 (we now have less than 4% of GPUs on 12.0^^)
94 replies
RunPod
Created by flowtyone on 3/17/2024 in #⚡|serverless
Didn't get response via email, trying my luck here
Sure
13 replies
RunPod
Created by flowtyone on 3/17/2024 in #⚡|serverless
Didn't get response via email, trying my luck here
Happy to connect about all the considerations you mentioned 🙂
13 replies
RunPod
Created by flowtyone on 3/17/2024 in #⚡|serverless
Didn't get response via email, trying my luck here
hey @flowtyone
13 replies
RunPod
Created by Dhruv Mullick on 3/4/2024 in #⛅|pods-clusters
Frequent GPU problem with H100
I believe the problem is largely solved for H100s. We will now look to automate the script and expand it to all servers on RunPod. In the meantime, do not hesitate to reach out if you have any questions 🙂
23 replies
RunPod
Created by Dhruv Mullick on 3/4/2024 in #⛅|pods-clusters
Frequent GPU problem with H100
So, we have a very good detection tool in place now, but it's manual
23 replies
RunPod
Created by Dhruv Mullick on 3/4/2024 in #⛅|pods-clusters
Frequent GPU problem with H100
@Dhruv Mullick I remembered you sir! 😉
23 replies
RunPod
Created by Bryan on 3/9/2024 in #⛅|pods-clusters
GPU speed getting slower and slower
As I understand it, all 4090 servers there are high quality, but if not, we need to know which ones so we can solve this
14 replies
RunPod
Created by Bryan on 3/9/2024 in #⛅|pods-clusters
GPU speed getting slower and slower
Community Cloud definitely has variability. For Secure Cloud I am surprised; could you provide the pod IDs of 2 GPUs where you observe this?
14 replies
RunPod
Created by Bryan on 3/9/2024 in #⛅|pods-clusters
GPU speed getting slower and slower
Hey @Bryan @kopyl
14 replies
RunPod
Created by Dhruv Mullick on 3/4/2024 in #⛅|pods-clusters
Frequent GPU problem with H100
@Dhruv Mullick H100 PCIe cards have caused us a lot of headaches lately. We are soon releasing a very powerful detection tool for all RunPod servers, which will help us fix these non-trivial issues. It always seems to come down to some specific kernel version that might not be compatible even though it's supposed to be. That being said, expect a strong resolution in the near term!
23 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Yep, engineering has been working very hard to help Justin and me lately; new admin features like this one always help so much! Take care sir, let me know if you need anything. Need to go to bed now
46 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
@ashleyk Credited the account! Thanks for helping everyone
46 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Apologies for the delay in responding
46 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Btw, I was literally buried in work; I found more hardware for everyone
46 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
That's no good, thanks for explaining
46 replies
RunPod
Created by ashleyk on 2/26/2024 in #⚡|serverless
Unacceptably high failed jobs suddenly
Uh
46 replies