ktabrizi
RunPod
•Created by ktabrizi on 8/30/2024 in #⛅|pods-clusters
Two pods disappeared from my account
For anyone investigating something similar, it turns out RunPod has a stale volume deletion policy:
Stale volumes are deleted if they have been inactive for 30 days, or if you run out of funds.
Pre-deletion warning notifications were sent to our team admin (I just never saw them, whoops).
7 replies
RunPod
•Created by ktabrizi on 8/30/2024 in #⛅|pods-clusters
Two pods disappeared from my account
No, they were "On-Demand - Secure Cloud" pods (edited my post to include this info).
7 replies
RunPod
•Created by ktabrizi on 7/9/2024 in #⛅|pods-clusters
AMD pods don't properly support GPU memory allocation
Sounds good, thanks for the update. If there's any way to be notified if/when this is supported, please let me know!
10 replies
RunPod
•Created by ktabrizi on 7/9/2024 in #⛅|pods-clusters
AMD pods don't properly support GPU memory allocation
definitely fair, though I imagine there's a slightly more permissive security profile that will allow these pinned memory allocations without dropping seccomp altogether.
10 replies
RunPod
•Created by ktabrizi on 7/9/2024 in #⛅|pods-clusters
AMD pods don't properly support GPU memory allocation
we do – our application is compute intensive and involves PyTorch, but isn't an LLM or diffusion model. I think as soon as the software involved is doing anything custom with ROCm/HIP, someone would hit these kinds of issues. It'd be great to be able to run with RunPod's AMD pods as more and more applications are built to take advantage of the MI300Xs.
10 replies
RunPod
•Created by ktabrizi on 7/9/2024 in #⛅|pods-clusters
AMD pods don't properly support GPU memory allocation
Here's my script for quickly testing this, in case anyone wants to reproduce it:
You can compile and run this with:
hipcc -o test_hip_malloc test_hip_malloc.cpp && ./test_hip_malloc
10 replies