Runpod • 15mo ago
Encyrption

Monitor GPU VRAM - Which GPU to check?

I am trying to monitor GPU VRAM usage in a serverless worker. To do this with pynvml I need to provide the index of the GPU. Is there a way to obtain the index of the GPU my worker is using? I did not see this in the ENV variables. I do see RUNPOD_GPU_COUNT, but I'm not sure that helps. RunPod seems to monitor CPU and GPU stats already, since it shows them in the web interface. Does the RunPod Python module expose those stats so we don't have to code our own? Below is a code snippet that reports VRAM usage as a percentage.
import pynvml
import time

# Initialize NVML
pynvml.nvmlInit()

handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # Assuming you have only one GPU

while True:
    # Get the memory information for the GPU
    memory_info = pynvml.nvmlDeviceGetMemoryInfo(handle)

    used_vram = memory_info.used // (1024 ** 2)  # Convert bytes to MB
    total_vram = memory_info.total // (1024 ** 2)  # Convert bytes to MB
    vram_usage_percentage = round((used_vram / total_vram) * 100)

    print(f'vram usage: {vram_usage_percentage}%')

    time.sleep(5)
Thanks! 🙂
16 Replies
Encyrption OP • 15mo ago
Maybe I could use GraphQL with PodTelemetry? Where are my GraphQL experts at? 😉
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
I could assume that my worker is using the GPU at index 0, but if the server has multiple GPUs that might not be accurate. I might be on GPU 3 while another worker is using GPU 0. I am pretty sure I can get that info with GraphQL. I should be able to query by pod ID, and the result includes PodTelemetry, which contains CPU and GPU stats. I'm just struggling with the documentation for it.
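One way to sidestep guessing the index is to report usage for every GPU that NVML exposes to the process. The sketch below is not from the thread; it assumes the worker's container only sees the GPUs allocated to it, which is common with the NVIDIA container runtime but is not confirmed here for RunPod. The helper name vram_usage_by_gpu is hypothetical, and the NVML module is passed in as a parameter so the function can be exercised without a GPU.

```python
def vram_usage_by_gpu(nvml):
    """Return {index: percent_used} for every GPU visible to this process.

    `nvml` is expected to be the pynvml module (or a stand-in with the same
    functions). Indices are relative to what the process can see, which is
    an assumption about how the container is configured.
    """
    nvml.nvmlInit()
    try:
        usage = {}
        for index in range(nvml.nvmlDeviceGetCount()):
            handle = nvml.nvmlDeviceGetHandleByIndex(index)
            memory = nvml.nvmlDeviceGetMemoryInfo(handle)
            # used and total are both in bytes, so the ratio needs no conversion
            usage[index] = round(memory.used / memory.total * 100)
        return usage
    finally:
        # Always release NVML, even if a query fails
        nvml.nvmlShutdown()
```

On a worker this would be called as vram_usage_by_gpu(pynvml) after import pynvml. If the returned dict has a single entry, index 0 is the worker's GPU.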
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
Yeah, I've seen that. I'm still looking for a good example of making a GraphQL request.
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
I would need to provide the pod id
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
So what do I do? Add podId: ${pod_id} to input?
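For readers looking for the shape of such a request, here is a minimal sketch of a GraphQL call that passes the pod ID in the query input. It is not taken from the hidden replies. The endpoint URL, the api_key query parameter, and the field names (runtime, gpus, gpuUtilPercent, memoryUtilPercent, container, cpuPercent, memoryPercent) are assumptions based on recollection of RunPod's public GraphQL examples and should be checked against the current schema. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against RunPod's API documentation.
RUNPOD_GRAPHQL_URL = "https://api.runpod.io/graphql"

# Assumed field names; the real schema may differ.
POD_TELEMETRY_QUERY = """
query PodTelemetry($podId: String!) {
  pod(input: {podId: $podId}) {
    id
    runtime {
      gpus { id gpuUtilPercent memoryUtilPercent }
      container { cpuPercent memoryPercent }
    }
  }
}
"""


def build_request(pod_id, api_key):
    """Build a POST request that sends the pod ID as a GraphQL variable."""
    body = json.dumps(
        {"query": POD_TELEMETRY_QUERY, "variables": {"podId": pod_id}}
    ).encode("utf-8")
    # Passing the key as a query parameter is an assumption about the API.
    url = f"{RUNPOD_GRAPHQL_URL}?{urllib.parse.urlencode({'api_key': api_key})}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_pod_telemetry(pod_id, api_key):
    """Send the request and return the decoded JSON response."""
    request = build_request(pod_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Using a GraphQL variable instead of interpolating ${pod_id} into the query string avoids quoting problems if the ID ever contains unexpected characters.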
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
That's great, thanks! I was going to send that data over the web socket but this is much better. I can just have the browser call this once a second and update the CPU/GPU graph. 🙂
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
Yeah, I think it is really coming along. Everything works; I just need to update the CPU/GPU graph and display the result media.
Unknown User • 15mo ago
Message Not Public
Encyrption OP • 15mo ago
ToonCrafter is just one in the market... I will likely try and add a lot of models before going live. My code builds the interface dynamically so should be able to add them pretty fast.
