Soulmind
RunPod • Created by sluzorz on 8/23/2024 in #⛅|pods-clusters
Maximum number of A40s that can run at one time
yeah I will do, cuz I've been monitoring the values for a while, and it seems like the totalCount and rentedCount for the same GPU but different DCs show the same value.
84 replies
👍 the only thing is, it seems like the GraphQL API is responding with the combined # of GPUs, not the # of GPUs in the specific dc...
There is a way to do that if you use GraphQL. The doc states that there are totalCount and rentedCount fields.
If you run the query with the right variables, you will be able to see the rented count and total count, but it seems like the rented count and total count are not strictly from that specific datacenter, but aggregated.
We're pooling from CA-MTL-1 and EU-SE-1, as they are the only datacenters with network volume support with A40s.
@yhlong00000 any plans on adding more A40s to the pool?
hope it's easy to debug! seems like now there are ~121 GPUs available.
btw which backend are you using for your batch job? I heard SGLang is pretty good for batch jobs.
and 🤞 for your batch job 😉
lol yeah for sure it won't be an issue, and it's not your fault, no need to say sorry!
It's just that we've only added A40s to the autoscaling pool for now, cuz it seemed like there were plenty of A40s a couple of days/weeks back.
I think we need to add more GPU types to the pool anyway to adapt to any case.
I got paged by the alert policy we set up internally for A40 availability, as our product currently relies on it.
Of course it is your right to spin up as many as you want, but can you kindly let me know if this is going to be a one-off thingie or something you will be running long-term? We've been happily enjoying the high availability of A40s, but there are now only ~27 GPUs left lol
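For reference, an alert policy like the one described above can be sketched as a simple threshold check on free GPUs (free = totalCount − rentedCount). The counts dict shape, the datacenter numbers, and the threshold are illustrative assumptions, not our actual configuration.

```python
# Minimal sketch of an availability alert: page when the number of free A40s
# across monitored DCs drops below a threshold. Numbers here are made up.
ALERT_THRESHOLD = 30  # page when fewer than this many A40s are free

def free_gpus(counts: dict) -> int:
    """Free GPUs = totalCount - rentedCount, summed over datacenters."""
    return sum(dc["totalCount"] - dc["rentedCount"] for dc in counts.values())

def should_alert(counts: dict, threshold: int = ALERT_THRESHOLD) -> bool:
    """True when availability has dropped below the paging threshold."""
    return free_gpus(counts) < threshold

# Example with made-up per-DC numbers adding up to ~27 free A40s,
# matching the situation described in the thread:
counts = {
    "CA-MTL-1": {"totalCount": 84, "rentedCount": 70},
    "EU-SE-1": {"totalCount": 37, "rentedCount": 24},
}
```

With these made-up numbers, `free_gpus(counts)` is 27, which is below the 30-GPU threshold, so the check would page.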