Runpod15mo ago
sluzorz

Maximum number of A40s that can run at one time

I'm looking to run as many A40s as possible to finish a large-scale inference/LLM generation job. How many could I run at one time? 40, 80, 100?
51 Replies
PM
PM15mo ago
In practice, many setups use between 2 and 8 GPUs, but some high-performance computing environments may use even more, depending on the specific needs and configuration of the system.
sluzorz
sluzorzOP15mo ago
We split our inference jobs into batches, so we can run on as many GPUs as we like; we just need someone from RunPod to confirm this is allowed.
PM
PM15mo ago
it sounds interesting, I am familiar with RunPod. I want to know how large your inference job is
sluzorz
sluzorzOP15mo ago
200 million rows. Each instance can run maybe ~120,000 rows per hour.
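Taking those figures at face value, the wall-clock time for a given fleet size is simple arithmetic (a rough sketch; both numbers are the approximate values stated above, and in practice the job was run in shorter sessions):

```python
# Rough capacity math using the figures above (both are approximations).
total_rows = 200_000_000          # 200 million rows
rows_per_hour_per_gpu = 120_000   # per-instance throughput, "maybe"

gpu_hours = total_rows / rows_per_hour_per_gpu
print(f"~{gpu_hours:.0f} GPU-hours in total")

for fleet in (40, 80, 100):
    print(f"{fleet} GPUs -> ~{gpu_hours / fleet:.1f} h wall-clock")
```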
PM
PM15mo ago
i think so, but i need to check. could u gimme your runpod account?
sluzorz
sluzorzOP15mo ago
?
PM
PM15mo ago
i mean paid account
sluzorz
sluzorzOP15mo ago
Just to be clear, you don't work for RunPod, and all I need is an answer from RunPod that I can run 100x A40s, so I'm not gonna give you access to my runpod account lol
PM
PM15mo ago
i can understand u
Unknown User
Unknown User15mo ago
Message Not Public
sluzorz
sluzorzOP15mo ago
yup
yhlong00000
yhlong0000015mo ago
We have a good amount of A40 GPUs available. In the ticket, let us know the times you usually need them, how long you plan to run them, and whether this is just for your current project or a longer-term, ongoing need. This information will help us better plan our capacity.😄
sluzorz
sluzorzOP15mo ago
Ah, thanks! We just spun up 100; seems like we may have consumed all of them, lol.
Soulmind
Soulmind15mo ago
I got paged by the alert policy we set up internally for A40 availability, as our product currently relies on that. Of course it's your right to spin up as many as you want, but can you kindly let me know if this is going to be a one-off thing or something you will be running long-term? We've been happily enjoying the high availability of A40s, but there are now only ~27 GPUs left lol
sluzorz
sluzorzOP15mo ago
Our job should finish in an hour. Sorry about that! Support told me it was okay haha. I'm happy to ping y'all ahead of time.
Soulmind
Soulmind15mo ago
lol yeah for sure, it won't be an issue of course, not your fault, no need to say sorry! It's just that we've only added the A40 to our autoscaling pool for now, cuz it seemed like there were plenty of A40s a couple of days/weeks back. I think we need to add more GPU types to the pool anyway to adapt to any case. And 🤞 for your batch job 😉
sluzorz
sluzorzOP15mo ago
There's a bug, so we're back to figuring it out and will spin up again tomorrow.
Soulmind
Soulmind15mo ago
hope it's easy to debug! seems like there are ~121 GPUs available now. btw, which backend are you using for your batch job? I heard SGLang is pretty good for batch jobs. @yhlong00000 any plans on adding more A40s to the pool?
yhlong00000
yhlong0000015mo ago
I'm sorry, I don't have specific details about future plans, but I know we're continuously working with suppliers to add more based on demand. The more you use a particular GPU type, the more likely we are to expand it.😀
sluzorz
sluzorzOP15mo ago
We're just splitting our database into chunks; each worker downloads a chunk, processes it, then uploads the results when complete.
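The chunked pipeline described above can be sketched roughly like this. All function names are hypothetical stand-ins, not RunPod or project APIs:

```python
# Hypothetical sketch of the chunked batch pipeline: each worker claims a
# chunk, downloads it, runs inference, and uploads the results when done.

def download_chunk(chunk_id: int) -> list[str]:
    # placeholder: would fetch this chunk's rows from the database/object store
    return [f"row-{chunk_id}-{i}" for i in range(3)]

def run_inference(rows: list[str]) -> list[str]:
    # placeholder: would batch rows through the model
    return [row.upper() for row in rows]

def upload_results(chunk_id: int, results: list[str]) -> None:
    # placeholder: would write results back once the chunk completes
    print(f"chunk {chunk_id}: uploaded {len(results)} rows")

def process_chunk(chunk_id: int) -> list[str]:
    rows = download_chunk(chunk_id)
    results = run_inference(rows)
    upload_results(chunk_id, results)
    return results
```

Each pod just runs `process_chunk` over its assigned chunk IDs, which is why the job scales to however many GPUs happen to be available.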
yhlong00000
yhlong0000015mo ago
BTW, are you in the same region? It might be worth checking availability in other regions as well.😆
yhlong00000
yhlong0000015mo ago
[image attachment]
sluzorz
sluzorzOP15mo ago
I just spun down our job, so 100 GPUs are released back. @yhlong00000 is there a way to see the quantity of GPUs available, rather than just high/low?
yhlong00000
yhlong0000015mo ago
For customers, it’s not available. Let me check if there’s a specific reason why we don’t display it. Will get back to you later.
Soulmind
Soulmind15mo ago
We're pooling from CA-MTL-1 and EU-SE-1, as they are the only datacenters with network volume support for A40s. There is a way to do that if you use GraphQL: the docs state that there are totalCount and rentedCount fields. If you run the query:
query gpuAvailability($gpuTypesInput: GpuTypeFilter, $lowestPriceInput: GpuLowestPriceInput) {
  gpuTypes(input: $gpuTypesInput) {
    lowestPrice(input: $lowestPriceInput) {
      uninterruptablePrice
      rentalPercentage
      rentedCount
      totalCount
    }
  }
}
with variable:
variables: {
  gpuTypesInput: {
    id: 'NVIDIA A40',
  },
  lowestPriceInput: {
    gpuCount: 1,
    secureCloud: true,
    dataCenterId: 'CA-MTL-1',
  },
}
you will be able to see the rented count and total count:
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.35,
          "rentalPercentage": 0.8745,
          "rentedCount": 885,
          "totalCount": 1012
        }
      }
    ]
  }
}
but it seems the rented count and total count are not strictly for that specific datacenter; they look aggregated tho..
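A minimal Python sketch of running that query: the endpoint URL and api_key parameter are assumptions based on RunPod's public GraphQL API (check the current API docs), and the parsing helper is demonstrated offline on the sample response quoted above.

```python
# Sketch of querying A40 availability via RunPod's GraphQL API.
# ENDPOINT and the api_key query parameter are assumptions; verify against
# the current RunPod API documentation before relying on them.
ENDPOINT = "https://api.runpod.io/graphql"

QUERY = """
query gpuAvailability($gpuTypesInput: GpuTypeFilter, $lowestPriceInput: GpuLowestPriceInput) {
  gpuTypes(input: $gpuTypesInput) {
    lowestPrice(input: $lowestPriceInput) {
      rentedCount
      totalCount
    }
  }
}
"""

def build_payload(datacenter: str) -> dict:
    """Assemble the GraphQL request body for one datacenter."""
    return {
        "query": QUERY,
        "variables": {
            "gpuTypesInput": {"id": "NVIDIA A40"},
            "lowestPriceInput": {
                "gpuCount": 1,
                "secureCloud": True,
                "dataCenterId": datacenter,
            },
        },
    }

def extract_counts(response: dict) -> tuple[int, int]:
    """Pull (rentedCount, totalCount) out of the response JSON."""
    price = response["data"]["gpuTypes"][0]["lowestPrice"]
    return price["rentedCount"], price["totalCount"]

# Demonstrated on the sample response quoted above (no network needed):
sample = {"data": {"gpuTypes": [{"lowestPrice": {
    "uninterruptablePrice": 0.35, "rentalPercentage": 0.8745,
    "rentedCount": 885, "totalCount": 1012}}]}}
rented, total = extract_counts(sample)
print(f"{total - rented} A40s free of {total}")  # prints: 127 A40s free of 1012

# To hit the live API you would POST the payload with an API key, e.g.:
#   requests.post(f"{ENDPOINT}?api_key=YOUR_KEY", json=build_payload("CA-MTL-1"))
```

Note the caveat above: the counts appear to be aggregated across datacenters, so the per-dc `dataCenterId` filter may not narrow them the way you'd expect.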
sluzorz
sluzorzOP15mo ago
Yeah, cool. I can work off that
yhlong00000
yhlong0000015mo ago
😂 ok, you guys are smarter than me
Unknown User
Unknown User15mo ago
Message Not Public
Soulmind
Soulmind15mo ago
👍 the only thing is, it seems like the GraphQL API is responding with the combined # of GPUs, not the # of GPUs in the specific dc...
Unknown User
Unknown User15mo ago
Message Not Public
Soulmind
Soulmind15mo ago
yeah I will do, cuz I've been monitoring the values for a while, and it seems like the totalCount and rentedCount for the same GPU but different dcs show the same values:
Datacenter: CA-MTL-1
GPU Types: NVIDIA A40
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.7,
          "rentalPercentage": 0.8423,
          "rentedCount": 844,
          "totalCount": 1002,
          "stockStatus": "High"
        },
        "oneMonthPrice": 0.35,
        "threeMonthPrice": 0.35,
        "sixMonthPrice": null
      }
    ]
  }
}
Datacenter: EU-SE-1
GPU Types: NVIDIA A40
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.7,
          "rentalPercentage": 0.8423,
          "rentedCount": 844,
          "totalCount": 1002,
          "stockStatus": "Medium"
        },
        "oneMonthPrice": 0.35,
        "threeMonthPrice": 0.35,
        "sixMonthPrice": null
      }
    ]
  }
}
sluzorz
sluzorzOP15mo ago
Starting that batch job again. We might take all the A40 capacity, or whatever remains of it.
utmostmick0
utmostmick015mo ago
how long u guys gonna be running for ?
sluzorz
sluzorzOP15mo ago
~3h
utmostmick0
utmostmick015mo ago
okies, is this gonna be an ongoing thing?
sluzorz
sluzorzOP15mo ago
yes
utmostmick0
utmostmick015mo ago
ok
sluzorz
sluzorzOP15mo ago
but mostly 1-2 times per week
utmostmick0
utmostmick015mo ago
all good dude, i specifically set my workflow up in the EU because when i get to use it, no one is using them lol
sluzorz
sluzorzOP15mo ago
are you using spot? I feel like it's releasing spot instances right now lol
Flynn
Flynn15mo ago
@sluzorz I see you're using up all the A40s! Do you know if there's a way to transfer all my data from one pod to another? I'm happy using another gpu, but I have a lot of stuff downloaded to my current pod which is on A40
sluzorz
sluzorzOP15mo ago
Cloud Sync and rclone. I think their Cloud Sync is just rclone under the hood.
Unknown User
Unknown User15mo ago
Message Not Public
sluzorz
sluzorzOP15mo ago
Large batch inference jobs with BART
Unknown User
Unknown User15mo ago
Message Not Public
sluzorz
sluzorzOP15mo ago
Yeah, we’ve let runpod know.
Flynn
Flynn15mo ago
@sluzorz will you be finished about now?
sluzorz
sluzorzOP15mo ago
Some of our batches are finishing now, but we still have about 30 remaining since we couldn't spin up 100 A40s.
Unknown User
Unknown User15mo ago
Message Not Public
Flynn
Flynn15mo ago
okay please let me know when you've finished
sluzorz
sluzorzOP15mo ago
We're mostly done, but we will probably consume more in a few hours for embedding. The A40s on RunPod are just too good of an offering.
