Runpod15mo ago
sluzorz

Maximum number of A40s that can run at one time

I'm looking to run as many A40s as possible to finish a large-scale inference/LLM generation job. How many could I run at one time? 40, 80, 100?
51 Replies
PM
PM15mo ago
In practice, many setups use between 2 and 8 GPUs, but some high-performance computing environments may use even more, depending on the specific needs and configuration of the system.
sluzorz
sluzorzOP15mo ago
We split our inference jobs into batches, so we can run on as many GPUs as we like; we just need someone from RunPod to confirm this is allowed.
PM
PM15mo ago
it sounds interesting, I am familiar with RunPod. I want to know how large your inference job is
sluzorz
sluzorzOP15mo ago
200 million rows. Each instance can run maybe ~120,000 rows per hour.
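Taking those figures at face value, the wall-clock time for a given fleet size is simple arithmetic (a rough sketch; both numbers are the approximate values stated above, and in practice the job was run in shorter sessions):

```python
# Rough capacity math using the figures above (both are approximations).
total_rows = 200_000_000          # 200 million rows
rows_per_hour_per_gpu = 120_000   # per-instance throughput, "maybe"

gpu_hours = total_rows / rows_per_hour_per_gpu
print(f"~{gpu_hours:.0f} GPU-hours in total")

for fleet in (40, 80, 100):
    print(f"{fleet} GPUs -> ~{gpu_hours / fleet:.1f} h wall-clock")
```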
PM
PM15mo ago
i think so, but i need to check. could u gimme your runpod account?
sluzorz
sluzorzOP15mo ago
?
PM
PM15mo ago
i mean paid account
sluzorz
sluzorzOP15mo ago
Just to be clear, you don't work for RunPod, and all I need is an answer from RunPod that I can run 100x A40s, so I'm not gonna give you access to my runpod account lol
PM
PM15mo ago
i can understand u
Unknown User
Unknown User15mo ago
Message Not Public
sluzorz
sluzorzOP15mo ago
yup
yhlong00000
yhlong0000015mo ago
We have a good amount of A40 GPUs available. In the ticket, let us know the times you usually need them, how long you plan to run them, and whether this is just for your current project or a longer-term, ongoing need. This information will help us better plan our capacity.😄
sluzorz
sluzorzOP15mo ago
Ah, thanks! We just spun up 100; seems like we may have consumed all of them, lol.
Soulmind
Soulmind15mo ago
I got paged by the alert policy we set up internally for A40 availability, as our product currently relies on that. Of course it's your right to spin up as many as you want, but can you kindly let me know if this is going to be a one-off thing or something you will be running long-term? We've been happily enjoying the high availability of A40s, but there are now only ~27 GPUs left lol
sluzorz
sluzorzOP15mo ago
Our job should finish in an hour. Sorry about that! Support told me it was okay haha. I'm happy to ping y'all ahead of time.
Soulmind
Soulmind15mo ago
lol yeah for sure, it won't be an issue of course, not your fault, no need to say sorry! It's just that we've only added the A40 to our autoscaling pool for now, cuz it seemed like there were plenty of A40s a couple of days/weeks back. I think we need to add more GPU types to the pool anyway to adapt to any case. And 🤞 for your batch job 😉
sluzorz
sluzorzOP15mo ago
There's a bug, so we're back to figuring it out and will spin up again tomorrow.
Soulmind
Soulmind15mo ago
hope it's easy to debug! seems like there are ~121 GPUs available now. btw, which backend are you using for your batch job? I heard SGLang is pretty good for batch jobs. @yhlong00000 any plans on adding more A40s to the pool?
yhlong00000
yhlong0000015mo ago
I'm sorry, I don't have specific details about future plans, but I know we're continuously working with suppliers to add more based on demand. The more you use a particular GPU type, the more likely we are to expand it.😀
sluzorz
sluzorzOP15mo ago
We're just splitting our database into chunks; each worker downloads a chunk, processes it, then uploads the results when complete.
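The chunked pipeline described above can be sketched roughly like this. All function names are hypothetical stand-ins, not RunPod or project APIs:

```python
# Hypothetical sketch of the chunked batch pipeline: each worker claims a
# chunk, downloads it, runs inference, and uploads the results when done.

def download_chunk(chunk_id: int) -> list[str]:
    # placeholder: would fetch this chunk's rows from the database/object store
    return [f"row-{chunk_id}-{i}" for i in range(3)]

def run_inference(rows: list[str]) -> list[str]:
    # placeholder: would batch rows through the model
    return [row.upper() for row in rows]

def upload_results(chunk_id: int, results: list[str]) -> None:
    # placeholder: would write results back once the chunk completes
    print(f"chunk {chunk_id}: uploaded {len(results)} rows")

def process_chunk(chunk_id: int) -> list[str]:
    rows = download_chunk(chunk_id)
    results = run_inference(rows)
    upload_results(chunk_id, results)
    return results
```

Each pod just runs `process_chunk` over its assigned chunk IDs, which is why the job scales to however many GPUs happen to be available.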
yhlong00000
yhlong0000015mo ago
BTW, are you in the same region? It might be worth checking availability in other regions as well.😆
yhlong00000
yhlong0000015mo ago
[image attachment]
sluzorz
sluzorzOP15mo ago
I just spun down our job, so 100 GPUs are released back. @yhlong00000 is there a way to see the quantity of GPUs available, rather than just high/low?
yhlong00000
yhlong0000015mo ago
For customers, it’s not available. Let me check if there’s a specific reason why we don’t display it. Will get back to you later.
Soulmind
Soulmind15mo ago
We're pooling from CA-MTL-1 and EU-SE-1, as they are the only datacenters with network volume support for A40s. There is a way to do that if you use GraphQL: the docs state that there are totalCount and rentedCount fields. If you run the query:
query gpuAvailability($gpuTypesInput: GpuTypeFilter, $lowestPriceInput: GpuLowestPriceInput) {
  gpuTypes(input: $gpuTypesInput) {
    lowestPrice(input: $lowestPriceInput) {
      uninterruptablePrice
      rentalPercentage
      rentedCount
      totalCount
    }
  }
}
with variable:
variables: {
  gpuTypesInput: {
    id: 'NVIDIA A40',
  },
  lowestPriceInput: {
    gpuCount: 1,
    secureCloud: true,
    dataCenterId: 'CA-MTL-1',
  },
}
you will be able to see the rented count and total count:
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.35,
          "rentalPercentage": 0.8745,
          "rentedCount": 885,
          "totalCount": 1012
        }
      }
    ]
  }
}
but it seems the rented count and total count are not strictly for that specific datacenter; they look aggregated tho..
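A minimal Python sketch of running that query: the endpoint URL and api_key parameter are assumptions based on RunPod's public GraphQL API (check the current API docs), and the parsing helper is demonstrated offline on the sample response quoted above.

```python
# Sketch of querying A40 availability via RunPod's GraphQL API.
# ENDPOINT and the api_key query parameter are assumptions; verify against
# the current RunPod API documentation before relying on them.
ENDPOINT = "https://api.runpod.io/graphql"

QUERY = """
query gpuAvailability($gpuTypesInput: GpuTypeFilter, $lowestPriceInput: GpuLowestPriceInput) {
  gpuTypes(input: $gpuTypesInput) {
    lowestPrice(input: $lowestPriceInput) {
      rentedCount
      totalCount
    }
  }
}
"""

def build_payload(datacenter: str) -> dict:
    """Assemble the GraphQL request body for one datacenter."""
    return {
        "query": QUERY,
        "variables": {
            "gpuTypesInput": {"id": "NVIDIA A40"},
            "lowestPriceInput": {
                "gpuCount": 1,
                "secureCloud": True,
                "dataCenterId": datacenter,
            },
        },
    }

def extract_counts(response: dict) -> tuple[int, int]:
    """Pull (rentedCount, totalCount) out of the response JSON."""
    price = response["data"]["gpuTypes"][0]["lowestPrice"]
    return price["rentedCount"], price["totalCount"]

# Demonstrated on the sample response quoted above (no network needed):
sample = {"data": {"gpuTypes": [{"lowestPrice": {
    "uninterruptablePrice": 0.35, "rentalPercentage": 0.8745,
    "rentedCount": 885, "totalCount": 1012}}]}}
rented, total = extract_counts(sample)
print(f"{total - rented} A40s free of {total}")  # prints: 127 A40s free of 1012

# To hit the live API you would POST the payload with an API key, e.g.:
#   requests.post(f"{ENDPOINT}?api_key=YOUR_KEY", json=build_payload("CA-MTL-1"))
```

Note the caveat above: the counts appear to be aggregated across datacenters, so the per-dc `dataCenterId` filter may not narrow them the way you'd expect.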
sluzorz
sluzorzOP15mo ago
Yeah, cool. I can work off that
yhlong00000
yhlong0000015mo ago
😂 ok, you guys are smarter than me
Unknown User
Unknown User15mo ago
Message Not Public
Soulmind
Soulmind15mo ago
👍 the only thing is, it seems like the GraphQL API is responding with the combined # of GPUs, not the # of GPUs in the specific dc...
Unknown User
Unknown User15mo ago
Message Not Public
Soulmind
Soulmind15mo ago
yeah I will do, cuz I've been monitoring the values for a while, and it seems like the totalCount and rentedCount for the same GPU but different dcs show the same values:
Datacenter: CA-MTL-1
GPU Types: NVIDIA A40
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.7,
          "rentalPercentage": 0.8423,
          "rentedCount": 844,
          "totalCount": 1002,
          "stockStatus": "High"
        },
        "oneMonthPrice": 0.35,
        "threeMonthPrice": 0.35,
        "sixMonthPrice": null
      }
    ]
  }
}
Datacenter: EU-SE-1
GPU Types: NVIDIA A40
{
  "data": {
    "gpuTypes": [
      {
        "lowestPrice": {
          "uninterruptablePrice": 0.7,
          "rentalPercentage": 0.8423,
          "rentedCount": 844,
          "totalCount": 1002,
          "stockStatus": "Medium"
        },
        "oneMonthPrice": 0.35,
        "threeMonthPrice": 0.35,
        "sixMonthPrice": null
      }
    ]
  }
}
sluzorz
sluzorzOP15mo ago
Starting that batch job again. We might take all the A40 capacity, or whatever remains of it.
utmostmick0
utmostmick015mo ago
how long u guys gonna be running for ?
sluzorz
sluzorzOP15mo ago
~3h
utmostmick0
utmostmick015mo ago
okies, is this gonna be an ongoing thing?
sluzorz
sluzorzOP15mo ago
yes
utmostmick0
utmostmick015mo ago
ok
sluzorz
sluzorzOP15mo ago
but mostly 1-2 times per week
utmostmick0
utmostmick015mo ago
all good dude, i specifically set my workflow up in the EU because when i get to use it, no one is using them lol
sluzorz
sluzorzOP15mo ago
are you using spot? I feel like it's releasing spot instances right now lol
Flynn
Flynn15mo ago
@sluzorz I see you're using up all the A40s! Do you know if there's a way to transfer all my data from one pod to another? I'm happy using another gpu, but I have a lot of stuff downloaded to my current pod which is on A40
sluzorz
sluzorzOP15mo ago
Cloud Sync and rclone. I think their Cloud Sync is just rclone under the hood.
Unknown User
Unknown User15mo ago
Message Not Public
sluzorz
sluzorzOP15mo ago
Large batch inference jobs with BART
Unknown User
Unknown User15mo ago
Message Not Public
sluzorz
sluzorzOP15mo ago
Yeah, we’ve let runpod know.
Flynn
Flynn15mo ago
@sluzorz will you be finished about now?
sluzorz
sluzorzOP15mo ago
Some of our batches are finishing now, but we still have about 30 remaining since we couldn't spin up 100 A40s.
Unknown User
Unknown User15mo ago
Message Not Public
Flynn
Flynn15mo ago
okay please let me know when you've finished
sluzorz
sluzorzOP15mo ago
We're mostly done, but we will probably consume more in a few hours for embedding. The A40s on RunPod are just too good of an offering.
