Running 30 a100 workers

Can I run 30 A100 workers for an endpoint? We have a business process with low processing-time requirements. I want to test how much it would cost to handle 30 requests per day on this platform, to see whether it is feasible for us. How can I increase the worker count? It is only allowing me to increase to 5 right now.
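A back-of-envelope estimate of the daily compute cost described above can be sketched like this. The per-second price here is a placeholder assumption, not the platform's actual A100 rate; check current pricing before relying on it:

```python
# Rough compute-only cost estimate: excludes cold starts, idle time,
# and storage. price_per_sec is a hypothetical placeholder value.
def daily_cost(requests_per_day: int, seconds_per_request: float,
               price_per_sec: float) -> float:
    """Daily cost in USD for the given request volume and per-request time."""
    return requests_per_day * seconds_per_request * price_per_sec

# Example: 30 requests/day, ~6 min (360 s) each, at an assumed $0.0008/s
print(round(daily_cost(30, 360, 0.0008), 2))
```

Running the same numbers with Colab-like 3-minute inferences halves the estimate, which is why the doubled processing time discussed below matters for the feasibility decision.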
4 Replies
cubbo
cubbo5w ago
Currently a minimum balance is required to increase the number of workers. It should be $200 for 20 workers and $300 for 30 workers. Once you have a higher minimum balance you should see a button in the console under the "Serverless" section where you make new endpoints (like in this image)
Batyray
BatyrayOP5w ago
Not planning to spend that much if the performance and price balance of the platform is not a fit for us. Is there any special case where someone can increase my quota temporarily? @cubbo

@cubbo I have another question, no need for increasing the quota. I use an A100 GPU setting in Colab and the same A100 GPU setting here. In Google Colab, inference takes 3 minutes to finish for the same data, but here it is almost 6 minutes. I am not talking about cold start plus processing time; this is the time it takes the model to process the data. What may cause such a difference?

I am not sure if this question is too out of context, but this is one of the first AI workloads we are going to deploy on a GPU, so I do not have much experience with GPU virtualization etc. I calculated the price according to Colab's compute time, plus or minus variations, and doubling the time on the same hardware seems a little too much. By hardware I mean the GPU; maybe the RAM and CPU are not matching, but the model does not consume much of those other resources.
VeyDlin
VeyDlin4w ago
maybe the model takes forever to load from the network drive?
flash-singh
flash-singh4w ago
here's my advice:
- measure only the compute time; don't include cold start, model loading, or other times, since those can vary if you use a network volume
- check if it's an A100 SXM vs PCIe; PCIe is slower
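The compute-only measurement suggested above can be sketched as a small timing helper. This is a generic Python sketch, not platform-specific code; for GPU workloads you would additionally call `torch.cuda.synchronize()` (noted in the comments) so the timer doesn't stop before queued CUDA kernels finish:

```python
import time

def mean_compute_time(fn, warmup: int = 2, runs: int = 5) -> float:
    """Average wall-clock time of fn() over several runs.

    Warm-up iterations are discarded so one-time costs (cold start,
    model load, CUDA context creation) don't pollute the measurement.
    For CUDA models, call torch.cuda.synchronize() inside fn, or
    before reading the timer, because kernel launches are asynchronous.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs
```

Comparing this number between Colab and the endpoint, on the same input, isolates raw GPU throughput from load and startup effects, which helps confirm or rule out the network-volume and SXM-vs-PCIe explanations.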
