Sorry if this is a beginner question. Providers like OpenAI enforce TPM (tokens per minute) limits. How do people build applications around these limits when multiple users are sending requests at the same time?
Let's say I'm building a feature like Deep Research, and because of the TPM limit I can only run one research process at a time. If I exceed the limit, the API returns a rate-limit error, which makes it hard to even test a pilot of my application.
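To make the problem concrete, the only approach I can think of is throttling on my side before calling the API. Below is a minimal sketch of that idea, assuming a simple token bucket shared by all users. `TokenBudget`, the injectable `clock`/`sleep` arguments, and the per-request token estimates are my own placeholders, not part of any provider's SDK, and I don't know whether this is how production apps actually handle it.

```python
import threading
import time


class TokenBudget:
    """Hypothetical token bucket that refills at `tpm` tokens per minute."""

    def __init__(self, tpm, clock=time.monotonic, sleep=time.sleep):
        self.capacity = tpm            # assumed provider limit (placeholder value)
        self.available = float(tpm)    # start with a full budget
        self.rate = tpm / 60.0         # tokens refilled per second
        self.clock = clock
        self.sleep = sleep
        self.last = clock()
        self.lock = threading.Lock()   # budget is shared across user threads

    def _refill(self):
        # Credit tokens for the time elapsed since the last check.
        now = self.clock()
        earned = (now - self.last) * self.rate
        self.available = min(self.capacity, self.available + earned)
        self.last = now

    def acquire(self, tokens):
        """Block until `tokens` can be reserved; return seconds spent waiting."""
        if tokens > self.capacity:
            raise ValueError("request is larger than the per-minute budget")
        waited = 0.0
        while True:
            with self.lock:
                self._refill()
                if self.available >= tokens:
                    self.available -= tokens
                    return waited
                # Time needed for the missing tokens to refill.
                shortfall = (tokens - self.available) / self.rate
            self.sleep(shortfall)      # sleep outside the lock
            waited += shortfall
```

Each research step would call `budget.acquire(estimated_tokens)` before sending its request, so concurrent users wait in line instead of triggering the rate-limit error. The estimate would have to cover both prompt and expected completion tokens, which is a guess on my part.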
Thanks!