Hi team, I'm trying to set up a serverless endpoint for a model like qwen3-80b.
My understanding is that with cached models, the cold start should take only seconds. My question is: once I have a cached model, will the cold start always take just a few seconds? And what's the tradeoff of using cached models?
I guess I'm confused by the sentence: "If no cached host machines are available ..." When would this be the case?