best architecture opinion

Hello,
I would like to build an app that takes a single prompt from the user and creates 10 prompts from it. It then calls a model once for each of these 10 prompts, producing 10 responses. Finally, it makes one more call to aggregate the 10 responses into a single final response, which is returned to the user.
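The fan-out and fan-in steps are just prompt construction, which can be sketched as two small functions. Both templates below are hypothetical placeholders, not a recommended wording:

```python
def make_variant_prompts(user_prompt: str, n: int = 10) -> list[str]:
    # Hypothetical fan-out template: ask for n separate takes on the same request.
    return [
        f"Perspective {i + 1} of {n}: answer the following request.\n\n{user_prompt}"
        for i in range(n)
    ]


def make_aggregation_prompt(user_prompt: str, responses: list[str]) -> str:
    # Hypothetical fan-in template: number the candidate answers and ask for one final answer.
    numbered = "\n\n".join(f"[{i + 1}] {r}" for i, r in enumerate(responses))
    return (
        f"Original request:\n{user_prompt}\n\n"
        f"Candidate answers:\n{numbered}\n\n"
        "Combine these into a single, best final answer."
    )
```

Keeping these as plain functions means they can live on the client or inside the endpoint, whichever option is chosen.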

My question is the following: do you have any advice on how to build this?
Option a) Send the user prompt to the serverless endpoint. Within the endpoint, create the 10 prompts, call the model sequentially for each of them, and then call it one last time to aggregate the results. All of this happens in a single call from the user to the serverless endpoint.
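Option a, as described, could look roughly like the single handler below. `call_model` is a hypothetical stand-in for real inference and the prompt templates are illustrative; the point is that all 11 calls run one after another inside one request.

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for one model inference call."""
    return f"answer({prompt})"


def option_a_handler(user_prompt: str, n: int = 10) -> str:
    # Everything happens inside one request to the endpoint.
    prompts = [f"Variant {i + 1}: {user_prompt}" for i in range(n)]
    # The 10 model calls run sequentially: total time is the sum of their durations.
    responses = [call_model(p) for p in prompts]
    # The 11th call aggregates the 10 responses.
    agg_prompt = "Combine these answers:\n" + "\n".join(responses)
    return call_model(agg_prompt)
```

The sequential loop is what makes this option slow. If the model server running inside the worker can accept concurrent requests, the 10 calls could in principle be issued concurrently there as well, but that depends on the inference stack deployed.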

Option b) Create the 10 prompts on the client and send them to the serverless endpoint (this can be done in parallel), wait for the 10 responses, and then send the final aggregation prompt, together with the 10 responses, to get the final aggregated response.
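A minimal sketch of option b, assuming a Runpod-style handler that receives a `job` dict with an `input` field (in the Runpod Python SDK such a handler is registered with `runpod.serverless.start({"handler": handler})`). The `prompt` input key, the prompt templates and `call_endpoint` are hypothetical: `call_endpoint` only simulates the round trip locally instead of sending a real HTTP request to the endpoint's `/run` or `/runsync` route.

```python
import asyncio
from typing import Any


def handler(job: dict[str, Any]) -> dict[str, Any]:
    # Endpoint side: plain inference, no orchestration logic.
    # The real model call is replaced by a hypothetical echo.
    prompt = job["input"]["prompt"]
    return {"output": f"answer({prompt})"}


async def call_endpoint(prompt: str) -> str:
    """Hypothetical client helper for one request to the endpoint (simulated locally)."""
    await asyncio.sleep(0.05)  # stand-in for network + inference latency
    result = handler({"input": {"prompt": prompt}})
    return result["output"]


async def run_option_b(user_prompt: str, n: int = 10) -> str:
    # Client side: build the 10 prompts locally.
    prompts = [f"Variant {i + 1}: {user_prompt}" for i in range(n)]
    # Send all 10 requests concurrently; gather returns results in input order.
    responses = await asyncio.gather(*(call_endpoint(p) for p in prompts))
    # Only after all 10 are back, send the aggregation request.
    agg_prompt = "Combine these answers:\n" + "\n".join(responses)
    return await call_endpoint(agg_prompt)
```

Usage: `asyncio.run(run_option_b("What is serverless?"))`. With real network calls, one of the 10 requests may fail or time out; passing `return_exceptions=True` to `asyncio.gather` lets the client decide whether to aggregate the responses that did succeed.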

In terms of speed, I think option B is faster, since the 10 calls can be made in parallel.
In terms of cost, I don't think there is much of a difference, since the model has to be called 11 times either way, but please correct me if I'm missing something.
In terms of complexity, option B keeps the serverless endpoint very simple: plain model inference, with no other logic.
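The speed and cost points can be checked with a rough back-of-the-envelope model. Everything here is an assumption: 2 s per call and a 5 s cold start are illustrative numbers, not measurements; the model ignores network time; and it assumes worker time is billed per second including start-up. The parallel speedup also depends on how many workers the endpoint is allowed to run at once, since extra requests queue behind busy workers.

```python
import heapq


def latency(times: list[float], t_agg: float, workers: int) -> float:
    """Wall-clock time: each call goes to whichever worker frees up first."""
    free_at = [0.0] * workers  # a list of equal values is already a valid heap
    for t in times:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + t)
    # The aggregation call can only start once the slowest of the 10 calls has finished.
    return max(free_at) + t_agg


def billed_seconds(times: list[float], t_agg: float, cold_starts: int, cold_start_s: float) -> float:
    """Worker time: the same 11 inferences either way, plus start-up of any cold workers."""
    return sum(times) + t_agg + cold_starts * cold_start_s


times = [2.0] * 10  # illustrative: 2 s per call

# Option A: one worker runs all 11 calls back to back (one cold start, worst case).
print(latency(times, 2.0, workers=1), billed_seconds(times, 2.0, 1, 5.0))    # 22.0 27.0
# Option B, endpoint capped at 4 workers (all cold, worst case).
print(latency(times, 2.0, workers=4), billed_seconds(times, 2.0, 4, 5.0))    # 8.0 42.0
# Option B, 10 workers available (all cold, worst case).
print(latency(times, 2.0, workers=10), billed_seconds(times, 2.0, 10, 5.0))  # 4.0 72.0
```

With warm workers (`cold_starts=0`) both options bill the same 22 s, which matches the intuition that 11 calls cost the same either way; the difference only shows up when fanning out forces extra workers to boot.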

I will go with option B, but I'm not experienced with serverless architecture, so please let me know if I'm missing anything, or whether there is an option C.

Runpod