C# · 3mo ago
LazyGuard

How to handle long-running requests?

Consider the following architecture. There are three different endpoints (see picture):

1. /generate-transcript (Resource in picture): This endpoint initiates the transcript generation process for a specific id (given in the body). It handles the initial request from the client to start the transcription task. The app then returns a 202 Accepted and a Location header that contains a pointer to the status resource endpoint.
2. /transcript-status (Status Resource in picture): This endpoint is responsible for checking the status of the transcription process initiated by /generate-transcript. It helps the client monitor the progress and readiness of the transcript. The server responds with an empty 200 OK (or a 404, it depends) if the status is unavailable, indicating that the transcript hasn't been generated yet. The client keeps polling; when the transcript is available, the response will be a 302 with a Location header that contains a pointer to the transcript resource.
3. /transcripts (New Resource in picture): This endpoint serves the completed transcript upon successful generation.

Question: What are the drawbacks of merging the status resource endpoint with the transcripts resource endpoint?
[Two attached images, no description available]
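A minimal sketch of the three endpoints described in the question, written as plain Python functions that return `(status, headers, body)` instead of a real web framework. The in-memory dictionaries, the `/transcript-status/{job_id}` and `/transcripts/{transcript_id}` path layout, and the `finish_job` helper are all assumptions made for illustration; the original post does not specify them.

```python
import itertools

# Hypothetical in-memory stores; a real service would use a database and a queue.
jobs = {}         # job_id -> {"video_id": str, "transcript_id": str or None}
transcripts = {}  # transcript_id -> transcript text
_job_ids = itertools.count(1)


def generate_transcript(video_id):
    """POST /generate-transcript: accept the work and point at the status resource."""
    job_id = str(next(_job_ids))
    jobs[job_id] = {"video_id": video_id, "transcript_id": None}
    return 202, {"Location": f"/transcript-status/{job_id}"}, None


def transcript_status(job_id):
    """GET /transcript-status/{job_id}: 200 while pending, 302 once the transcript exists."""
    job = jobs.get(job_id)
    if job is None:
        return 404, {}, None
    if job["transcript_id"] is None:
        return 200, {}, None  # empty 200: not ready yet, keep polling
    return 302, {"Location": f"/transcripts/{job['transcript_id']}"}, None


def get_transcript(transcript_id):
    """GET /transcripts/{transcript_id}: serve the finished transcript."""
    text = transcripts.get(transcript_id)
    if text is None:
        return 404, {}, None
    return 200, {}, text


def finish_job(job_id, text):
    """Stand-in for the background worker completing a job (hypothetical helper)."""
    transcript_id = f"t{job_id}"
    transcripts[transcript_id] = text
    jobs[job_id]["transcript_id"] = transcript_id
```

A client would call `generate_transcript`, poll `transcript_status` with the id taken from the Location header, and follow the 302 to `get_transcript` once it appears.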
20 Replies
not guilty · 3mo ago
i've seen some services that have the same api for status and resource, but i would say it's usually the smaller ones. if you expose some data about the running process (to show that everything is working), to me it would be wise to have two distinct endpoints; if not, then a single one is ok. one thing i would make clear is how to obtain the resource again after having run the generation, so that one doesn't need to re-run the process, because usually it's stuff that you pay for
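To make the trade-off concrete, here is a rough sketch of what a merged endpoint could look like. The function name `get_transcript_merged`, the `store` layout, and the `progress` field are hypothetical. The point it illustrates is that a single resource always answers 200, so the client has to inspect the body to tell "still running" from "done", and any process data shares a schema with the transcript itself.

```python
def get_transcript_merged(store, transcript_id):
    """GET /transcripts/{id} acting as both status and result resource (hypothetical)."""
    entry = store.get(transcript_id)
    if entry is None:
        return 404, None
    if entry["status"] != "done":
        # Process data and the final resource now live in the same response shape.
        return 200, {"status": entry["status"], "progress": entry.get("progress")}
    return 200, {"status": "done", "text": entry["text"]}
```

With two distinct endpoints, the status codes alone (200 vs. 302) carry this distinction, and the status payload can evolve without touching the transcript representation.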
LazyGuard · 3mo ago
what do you mean by "one thing i would make clear is how to obtain the resource again after having run the generation"?
not guilty · 3mo ago
how can i obtain the transcript for an already processed input without having the "job id", which i guess would only be kept for a certain amount of time? or at least, is the "job id" still valid for requesting the transcript again?
LazyGuard · 3mo ago
you mean you've already got a transcript the first time, but let's say tomorrow you would like to get it again, right?
not guilty · 3mo ago
yes
LazyGuard · 3mo ago
okay let me think about it.... One trivial way is to go through the whole chain again:
1. You call /generate-transcript with the id of the video; it returns the URL for the status.
2. You ask the status endpoint, which immediately returns the id of the transcript.
3. Now that you have the id, you can go ask /transcripts.
What do you think?
not guilty · 3mo ago
i would say that a user could want to generate a new transcription for the same content. usually re-downloading a transcript happens for two reasons: the user lost it because he was not careful, or some data loss occurred
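One way to reconcile the two positions above (reuse an existing transcript vs. allow a fresh transcription of the same content) could be an explicit opt-in for regeneration. The sketch below is an assumption, not something proposed in the thread: the `force` parameter, the `jobs_by_video` index, and the path layout are hypothetical.

```python
import itertools

_job_ids = itertools.count(1)
jobs_by_video = {}  # video_id -> id of the most recent job for that video (hypothetical index)


def generate_transcript(video_id, force=False):
    """POST /generate-transcript: reuse the latest job unless the caller asks for a new one."""
    existing = jobs_by_video.get(video_id)
    if existing is not None and not force:
        # Same status URL as before: no new (possibly billed) generation is started.
        return 202, {"Location": f"/transcript-status/{existing}"}
    job_id = str(next(_job_ids))
    jobs_by_video[video_id] = job_id
    return 202, {"Location": f"/transcript-status/{job_id}"}
```

A repeated call for the same video then lands on the already finished job, while `force=True` starts a new generation from scratch.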
LazyGuard · 3mo ago
hum, so my process should re-run the transcript generation from scratch. I see, let me think again
not guilty · 3mo ago
it depends on what service you are offering: whether it's an internal thing, whether third parties buy this and there is a contract, and so on
LazyGuard · 3mo ago
I am just prepping for an interview, so your questions already help me. What about having /transcripts?id=.....? When you get the transcript the first time you should have an id, and you just keep it
not guilty · 3mo ago
you can look, for example, at how Google or Azure do it. they give you an "operation name" (or rather a code) and they tell you "we will hold this job's data for x months, after that you're screwed"
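The retention idea can be sketched as a lookup that stops serving a job's data once it is older than a fixed window. This is only an illustration of the policy described above, not how Google or Azure implement it: the 90-day `RETENTION` value, the `operations` dictionary layout, and the choice of 410 Gone for expired jobs are all assumptions.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # hypothetical value standing in for "x months"


def lookup_operation(operations, name, now=None):
    """Return (status, result) for an operation name, honoring the retention window."""
    now = now or datetime.now(timezone.utc)
    op = operations.get(name)
    if op is None:
        return 404, None
    if now - op["created_at"] > RETENTION:
        # Past the retention window: purge the data and report it as gone.
        del operations[name]
        return 410, None
    return 200, op["result"]
```

Within the window the same operation name can be used to fetch the result as often as needed; afterwards the client has to run the job again.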
LazyGuard · 3mo ago
do you have any links please?
not guilty · 3mo ago
eh i have the links in the internal portal, but i should be able to find the public thingy
not guilty · 3mo ago
this could be a start, although maybe azure's one is not the simplest to understand https://learn.microsoft.com/en-us/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api
Create a batch transcription - Speech service - Azure AI services
Learn how to use Azure AI Speech for batch transcriptions, where you submit audio and then retrieve transcription results asynchronously.
not guilty · 3mo ago
and around here should be google's one https://cloud.google.com/translate/docs
Google Cloud
Cloud Translation documentation  |  Google Cloud
Allows programatic integration with Google Translate.
not guilty · 3mo ago
and then there's amazon, and other minor services (like say amberscript, or flyscribe)
LazyGuard · 3mo ago
thanks a lot
Lex Li · 3mo ago
If a web service call is going to take a very long or variable amount of time, that usually indicates a risky design. Try to switch to a state machine or a bi-directional channel (either pull or push) so that the client side can get timely status updates without hitting a server-side timeout.
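A minimal sketch of the state-machine idea, assuming four job states and a fixed transition table. The state names and the `advance` helper are hypothetical; the reply above does not prescribe any particular set of states.

```python
from enum import Enum


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Allowed transitions; terminal states map to an empty set.
ALLOWED = {
    JobState.QUEUED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
    JobState.SUCCEEDED: set(),
    JobState.FAILED: set(),
}


def advance(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


def is_terminal(state):
    """True once the client can stop polling."""
    return not ALLOWED[state]
```

Each request then only reads or advances the stored state and returns immediately, so no single HTTP call has to stay open for the duration of the work.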
LazyGuard · 3mo ago
@Lex Li I am thinking about this:
[Attached image, no description available]
Lex Li · 3mo ago
In a truly distributed system, you should have a worker node farm behind the endpoints, managed by a few manager nodes, so that by scaling out the system can handle an extensive workload. In such a design, each node knows clearly what its responsibilities are, not mixed like in your current design. But now you know what to look for and can get started.
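As a single-process stand-in for the manager/worker split described above, the sketch below uses a thread pool in place of a real farm of worker nodes. `JobManager`, `transcribe`, and the status strings are hypothetical names; an actual distributed setup would put a message queue and separate machines where the pool is.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(video_id):
    """Placeholder for the expensive work a worker node would do."""
    return f"transcript of {video_id}"


class JobManager:
    """Hands jobs to workers and answers status queries; it never does the work itself."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}  # job_id -> Future

    def submit(self, job_id, video_id):
        self._futures[job_id] = self._pool.submit(transcribe, video_id)

    def status(self, job_id):
        future = self._futures.get(job_id)
        if future is None:
            return "unknown"
        if not future.done():
            return "running"
        return "failed" if future.exception() else "done"

    def result(self, job_id):
        # Blocks until the worker has finished this job.
        return self._futures[job_id].result()

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

The endpoints would only talk to the manager (`submit`, `status`, `result`), which keeps request handling and transcription work in clearly separate roles.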