A “pull”-based approach (the GPU calls an endpoint when it’s ready) would make a lot more sense here: the GPU worker asks the queue for work whenever it’s idle or has just finished a job. Any queue would do, e.g. SQS, Pub/Sub, or Pulsar. Holding a request open for minutes or longer while waiting for a response is not reliable in any system.
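To make the idea concrete, here is a rough sketch of a pull loop. It uses Python's in-memory `queue.Queue` as a stand-in for a real broker, and the names `run_on_gpu`, `submit`, `worker`, and `demo` are made up for illustration; a real setup would swap in the broker's own receive and acknowledge calls.

```python
import queue
import threading

# In-memory stand-ins for a real broker (SQS, Pub/Sub, Pulsar, ...).
jobs = queue.Queue()
results = {}

def run_on_gpu(payload):
    # Hypothetical placeholder for the slow GPU work.
    return payload.upper()

def worker(stop):
    # Pull loop: the worker asks for work only when it is idle.
    while not stop.is_set():
        try:
            job_id, payload = jobs.get(timeout=0.1)
        except queue.Empty:
            continue  # nothing queued; ask again
        results[job_id] = run_on_gpu(payload)
        jobs.task_done()  # acknowledge only after the job has finished

def submit(job_id, payload):
    # The caller enqueues and returns immediately; no long-held request.
    jobs.put((job_id, payload))

def demo():
    stop = threading.Event()
    t = threading.Thread(target=worker, args=(stop,), daemon=True)
    t.start()
    submit("a", "hello")
    submit("b", "world")
    jobs.join()  # block until both jobs are acknowledged
    stop.set()
    t.join()
    return dict(results)
```

The caller never waits on the GPU directly: it enqueues a job and picks up the result later, so a slow job or a restarted worker does not leave a connection hanging.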