Hi @deanQ, fortunately, for whatever reason, the issue of long queue times without horizontal scaling seems to have disappeared.
However, I am experiencing a new issue: some of my requests are failing to return. Interestingly, according to my logs, the inference completes successfully; it's just that the result is never returned. The last status I get (from polling) is IN_PROGRESS, even though my logs show the job completed successfully. What typically happens is that a subsequent poll returns a COMPLETED status with the output. Instead, I'm seeing it hang on IN_PROGRESS, and then my endpoint.status requests start failing. This is happening maybe 5% of the time.
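For context, here is a minimal sketch of the kind of polling loop described above, with a deadline so a job stuck on IN_PROGRESS surfaces as a timeout instead of hanging. `fetch_status` is a hypothetical callable standing in for whatever performs the actual status request; it is not a real SDK function, and the timeout values are placeholders.

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    fetch_status: Callable[[], str],  # hypothetical: returns the job status string
    timeout_s: float = 60.0,          # placeholder deadline, not a documented limit
    interval_s: float = 1.0,
    terminal: Iterable[str] = ("COMPLETED",),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal status is seen, or raise TimeoutError."""
    terminal_set = set(terminal)
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in terminal_set:
            return status
        if clock() >= deadline:
            # The job never left a non-terminal status within the deadline.
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

Logging the job Id and workerId when the `TimeoutError` fires makes it easier to collect cases like the ones listed below.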
My result payload is ~300 KB. Is that too large? Should I be saving it to storage and returning a URL instead? That's the only thing I can think of. I'd appreciate some help here, as it's a big issue for my application.
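A rough sketch of the size check that workaround would need, assuming the result is JSON-serializable. The `limit_bytes` value is purely a placeholder, since the actual payload limit (if there is one) is exactly what is being asked here, and both function names are made up for illustration.

```python
import json
from typing import Any


def result_size_bytes(result: Any) -> int:
    """Size of the result once serialized to JSON and encoded as UTF-8."""
    return len(json.dumps(result).encode("utf-8"))


def should_offload(result: Any, limit_bytes: int) -> bool:
    """True if the serialized result exceeds the (assumed) size limit.

    limit_bytes is a placeholder; the real limit is not known here.
    """
    return result_size_bytes(result) > limit_bytes
```

If `should_offload` returns True, the handler would upload the result to storage and return only the URL; otherwise it would return the result inline as it does now.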
Here are some requests that hit this issue:
Endpoint Id: mmumv0n4k99461
Id: ac74d68b-ec22-48b8-aaf1-9023d2600e97-u1
workerId: 4dxsfu0y6ylg9v
Id: 0234e98a-71a4-4ec8-a2a6-24ef9f5bc7a1-u1
workerId: gqqcsuxbczbnct
Id: 59ccf6c2-7981-4247-9691-b9de3fb3ff2a-u1
workerId: 1d6pswp366osik
Id: 80156eba-28fd-467e-9277-2e18a49a24b2-u1
workerId: o8nhl6j0fdcubz
Id: 150747b2-4271-4b5b-b806-76b8f007adb6-u1
workerId: 1d6pswp366osik