Streaming responses on a Serverless Endpoint

I am currently using a serverless endpoint for inference, and streaming responses do not seem to work the same way as on a dedicated endpoint. I have the identical setup for both the dedicated and the serverless deployments, yet I can see that responses do not arrive with the same streaming cadence and speed as they do from the dedicated endpoint.
Solution
Yes, the streaming behavior differs because of token batching: on the serverless endpoint the default batch size is 50, so tokens arrive in larger chunks rather than one at a time. MIN_BATCH_SIZE should already default to 1.
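A minimal sketch of why this matters, assuming the batching works roughly as described above (the function and variable names here are illustrative, not part of any real endpoint API): buffering tokens into batches of 50 produces far fewer, coarser chunks than emitting every token individually, even though the total output text is identical.

```python
def stream_tokens(tokens, batch_size):
    """Yield tokens grouped into chunks of `batch_size`.

    Hypothetical model of endpoint-side token batching; a batch_size of 1
    mimics the dedicated endpoint, 50 mimics the serverless default.
    """
    batch = []
    for tok in tokens:
        batch.append(tok)
        if len(batch) == batch_size:
            yield "".join(batch)
            batch = []
    if batch:  # flush any trailing partial batch
        yield "".join(batch)

tokens = [f"t{i} " for i in range(100)]

dedicated = list(stream_tokens(tokens, batch_size=1))    # 100 fine-grained chunks
serverless = list(stream_tokens(tokens, batch_size=50))  # 2 coarse chunks

print(len(dedicated), len(serverless))
# The full text is the same either way; only the chunking differs.
assert "".join(dedicated) == "".join(serverless)
```

So the serverless response is not missing data; it simply appears less "live" because each streamed chunk carries up to 50 tokens instead of one.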