Oversized requests to the serverless Infinity vector embedding worker cause errors
I kept running into "400 Bad Request" errors from this service, and finally discovered that my requests were too large and were tripping this constraint: https://github.com/runpod-workers/worker-infinity-embedding/blob/acd1a2a81714a14d77eedfe177231e27b18a48bd/src/utils.py#L14
Is this a hard limit?
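For anyone skimming: the linked line is a request-schema size cap. Here's a minimal sketch of that kind of Pydantic constraint, not a verbatim copy of utils.py; the 8192-item cap is the one discussed later in this thread, and the field names and string-length value are illustrative:

```python
# Hedged sketch of a request-size constraint like the one linked above.
# Names and the string-length value are illustrative, not the repo's
# actual code; the 8192-item cap is the one discussed in this thread.
from typing import Annotated, Union
from pydantic import BaseModel, Field, StringConstraints

MAX_ITEMS = 8192  # max items per embedding request
INPUT_STRING = StringConstraints(max_length=8192 * 15, strip_whitespace=True)  # illustrative

class EmbeddingInput(BaseModel):
    # Either a single string or a bounded list of bounded strings;
    # anything larger fails validation and surfaces as a 400 Bad Request.
    input: Union[
        Annotated[list[Annotated[str, INPUT_STRING]], Field(min_length=1, max_length=MAX_ITEMS)],
        Annotated[str, INPUT_STRING],
    ]
```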
Yep, splitting into smaller batches is what I started doing, but with a cap of 8192 items per request it's hard to come anywhere near the GPU's memory limits
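In the meantime, the workaround is chunking on the client side. A minimal sketch, assuming an OpenAI-compatible /v1/embeddings endpoint (the endpoint URL and model name are placeholders):

```python
# Client-side batching to stay under the per-request item cap.
import requests

MAX_ITEMS = 8192  # the cap discussed in this thread

def embed_all(texts: list[str], url: str, model: str) -> list[list[float]]:
    vectors: list[list[float]] = []
    for start in range(0, len(texts), MAX_ITEMS):
        chunk = texts[start : start + MAX_ITEMS]
        resp = requests.post(url, json={"input": chunk, "model": model})
        resp.raise_for_status()
        # OpenAI-style responses return one vector per input item, in order
        vectors.extend(item["embedding"] for item in resp.json()["data"])
    return vectors
```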
@zilli if you can, open a PR for that or create an issue; I'm not sure what the intent was behind this limitation
I finally remembered to start building the Docker image (the last one took 2.5 hours...). I'll try it out tomorrow and, if it works, put in that PR
...and the build failed because the hardcoded nightly build of PyTorch (for the end-of-life CUDA 12.1.0) is no longer available 😅
I'm rebuilding after updating the dependencies, but the PR won't be just the string-length change.
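For reference, the shape of the dependency fix (hypothetical, not the actual PR diff): drop the pinned nightly, which vanishes once it rotates off the nightly index, in favor of a stable wheel from the matching CUDA index.

```bash
# Before (breaks once the nightly rotates off the index):
#   pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
# After (stable wheel from the matching CUDA index):
pip install torch --index-url https://download.pytorch.org/whl/cu121
```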