Campbell

Serverless Instance Queue Stalling

Our team has hit a fairly consistent issue where requests sent to our Serverless endpoints (configured as described below) sit in the queue and are never picked up by available idle or throttled workers.

Find the original AI help thread here with more details: https://discord.com/channels/912829806415085598/1392974715567607862

Here are some TL;DR notes:

Background

- We have a US-CA-1 network volume hosting a fairly large (~40GB) model file

- We deploy on any available 24GB, 40GB, or 80GB card (Pro/non-pro); a rough sketch of the endpoint config follows this list
- There are at least 5 worker slots; during these stalls none of them are utilized, and no other serverless endpoints are actively running either

- Requests queue for minutes to hours (we increased the idle timeout to test), and at times randomly begin to work - especially if settings are fiddled with (worker count moved up, settings saved, count moved back down, saved again; scripted below the list). This does not always work, however

- Once a worker actually picks up a request there is no issue - the container loads and the request is handled normally

- No issues with Pods, no billing issues.

- Cards are always shown as available for this datacenter + network-volume combination during these stalls; expanding the GPU options to all available cards (up to 120GB) does not fix it

- The only way to avoid this is to keep an active worker, which is far from ideal since these are intermittent tasks on unplanned schedules
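
For reference, here's roughly how the endpoint is configured, sketched against the runpod-python SDK. The `create_endpoint` helper, its keyword names, and the GPU pool IDs are our best-effort assumptions rather than verified API - please check the SDK docs before running it:

```python
# Sketch of the endpoint config described in the notes above.
# Assumptions: runpod.create_endpoint exists with these kwargs, and the
# gpu_ids pool names are purely illustrative -- verify both in the SDK docs.
import runpod

runpod.api_key = "YOUR_API_KEY"  # placeholder

endpoint = runpod.create_endpoint(
    name="model-endpoint",                       # hypothetical name
    template_id="YOUR_TEMPLATE_ID",              # template with our worker image
    gpu_ids="AMPERE_24,ADA_24,AMPERE_80",        # illustrative 24/40/80GB pools
    network_volume_id="YOUR_US_CA_1_VOLUME_ID",  # ties workers to US-CA-1
    idle_timeout=300,   # raised while testing the stalls
    workers_min=0,      # no always-on workers: where the stall appears
    workers_max=5,      # at least 5 worker slots
)
print(endpoint)
```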

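And here is the "fiddle with settings" workaround scripted out. This assumes Runpod's GraphQL `saveEndpoint` mutation and its input field names, which we have not verified - treat it as a sketch, not a supported API:

```python
# Scripted version of the manual workaround: raise workersMax, save,
# then lower it again. The saveEndpoint mutation and its input fields
# are assumptions about Runpod's GraphQL API -- verify before use.
import time
import requests

API_URL = "https://api.runpod.io/graphql"
API_KEY = "YOUR_API_KEY"  # placeholder

MUTATION = """
mutation SaveEndpoint($input: EndpointInput!) {
  saveEndpoint(input: $input) { id workersMax }
}
"""

def set_workers_max(endpoint_id: str, workers_max: int) -> None:
    resp = requests.post(
        API_URL,
        params={"api_key": API_KEY},
        json={
            "query": MUTATION,
            "variables": {"input": {"id": endpoint_id, "workersMax": workers_max}},
        },
        timeout=30,
    )
    resp.raise_for_status()

set_workers_max("YOUR_ENDPOINT_ID", 6)   # nudge the cap up and save
time.sleep(10)
set_workers_max("YOUR_ENDPOINT_ID", 5)   # restore the original cap
```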

Looking for some assistance on the matter. Packaging the model into the container image would be difficult: we regularly build with buildx, which emulates the target build environment and stretches builds of images at that scale to 20+ hours. It is not impossible to bake the model in, but we would very much like to avoid it and use the features as intended.
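
For completeness, the worker itself is nothing exotic: it follows the standard Runpod serverless handler pattern and loads the model from the network volume mount at cold start, which is exactly the thin-image workflow we'd like to keep. The model path and "loader" below are illustrative placeholders:

```python
# Standard Runpod serverless handler; network volumes mount at
# /runpod-volume inside workers. The path and load step are placeholders
# standing in for our real ~40GB model load.
import os
import runpod

MODEL_PATH = "/runpod-volume/models/model.bin"  # hypothetical path

model = None  # loaded once per worker at cold start, then reused

def load_model():
    global model
    if model is None:
        assert os.path.exists(MODEL_PATH), "network volume not mounted?"
        model = MODEL_PATH  # stand-in for the actual model object
    return model

def handler(job):
    load_model()
    # real inference would go here
    return {"ok": True, "echo": job["input"]}

runpod.serverless.start({"handler": handler})
```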

Thank you for the help in advance,
DC @ Synexis