It seems my pipelines that leverage RunPod Serverless Infinity Embeddings are unable to requisition a pod, likely because of capacity constraints. All of the workers are permanently showing a status of Initializing or Throttled in the UI, even though my GPU configuration has 4 valid GPUs selected, all of which are listed as either "Medium Supply" or "High Supply".
My pipeline's requests end up returning a 502 error, which based on my research appears to mean the RunPod load balancer could not find a backend pod to send jobs to. Combined with the Throttled/Initializing statuses in the UI, this makes me think the RunPod service is overwhelmed. Is there a plan to improve the reliability of the service? I could also easily be missing something in my configuration, though it is essentially the Infinity Embeddings 1.1.4 quick start offered through the RunPod UI, running BAAI/BGE-m3 with minimal additional configuration.
Is there a recommendation for how I can use a fallback service, an additional endpoint, or anything else from RunPod, so that the service has time to self-heal?
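In the meantime, the workaround I've been sketching on the client side is a retry-with-fallback wrapper: try the primary endpoint a few times with backoff, then fail over to a second endpoint. This is just a sketch under my own assumptions (the `Unavailable` exception and the caller list are mine, not part of any RunPod SDK; each caller would wrap the actual HTTP request to an endpoint and raise `Unavailable` on a 502):

```python
import time
from typing import Callable, Sequence


class Unavailable(Exception):
    """Raised by a caller when an endpoint returns a 5xx response
    (e.g. the 502 from the RunPod load balancer)."""


def call_with_fallback(callers: Sequence[Callable[[], dict]],
                       retries: int = 3,
                       backoff: float = 1.0) -> dict:
    """Try each endpoint caller in order, retrying 5xx failures
    with exponential backoff, before moving to the next one."""
    last_err = None
    for call in callers:
        for attempt in range(retries):
            try:
                return call()
            except Unavailable as err:
                last_err = err
                # backoff, 2*backoff, 4*backoff, ... between retries
                time.sleep(backoff * 2 ** attempt)
    raise RuntimeError("all endpoints exhausted") from last_err
```

Each caller would be something like a closure around `requests.post("https://api.runpod.ai/v2/<endpoint-id>/runsync", ...)` that raises `Unavailable` when the status code is 5xx; the endpoint ID for the fallback would point at a second endpoint deployed on a different GPU pool or region. Does something like this match what RunPod would recommend, or is there a built-in mechanism for it?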