Hi admins, or anyone who can help: I'm running into some intermittent issues while deploying a large language model with vLLM on Runpod. Here's a quick summary of what's happening:
Occasionally, when sending requests to the pod, I get a 404 error response (roughly once every 50 requests). However, these requests don't show up at all in the vLLM logs, which makes me suspect that Runpod is intercepting or dropping them before they reach the container.

I reached out to Runpod support, and they suggested switching to a TCP port instead. Since making that change, the 404 errors seem to be resolved, but now some requests fail with EOF errors.
Has anyone else encountered this? Any ideas on what might be causing the EOF errors over the TCP port, or suggestions for troubleshooting or fixing this? I'm happy to provide more details, such as pod specs, the vLLM version, or logs, if needed. Thanks in advance for any help!