not enough GPUs free
Hi there,
I hope you're having a good day. I have a serverless endpoint running on RunPod, created on top of a network storage volume that belongs to the US-OR-1 data center. It had been running well for some days, but 20 minutes ago I hit an issue: no worker can be created because no GPU resources are available. The system repeatedly logs this:
2024-07-13T06:32:22Z create container USERNAME/ENDPOINT
2024-07-13T06:32:22Z error creating container: not enough GPUs free
How can I make sure GPU resources are available whenever a request comes in? Should I move the endpoint and the network volume to another region that has more GPU resources? How often will this shortage happen? It poses a risk to stability and quality of service, which is critical in most scenarios.
thank you.
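To answer the "how often" question with data, one option is to scan the worker log for the error shown above. The following is a minimal sketch that assumes the log is available as plain text in the same `timestamp message` format as the two lines quoted in the post; the function name `gpu_shortage_times` is hypothetical and is not part of any RunPod tooling.

```python
import re
from datetime import datetime, timezone

# Matches lines like "2024-07-13T06:32:22Z <message>" (format taken from the post above).
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\s+(.*)$")
GPU_ERROR = "not enough GPUs free"


def gpu_shortage_times(log_text):
    """Return the UTC timestamp of every log line reporting the GPU error."""
    times = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and GPU_ERROR in match.group(2):
            stamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            times.append(stamp.replace(tzinfo=timezone.utc))
    return times


sample = """\
2024-07-13T06:32:22Z create container USERNAME/ENDPOINT
2024-07-13T06:32:22Z error creating container: not enough GPUs free
"""

hits = gpu_shortage_times(sample)
print(len(hits), "shortage error(s); first at", hits[0].isoformat())
```

Collecting these timestamps over a few days would show how long each shortage lasts and how frequently it recurs, which is useful detail to include in a support report.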
20 Replies
Unknown User•17mo ago
Message Not Public
I didn't get it. The image version is my customized version on Docker Hub; it might be V1, V2, V3, anything. How is it related to competition for GPU resources? And which env file should I edit to add the dummy env, and how? Thank you so much.
Unknown User•17mo ago
Message Not Public
Anything?
Like ENVDUMMY=anything?
Unknown User•17mo ago
Message Not Public
OK, so you are suggesting it is not actually a resource shortage, it is a bug,
and that is why I should add the dummy env and/or update the image version.
Unknown User•17mo ago
Message Not Public
ok, thx
Unknown User•17mo ago
Message Not Public
I have added a dummy env. It did work, but that does not mean the problem is solved, since the error went away by itself after 2 or 3 minutes, after a couple of attempts at creating a new worker. So, any idea or suggestion on how to keep it from happening again? Thanks.
Unknown User•17mo ago
Message Not Public
Yes, I did report it. Thank you so much for the support.
Unknown User•17mo ago
Message Not Public
"They"? What do you mean by "they"? I thought you were from the RunPod support team. Aren't you?
Unknown User•17mo ago
Message Not Public
So are you hired by them, or are you a volunteer giving support based on your experience?
Unknown User•17mo ago
Message Not Public
You seem very confident and familiar with all sorts of issues, platforms, and technologies. I assumed you would at least be a senior member of their support team, so it surprised me that you do not have access to RunPod internals.
Unknown User•17mo ago
Message Not Public
Yes, you will for sure. Thanks anyway.