RunPod2mo ago
LordOdin

500s when trying to spawn pods too fast: is there a way to spawn multiple at once?

I've managed to start 100 nodes with no issues using synchronous requests, but when I use async it gives me 500s quite often. I usually try to start 20-100 nodes at once, but even 2 can cause the 500.
Error starting node nkKkw: Server error '500 Internal Server Error' for url 'https://rest.runpod.io/v1/pods'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
Response: {"error":"create pod: Something went wrong. Please try again later or contact support.","status":500}
Payload
GPU_TYPE_MAPPING = {
    "3090": "NVIDIA GeForce RTX 3090",
    "3090Ti": "NVIDIA GeForce RTX 3090 Ti",
    "A5000": "NVIDIA RTX A5000",
    "A6000": "NVIDIA RTX A6000",
    "4000Ada": "NVIDIA RTX 4000 Ada Generation",
}
GPU_TYPES = list(GPU_TYPE_MAPPING.keys())

QUEUE_MANAGER_HOST = "http://127.0.0.1:7777"

DEFAULT_PAYLOAD = {
    "allowedCudaVersions": [],
    "cloudType": "SECURE",
    "computeType": "GPU",
    "containerDiskInGb": 50,
    "containerRegistryAuthId": "",
    "countryCodes": [""],
    "cpuFlavorPriority": "availability",
    "dataCenterPriority": "availability",
    "dockerEntrypoint": [],
    "dockerStartCmd": [],
    "env": {
        "QUEUE_MANAGER_HOST": QUEUE_MANAGER_HOST
    },
    "gpuCount": 1,
    "gpuTypePriority": "availability",
    "interruptible": False,
    "locked": False,
    "minDiskBandwidthMBps": 500,
    "minDownloadMbps": 500,
    "minRAMPerGPU": 32,
    "minUploadMbps": 500,
    "minVCPUPerGPU": 8,
    "ports": [],
    "supportPublicIp": False
}
some extra payload sprinkles
url = f"{Runpod.BASE_URL}/pods"
payload["gpuTypeIds"] = list(GPU_TYPE_MAPPING.values())
payload["imageName"] = DOCKER_IMAGE
payload["name"] = f"{user_id}-{random_id}"
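A common workaround for bursty 500s like this is to cap how many create requests are in flight at once and retry failures with jittered exponential backoff. A minimal sketch of that pattern, hedged: `create_pod_fn` below is a hypothetical stand-in for the real POST to `/v1/pods` (which would raise on a 500 so the retry kicks in); the concurrency limit of 5 is a guess to tune, not a documented RunPod limit:

```python
import asyncio
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt))."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


async def with_retries(coro_fn, max_attempts: int = 5):
    """Call coro_fn(), retrying on any exception with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return await coro_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the real error
            await asyncio.sleep(backoff_delay(attempt))


async def spawn_all(create_pod_fns, max_concurrent: int = 5):
    """Run pod-creation coroutines with at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(fn):
        async with sem:
            return await with_retries(fn)

    return await asyncio.gather(*(limited(fn) for fn in create_pod_fns))
```

In the real version, each `create_pod_fn` would POST the payload above to `f"{Runpod.BASE_URL}/pods"` (e.g. with `httpx.AsyncClient` and `response.raise_for_status()`), so a 500 becomes an exception that triggers the backoff instead of failing the whole batch.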
2 Replies
nathaniel
nathaniel2mo ago
this is pretty much the fastest way to create pods atm, so we're probably running into bottlenecks we haven't encountered before. The "Something went wrong" error message means the error originates from our GraphQL layer, where for some reason we decided to obscure the real error message from the user. I can try to find the real problem in the logs if you DM the email you use for the account.
LordOdin
LordOdinOP2mo ago
DM sent. As a side note, shutting down many pods doesn't seem to 500. Any progress on this? It takes us over a minute to turn on bulk pods at the moment. Also noticing some 500s if we request them back to back synchronously, though that's much more rare.
