How to scale pod GPU count properly?

Hello, we have some pods running with 2x4090. What is the best way to increase this to e.g. 4x4090 while making sure that our existing allocated GPUs will not be taken by others, even though we are running on-demand?
4 Replies
xPaghkman · 3mo ago
@Papa Madiator Do you know anyone who can help with this matter or give us some suggestions?
Madiator2011 · 3mo ago
nothing can be done
ashleyk · 3mo ago
Yeah, you have to start off with 4x GPU. You can't add more GPUs to a pod afterwards, and there will always be the risk that others take your existing GPUs.
xPaghkman · 3mo ago
I see, so the best way is to start a 4x GPU pod, attach the same network volume, and shut down the previous pair of pods. Hmm, looks like I cannot create a network volume for US.
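For anyone finding this thread later, here is a minimal sketch of that migration using the runpod Python SDK. The pod name, image, API key, and IDs below are placeholders, and parameter names such as network_volume_id reflect the SDK at the time of writing and may differ between versions, so verify against the current RunPod docs before relying on this:
```python
# Sketch: start a new 4x4090 on-demand pod attached to the same network
# volume, then stop the old 2x4090 pod once the new one is confirmed healthy.
# Assumes the `runpod` Python SDK (pip install runpod); names and IDs below
# are placeholders.
import runpod

runpod.api_key = "YOUR_API_KEY"        # placeholder

OLD_POD_ID = "abc123"                  # hypothetical ID of the existing 2x4090 pod
NETWORK_VOLUME_ID = "vol-xyz"          # hypothetical ID of the shared network volume

# Create the replacement pod with 4 GPUs on the same network volume.
new_pod = runpod.create_pod(
    name="my-app-4x4090",                      # placeholder name
    image_name="runpod/pytorch:2.1.0-py3.10",  # placeholder image
    gpu_type_id="NVIDIA GeForce RTX 4090",
    gpu_count=4,
    cloud_type="SECURE",                       # network volumes require Secure Cloud
    network_volume_id=NETWORK_VOLUME_ID,
    volume_mount_path="/workspace",
)
print("New pod:", new_pod["id"])

# After verifying the new pod works, stop (or terminate) the old one.
runpod.stop_pod(OLD_POD_ID)
```
Note that a stopped pod keeps its volume but releases its GPUs, so only stop the old pod after the new one is up; there is no way to hold the old GPUs in reserve during the switchover.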