Multi GPU problem
Hi, how can I evenly distribute workers across multiple GPUs? I am trying to get the Stable Diffusion model up, but I am getting an out-of-memory error because gunicorn is running all the workers on one GPU. How can I solve this, given that all the workers need to run on the same port? Alternatively, how can I configure request proxying inside the pod?
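One common approach, sketched below (the GPU count and app module are assumptions, not RunPod specifics): pin each gunicorn worker to its own GPU from a `gunicorn_conf.py` server hook, so every worker sees exactly one device while all of them still share the same port.

```python
# gunicorn_conf.py - hypothetical sketch: round-robin workers across GPUs
# by setting CUDA_VISIBLE_DEVICES just before each worker process is forked;
# the child inherits the environment, so each worker sees a single GPU.
import os

GPU_COUNT = 4            # assumption; e.g. the number of lines in `nvidia-smi -L`
workers = GPU_COUNT      # one gunicorn worker per GPU

def pre_fork(server, worker):
    # worker.age increases by one for each worker the arbiter forks,
    # so this assigns GPUs 0..GPU_COUNT-1 in round-robin order.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(worker.age % GPU_COUNT)
```

Started with something like `gunicorn -c gunicorn_conf.py app:app --bind 0.0.0.0:8000`, all workers listen on one port and no per-worker proxying is needed.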

Unable to create pod with GraphQL
Hi, I tried to use the following command to create a pod as a test.
```bash
curl --request POST \
  --url https://api.runpod.io/graphql \
  --header "Authorization: Bearer YOUR_API_KEY" ...
```
Creating a Pod with dockerArgs and a docker image from a registry that requires auth
I'm trying to create a pod from a template or from a docker image from a docker registry with authentication.
I'm using the method podFindAndDeployOnDemand.
If I specify a templateId, the pod starts, but the dockerArgs I specify in the API call seem to be ignored and the CMD from the Dockerfile is run instead.
...
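One thing worth trying, sketched below: pass the image and dockerArgs directly in the podFindAndDeployOnDemand input instead of a templateId, to confirm whether the template is what overrides them. The mutation and field names used here (imageName, dockerArgs) appear in the question; everything else (the image, the command, the selected gpuCount) is a placeholder - verify against RunPod's GraphQL schema.

```python
# Hypothetical sketch: build a podFindAndDeployOnDemand request without a
# templateId, supplying imageName and dockerArgs directly. Values are
# placeholders for illustration.
import json
import urllib.request

QUERY = """
mutation {
  podFindAndDeployOnDemand(input: {
    name: "dockerargs-test",
    imageName: "myregistry.example.com/myimage:latest",
    dockerArgs: "python -m my_app --flag",
    gpuCount: 1
  }) { id }
}
"""

def build_request(api_key: str, query: str) -> urllib.request.Request:
    """Assemble the authenticated GraphQL POST (not sent here)."""
    return urllib.request.Request(
        "https://api.runpod.io/graphql",
        data=json.dumps({"query": query}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# resp = urllib.request.urlopen(build_request("YOUR_API_KEY", QUERY))
```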
Indicate region in deployment console menu
Hi, can you please add an option in the deployment console to see which region the availability is in?

runpodctl create pod --communityCloud --gpuType 'A4500' --cost 0.19 is not working
I am trying to deploy an RTX A4500 pod on Community Cloud. On the web page I can see available machines at that price (0.19 USD/h), but the command returns: " runpodctl create pod --imageName 'pod-1' --communityCloud --gpuType 'NVIDIA GeForce RTX A4500' --templateId '...' --cost 0.19
Error: The current minimum price for this type of instance is 0.5. " Why?...
50/50 success running the standard vLLM template
So when I start vllm/vllm-openai:latest on 2xA100 or 4xA40, it only comes up 1 out of 2 or 3 times. I haven't noticed any logic behind it; it just fails sometimes. Here are the parameters I use for 2xA100, for instance: --host 0.0.0.0 --port 8000 --model meta-llama/Llama-3.3-70B-Instruct --dtype bfloat16 --enforce-eager --gpu-memory-utilization 0.95 --api-key key  --max-model-len 16256 --tensor-parallel-size 2
I also need to get some logs....
Container keeps restarting
Hey, I have a container that keeps restarting, and I'm not quite sure why - there's nothing in the logs (or they get deleted too quickly when it restarts?). I'm using a custom template. The issue remains even with a long-running run command (e.g.
/bin/bash -c "sleep infinity"). Any ideas what might be wrong?
Extremely slow upload to HuggingFace
For the past 10 hours or so I've had issues uploading to HuggingFace on my pod (jhw1d9hmjb8d3v).
speedtest-cli shows acceptable speeds, but uploads specifically to HuggingFace often drop below 1 MB/s....
Enable performance counter on RunPod
Hi, I'm trying to profile some CUDA kernels on a pod with an A100 in order to improve their performance. Is there a way to enable the performance counters on pods, as described at https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters? I've tried to enable them by creating the necessary config files under
/etc/modprobe.d, but to no avail.
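For reference, the host-side setting that NVIDIA's ERR_NVGPUCTRPERM page describes looks like the following. Since it configures the host's nvidia kernel module, creating this file inside the pod's filesystem has no effect; the file name is arbitrary:

```
# /etc/modprobe.d/nvidia-profiling.conf  (on the host, not in the container)
options nvidia NVreg_RestrictProfilingToAdminUsers=0
```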
It seems that the permission needs to be enabled on the host
```
When profiling within a container, access must be enabled on the host, or the container must be started with the appropriate permissions by passing --cap-add=SYS_ADMIN as an admin user....
```
The "Fine tune an LLM with Axolotl on RunPod" tutorial should mention uploading public key first
The tutorial is very useful, but it would be even more so if it mentioned that to "connect to it over secure SSH", you have to provide your SSH public key beforehand, so the created image will have it for your first connection. That would further help make it a self-contained article for this use case.
Error response from daemon: container
After uploading my ED25519 SSH key, creating a pod (using the winglian/axolotl-cloud:main-latest image), and trying to SSH into it, I immediately get an error after successfully authenticating to ssh.runpod.io with the public key. Looking at the pod's system logs, the image was created, but it never gets beyond starting it, and the container log shows a "curl: no URL specified" error:
```
...
Error response from daemon: container [..id..] is not running
Error response from daemon: container [..id..] is not running
```
Solution:
Quick update: this is because the template in the tutorial is no longer maintained, as the repo moved under the axolotlai organization.
The working RunPod template for this is axolotlai/axolotl-cloud:main-latest,
and the axolotlai template is currently working. I have raised this issue internally with our team and we will get the template in the tutorial updated to point to it as well....
Templates view is broken
"Templates" on the website never stops loading: https://www.runpod.io/console/user/templates
Suspicious space consumption or volume disk not mounted
I provisioned a pod with a 240 GB volume disk, but that 240 GB of space is nowhere to be found.
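A quick check from inside the pod, sketched below, tells whether the volume is mounted at all. /workspace is the usual RunPod volume mount path, but that is an assumption here - use whatever volumeMountPath you configured.

```python
# Hypothetical check: report the size of the filesystem at the volume
# mount path; ~240 GB would confirm the provisioned volume is mounted.
import shutil

def volume_size_gb(path: str = "/workspace") -> float:
    """Total size of the filesystem containing `path`, in GB."""
    return shutil.disk_usage(path).total / 1e9

# print(volume_size_gb())
```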

runpodctl get pod -a does not return the pods IP
Per the title: there is no way to get the IP address of a created pod via the command line.
Solution:
I eventually found out that the GraphQL interface is more robust. Note that you need to request a pod that can have a public IP, and not all of them can.
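The solution above can be sketched as a GraphQL query plus a small extractor. The runtime/ports field names follow RunPod's documented pod schema, but verify them against the current schema; the sample response in the test is invented for illustration.

```python
# Hypothetical sketch: list a pod's publicly reachable (ip, port) pairs
# from the GraphQL pod query, which the CLI does not expose.
QUERY = """
query {
  myself {
    pods {
      id
      runtime { ports { ip isIpPublic privatePort publicPort } }
    }
  }
}
"""

def public_endpoints(pod: dict) -> list:
    """Return (ip, publicPort) pairs for ports exposed with a public IP."""
    runtime = pod.get("runtime") or {}  # runtime is null while the pod boots
    return [
        (p["ip"], p["publicPort"])
        for p in runtime.get("ports") or []
        if p.get("isIpPublic")
    ]
```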
Maintenance time
Start: 12/13/2024 14:31 Local Time
End: 12/13/2024 18:30 Local Time
What is the local time? Where are you folks based?
Which country?...
How often do pods get network speed tested?
I am setting up a pod and the reported upload speed is significantly lower than in manually run speed tests. It also hasn't changed in hours, which makes me wonder how often this gets tested. Can a new test be triggered on demand?
Error: Unauthorized
Unable to create an available Pod via
...
```bash
$ runpodctl create pod --gpuType "1x NVIDIA A40" --imageName MedicineMan
Error: Unauthorized
$ runpodctl create pod --gpuType "1x NVIDIA A40" --imageName MedicineMan
Error: Unauthorized
```
Solution:
