RunPod

We're a community of enthusiasts, engineers, and enterprises, all sharing insights on AI, Machine Learning and GPUs!

Jupyter notebook (in Chrome tab) consistently crashing after 20 hours

My JupyterLab notebook Chrome tab crashed in the middle of 22 hours of training a model. How do I know if it's still training, if it has stopped, or if it is just running without doing anything? This has happened to me 3 times in a row, and this time I would like to know what is happening. The GPU usage is going up and down, which suggests it is training and simply not showing in the notebook, but I would like to make sure.
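One way to follow up on the GPU-usage observation above is to sample utilization from a terminal instead of the notebook tab. The sketch below is illustrative only: it assumes `nvidia-smi` is available in the pod and that every queried field comes back as a number, and the function names `parse_gpu_stats` and `read_gpu_stats` are made up for this example.

```python
import subprocess
from typing import List, Tuple

# Query GPU utilization (%) and used memory (MiB) as bare CSV values.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> List[Tuple[int, int]]:
    """Parse 'util, mem' lines into (utilization_percent, memory_used_mib) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def read_gpu_stats() -> List[Tuple[int, int]]:
    """Run nvidia-smi once and return one tuple per GPU (requires a GPU host)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `read_gpu_stats()` a few times over several minutes from a web terminal or SSH session shows whether utilization and memory use stay non-zero. That indicates a process is still using the GPU, though not that the training loop is making progress.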

Extremely slow sync speed

I'm syncing a pod to Dropbox and the speed is extremely slow, maxing out at 80kb/s and dropping as low as a few b/s at times.

How can I remove a network volume?

Hi, I'd like to know how I can remove a network volume I created? Tried looking through your docs but couldn't find info on it, could you please help?
Solution:
You can delete it under the network volume section in GPU cloud

Can I remove a GPU & resize my storage after I've created a pod?

I'd like to create a pod with two GPUs. However, I won't need both forever, so I would like to know if I can remove one after I'm done with it. I would also like to know if I can resize my pod's persistent storage after I've created it (either by shrinking it or adding more).
Solution:
I'm not sure you can resize, but your best bet is probably to use network storage and always store to that. Then you can always terminate and spin back up as needed 🙂

Need to update Auto1111 to 1.7.0

I want to enable SDXL inpainting, and git pull doesn't seem to work. I understand that some other files need to be altered as well, and sometimes things don't work as expected on RunPod (like updating an extension). Could I have some help in getting this to work?
Solution:
My template is already updated to 1.7.0 😎

How can I clean up storage in my network volume?

Hello, I'm using the Stable Diffusion template with a network volume. I noticed that even though I clean up files in Jupyter, space is not freed up in my volume. I suspect the files go to the trash but are not removed completely. I searched a lot but could not find the trash folder. Does anybody know where I can find it, or any other way of cleaning up my storage space properly?
Solution:
Alright, using ncdu I found that the path is /workspace/.Trash-0, and then I removed it with rm -rf /workspace/.Trash-0. All good now. Storage space is freed up....
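The cleanup described in the solution can also be sketched in Python: list any `.Trash-*` directories directly under the volume root, report their size, and delete them only when asked. This is a minimal sketch, not an official tool. The `/workspace` root and the `.Trash-0` name come from the post above, while the `.Trash-*` pattern and the helper names `dir_size`, `find_trash_dirs`, and `empty_trash` are assumptions for illustration.

```python
import shutil
from pathlib import Path
from typing import List


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    return sum(
        f.stat().st_size
        for f in path.rglob("*")
        if f.is_file() and not f.is_symlink()
    )


def find_trash_dirs(root: Path) -> List[Path]:
    """Trash directories such as .Trash-0 located directly under root."""
    return sorted(p for p in root.glob(".Trash-*") if p.is_dir())


def empty_trash(root: Path, dry_run: bool = True) -> int:
    """Report trash directories and delete them unless dry_run. Returns bytes found."""
    total = 0
    for trash in find_trash_dirs(root):
        size = dir_size(trash)
        total += size
        print(f"{trash}: {size / 1e6:.1f} MB")
        if not dry_run:
            shutil.rmtree(trash)  # permanent, like rm -rf
    return total
```

Running `empty_trash(Path("/workspace"))` only reports what it finds. Passing `dry_run=False` deletes the directories permanently, so it is worth checking the report first.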

Is there a way to get the SSH Terminal address for a pod using the GraphQL API?

After creating a pod using GraphQL, I want to access this value. Is it possible? I can get a portion of it, but there is a random value that I can't see in the response.

Help deploying LLaVA Flask API

I'm trying to create a LLaVA endpoint I can use in my project so I can assess 5 million photos with a Node script, similar to how I'm currently doing it locally with Ollama. I'm looking to deploy the 7b model on an RTX 4000, on GPU Cloud rather than serverless to keep costs down. My priorities are speed as well as cost, so I'd ideally like to process multiple images at once; any advice is welcome. After speaking to the author of the LLaVA RunPod template, he's recommended I use the Flask method below, but I'm not sure how I'd go about getting this deployed as I'm new to backend. Anybody able to help with some initial steps? https://github.com/ashleykleynhans/LLaVA/tree/main?tab=readme-ov-file#flask-api-inference...

Does RunPod support H100 confidential computing?

Months ago, another user mentioned H100 confidential computing in this Discord: https://discord.com/channels/912829806415085598/1131816583065505853/1131816583065505853. Does RunPod support it now? More information about NVIDIA confidential computing: https://www.nvidia.com/en-us/data-center/solutions/confidential-computing/

Restricting the kinds of pods dev accounts can launch

Hello, I'm an admin of a research team. I would like to give researchers the ability to launch a pod, but I would like to restrict the kinds of pods which they can launch (cost <= community server pods RTX4000). Is there a way to do this?...

ssh2 with Node doesn't work correctly?

Hello, I am trying to connect to the GPU cloud using ssh2 via [email protected] with an SSH key. It works using ssh.shell but not ssh.exec (it asks for a PTY, and when one is set, it doesn't send any command). I don't know what to do, because I face this problem with RunPod and I can still connect using my Linux terminal instead of going through my script. ...

Error starting the container

error starting container: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: requirement error: unsatisfied condition: cuda>=12.1, please update your driver to a newer version, or use an earlier cuda container: unknow...
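The message above says the container image requires `cuda>=12.1` while the host driver supports an older CUDA version. As a rough sketch, that requirement can be pulled out of the error text and compared against the CUDA version a host supports. The helper names `required_cuda` and `host_satisfies` are hypothetical, and the regular expression assumes the wording shown in this particular message.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def required_cuda(error_text: str) -> Optional[Version]:
    """Extract (major, minor) from 'unsatisfied condition: cuda>=X.Y', if present."""
    match = re.search(r"unsatisfied condition: cuda>=(\d+)\.(\d+)", error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_satisfies(host_cuda: Version, required: Version) -> bool:
    """True when the host's supported CUDA version meets the image requirement."""
    return host_cuda >= required  # tuples compare major first, then minor
```

For the message in this post, `required_cuda(...)` returns `(12, 1)`, so a host whose driver supports only CUDA 11.8 would fail the check while one supporting 12.2 would pass.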

Are the EU-CZ-1 servers down?

I can spin up the instances and the logs are fine, but I can't connect to any of the services (sdwebui, jupyter, ssh). Thank you...

extremely slow network and hard to connect through ssh or jupyter

pod ID: tdyvi60zj9zzow. Actually, any pod I launch currently has this issue, and when I git clone, I always get ``` 2024-01-03T03:16:09.997295856Z Cloning into 'pawsome-ai-compute'......
Solution:
CZ region currently has some kind of connectivity issues, I suggest using a different region until the issues in CZ are resolved.

remote desktop with pods

I'm having an issue with remote desktop now with the pods. I opened 6901 but I'm still not able to get the remote desktop GUI going. And there are no messages in the log about it either....
Solution:
Yes, most RunPod templates don't have a desktop environment. You can use web terminal or SSH. Some templates like mine also have Jupyter, which TheBloke's does not.

My pods in the CZ network are down.

They have been unresponsive for well over an hour. Please fix
Solution:
They made an announcement today / they relisted pods for people to spin up and grab data from.

Can I use VS Code remote-ssh with a RunPod instance with no public IP?

Trying to connect using remote-ssh gives: Error: Your SSH client doesn't support PTY (community cloud instance, no public IP)

How to install SillyTavern to an instance?

I followed the instructions in https://blog.runpod.io/how-to-install-sillytavern-in-a-runpod-instance/ but things have apparently changed. I made a new text file called whitelist.txt inside the SillyTavern base install folder, with the IP addresses mentioned in the blog post, as per https://docs.sillytavern.app/usage/remoteconnections/ In the terminal it seems to be working, but in My Pods the HTTP service on port 8000 is not ready and I can't connect. I've exposed the port in the Edit Pod screen.

refer to the current running pod's id from environment variable

Is there any way to refer to the ID of the currently running pod from the environment?
Solution:
Yes but remember to use a $ in front of the variable name.
echo $RUNPOD_POD_ID
...
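The same variable can be read from a script running inside the pod. A minimal sketch, assuming only that `RUNPOD_POD_ID` is set in the pod's environment as the solution states; the `current_pod_id` helper is a name made up for this example.

```python
import os
from typing import Optional


def current_pod_id(default: Optional[str] = None) -> Optional[str]:
    """Return the pod ID from RUNPOD_POD_ID, or default when the variable is unset."""
    return os.environ.get("RUNPOD_POD_ID", default)
```

Outside a pod the variable is normally unset, so `current_pod_id("local")` falls back to the placeholder instead of raising an error.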

Cannot connect to jupyterlab/web terminal

I can't access them through browsers