Network drive issues

So here's the problem I've been having with RunPod (and that may lead to me giving up on it): I have a network drive and use it to deploy pods. I use ComfyUI, so I usually start from a pre-install template, then add nodesets and models according to the workflow. So far so good.

The problem is that when I stop the pod to preserve the downloaded models, LoRAs, nodes, etc. I've installed, the data is preserved on the network drive. But when I next come to start the pod up again, the drive no longer has access to a GPU on that server and no RTX 4090s (for example) are available. Sometimes no GPUs are available at all on the server hosting my network drive. So I have to deploy a pod on a different server, and although the template gets installed, none of the models I'd previously downloaded carry over to the new server, so for any workflow beyond the bare ComfyUI install itself, I have to start all over again.

This is time-consuming, and I've had to do it over and over again. As a result, I've more or less given up on using complicated workflows that require many models and nodes to be installed. I would say that at least 80% of my time on RunPod is spent simply reinstalling models from previous workflows. Often this is so time-consuming that I don't even manage to generate anything at all. It's tedious and frustrating. What am I doing wrong?
12 Replies
Jason
Jason4w ago
When you say "stop the pod", you mean terminate, right? Since you cannot stop a pod with network storage, I don't think you're doing it right. When you attach the correct network drive, it should be mounted at /workspace, and you can put the files you want to save/persist there.
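A quick way to confirm this from the pod's terminal is to check whether /workspace is a real mount or just a folder on the pod's local disk. This is a sketch, assuming the default pod convention of mounting the volume at /workspace (adjust the path if your template differs):

```shell
#!/bin/sh
# Sketch: check whether the network volume is actually mounted.
# Assumption: the template mounts the volume at /workspace.
VOL=/workspace
if [ -d "$VOL" ]; then
    df -h "$VOL"    # shows which filesystem backs the path
    # /proc/mounts lists every real mount point on the pod
    if grep -q " $VOL " /proc/mounts; then
        echo "$VOL is a real mount point - files saved here persist"
    else
        echo "$VOL is a plain local directory - files here die with the pod"
    fi
else
    echo "$VOL does not exist - no volume is attached to this pod"
fi
```

If /workspace turns out to be a plain local directory, anything saved there is lost on terminate, which would explain the symptoms above.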
Jason
Jason4w ago
This is how
Jason
Jason4w ago
Are you using a ComfyUI template? If yes, which one?
dokoissho
dokoisshoOP4w ago
Right, yes - with network storage the pod is terminated, not stopped. I know about /workspace and files being saved there, but it hasn't been mounting when I'm connected to a different server. My network drive is on CA-2, but I usually have to start a pod on a different server. For example, right now I have three different stopped pods on two other servers (EUR-IS-1, EUR-NO-1). I just launched another pod on my network drive's server (CA-2), since an RTX 4090 happened to be available. But I don't think any of the workspaces on the three other pods will transfer over - we'll see. The primary issue seems to be the network drive failing to mount on other servers - could that be because I'm using different templates? I've been trying quite a few different ones on different pods.
Jason
Jason4w ago
I don't quite get what you mean here:
But I don't think any of the workspaces on the three other pods will transfer over - we'll see.
Transfer over to ...? But yes, templates can override where the volume mounts - to check, use "Edit Pod" on a pod. And you can only create pods with network storage in the same region/datacenter.
dokoissho
dokoisshoOP4w ago
I mean transfer over to the workspace on the network storage drive. The new pod I was just able to deploy on the network storage server does still have all the data that was installed there, so that's fine. The issue I've been having is that models and outputs created on pods on other servers lived in those pods' local workspaces, so once a pod was terminated everything was lost. From now on I'll only deploy pods on my network drive's server. The issue then becomes the frequent unavailability of the GPU I use for ComfyUI video gens (RTX 4090), and occasionally of any GPU at all on CA-2.
Jason
Jason4w ago
Maybe the mount path of the network storage isn't right? This is a rare case - do you want staff to check it for you in a ticket?
so once the pod was terminated everything was lost
(You're using network storage in the pod, and then /workspace was lost after you terminated the pod?) Does this happen for some pods only?
dokoissho
dokoisshoOP4w ago
See, I'm having the same problem now: the pod fully installed, I launch ComfyUI on port 8188 and load a workflow (in this case the default one). I run the workflow and get an error that the checkpoint is not installed. But when I check the filesystem in my workspace using the notebook interface on port 8048, I see the model is in the checkpoints folder - ComfyUI on 8188 just isn't seeing it. Then when I check the same directory using the Web Terminal, it's NOT there. So the notebook is giving the opposite information to the web terminal. This is exactly the issue I'm talking about. So yes, please open a ticket. The network drive is named FataMorgana and the pod ID is evrcyxwtndukj2. I deleted my 3 pods on non-network-storage servers, and my network-storage pod was auto-terminated because my balance ran out. Hours of wasted time and lost outputs, so back to square one we go. If I lose anything else from this new pod, I'm done. Extremely frustrated and disillusioned with this whole experience.
Jason
Jason4w ago
Maybe restart your ComfyUI after putting models in, or check again that the paths are correct. You shouldn't be able to go to the checkpoints folder in Jupyter, btw - I think there is a bug for it. And you can refresh the file browser in Jupyter to update it, in case it's outdated.
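The notebook and the web terminal can only disagree if they are browsing different directory trees - for example, one ComfyUI install on the volume and a second one on the pod's local disk. One way to check for that is to list every checkpoints directory on the machine. A hedged sketch with hypothetical paths (adjust to your pod's layout):

```shell
#!/bin/sh
# Sketch: look for duplicate model directories, which would explain why
# the Jupyter file browser and the web terminal show different contents.
# Searching shallowly to keep the scan quick; paths found are examples,
# not guaranteed locations.
find / -maxdepth 5 -type d -name checkpoints 2>/dev/null |
while read -r dir; do
    echo "== $dir =="
    ls -lh "$dir" 2>/dev/null | head -n 5   # show a few entries per copy
done
```

If two checkpoints directories turn up, whichever one ComfyUI was launched against is the one that matters, and the model needs to live there.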
riverfog7
riverfog74w ago
Any chance you were using a template meant for serverless? The volume mount path is different there
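For context on the difference riverfog7 mentions: serverless workers mount the network volume at /runpod-volume, while regular pods use /workspace, so a serverless-oriented template on a pod can leave files in an unexpected place. A quick check of both conventional paths (a sketch; verify the paths against your template's settings):

```shell
#!/bin/sh
# Sketch: see which of the two conventional mount points is a real mount.
# /workspace is the pod convention, /runpod-volume the serverless one.
for d in /workspace /runpod-volume; do
    if [ -d "$d" ] && grep -q " $d " /proc/mounts; then
        echo "$d: mounted volume"
    elif [ -d "$d" ]; then
        echo "$d: exists but is local disk only"
    else
        echo "$d: not present"
    fi
done
```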
dokoissho
dokoisshoOP4w ago
Okay, maybe it's that - I'll try refreshing Jupyter. Not as far as I know - right now I'm just using the basic PyTorch 2.4 template.
riverfog7
riverfog74w ago
hmm
