Services don't start

This morning I tried starting a pod using the https://github.com/ashleykleynhans/stable-diffusion-docker template. No matter how long I leave it after getting the "Container is READY!" confirmation, the services never start. The RunPod Application Manager (port 8000) and Jupyter Lab (port 8888) connect without issue, but A1111 (port 3000), Kohya_ss (port 3010), ComfyUI (port 3020) and Tensorboard (port 6006) all show the "connection is not up yet" page. I exposed the external port 44082, which made no difference, and stopping/starting the apps in the RunPod Application Manager doesn't help either. I usually run this from a network volume with everything already preconfigured, and I'm currently running it in the CZ1 region. Neither running it from the network volume nor running it from scratch makes a difference; the services can't connect either way. Current ID: rupmemk46gtepk
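One way to narrow this down is to check, from a terminal inside the pod (for example the Jupyter Lab terminal, which does connect), whether anything is accepting connections on each of the ports above. This is a minimal sketch, assuming Python 3.9+ is available in the container; the port-to-service mapping is taken from the post, and the `is_listening`/`report` helpers are hypothetical names for illustration. Because the container log shows an nginx service starting, these public ports may be fronted by a proxy, so a port that is listening does not by itself prove the app behind it is up.

```python
import socket

# Ports as listed in the post above; the names are only labels.
SERVICES = {
    "RunPod Application Manager": 8000,
    "Jupyter Lab": 8888,
    "A1111 Web UI": 3000,
    "Kohya_ss": 3010,
    "ComfyUI": 3020,
    "Tensorboard": 6006,
}


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def report(services: dict[str, int]) -> dict[str, bool]:
    """Check each service's port, print a one-line status and return the results."""
    results = {}
    for name, port in services.items():
        up = is_listening(port)
        results[name] = up
        status = "listening" if up else "NOT listening"
        print(f"{name:<28} port {port:<5} {status}")
    return results


if __name__ == "__main__":
    report(SERVICES)
```

If the proxied ports are listening but the pages still never load, the per-app log files are the next place to look.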
Justin Merrell•6mo ago
What are you seeing for the container logs?
cauldron_cake💚
I can restart the pod to grab the exact logs, but everything looked standard in the logs all the way through "Container is READY!"; the system and container both started up without issue in 1-2 minutes. Hmm. I just started it on another GPU config and it's going through a longer cycle building the container, so maybe it didn't build properly on the other config? I was using one of the RTX 4090 configs, which I had used several times before, most recently yesterday.
Justin Merrell•6mo ago
Let me know if the second one works, and then feel free to post the pod ID here for the one that wasn't working.
cauldron_cake💚
Same issue on the new pod, ID: yzbsztfwcaf1ga

Here are the container logs (as a file, since I don't have Nitro). I also noticed this template didn't show up in a search or by default like it usually does; I had to launch it using the link from the GitHub page. Not sure if that's relevant.

2023-12-29T16:04:35.791740132Z Starting Nginx service...
2023-12-29T16:04:35.811540626Z * Starting nginx nginx
2023-12-29T16:04:35.824502734Z ...done.
2023-12-29T16:04:35.824513804Z Running pre-start script...
2023-12-29T16:04:35.824517553Z Container is running
2023-12-29T16:04:35.824521583Z Syncing venv to workspace, please wait...
2023-12-29T16:05:17.502348603Z Syncing Stable Diffusion Web UI to workspace, please wait...
2023-12-29T16:05:19.924204304Z Syncing Kohya_ss to workspace, please wait...
2023-12-29T16:05:53.743971926Z Syncing ComfyUI to workspace, please wait...
2023-12-29T16:05:58.184933424Z Syncing Application Manager to workspace, please wait...
2023-12-29T16:05:58.654091435Z Fixing Stable Diffusion Web UI venv...
2023-12-29T16:05:58.657608222Z Fixing Kohya_ss venv...
2023-12-29T16:05:58.661171919Z Fixing ComfyUI venv...
2023-12-29T16:05:58.665633451Z Configuring accelerate...
2023-12-29T16:05:58.688464516Z Starting Stable Diffusion Web UI
2023-12-29T16:05:58.689192512Z Stable Diffusion Web UI started
2023-12-29T16:05:58.689210491Z Log file: /workspace/logs/webui.log
2023-12-29T16:05:58.691251259Z Starting Kohya_ss Web UI
2023-12-29T16:05:58.692034254Z Kohya_ss started
2023-12-29T16:05:58.692047304Z Log file: /workspace/logs/kohya_ss.log
2023-12-29T16:05:58.696341637Z Starting ComfyUI
2023-12-29T16:05:58.706286843Z ComfyUI started
2023-12-29T16:05:58.706305543Z Log file: /workspace/logs/comfyui.log
2023-12-29T16:05:58.709811431Z Starting Tensorboard
2023-12-29T16:05:58.717965869Z ln: failed to create symbolic link '/workspace/logs/dreambooth/dreambooth': File exists
2023-12-29T16:05:58.720013886Z ln: failed to create symbolic link '/workspace/logs/ti/textual_inversion': File exists
2023-12-29T16:05:58.722665349Z Tensorboard Started
2023-12-29T16:05:58.723045417Z All services have been started
2023-12-29T16:05:58.723575894Z Pod Started
2023-12-29T16:05:58.723615143Z Starting Jupyter Lab...
2023-12-29T16:05:58.723913951Z Jupyter Lab started
2023-12-29T16:05:58.723957801Z Exporting environment variables...
2023-12-29T16:05:58.731740011Z Container is READY!

I think everything looks normal, except maybe the symlinks for Dreambooth and textual inversion.

Update: I'm assuming it's something with the template. The Kohya_ss standalone template worked fine.
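The container log only confirms that each start command was issued. Each app writes its own output to the log file named in the container log (`/workspace/logs/webui.log`, `/workspace/logs/kohya_ss.log`, `/workspace/logs/comfyui.log`), so that is where a crash or a slow first-time install would show up. This is a minimal sketch for printing the last lines of each, assuming Python 3.9+ inside the pod; the `tail_lines` helper and the default of 20 lines are illustrative choices, not part of the template.

```python
from collections import deque
from pathlib import Path

# Log file paths as printed in the container log above.
LOG_FILES = [
    Path("/workspace/logs/webui.log"),
    Path("/workspace/logs/kohya_ss.log"),
    Path("/workspace/logs/comfyui.log"),
]


def tail_lines(path: Path, n: int = 20) -> list[str]:
    """Return the last n lines of a text file, or [] if the file does not exist."""
    if not path.is_file():
        return []
    # A deque with maxlen=n keeps only the final n lines while streaming the file,
    # so large logs are never loaded into memory all at once.
    with path.open("r", encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    for log in LOG_FILES:
        print(f"===== {log} =====")
        lines = tail_lines(log)
        if not lines:
            print("(missing or empty)")
        for line in lines:
            print(line)
```

Streaming through a bounded deque is the same idea as `tail -n`, just without depending on a shell in the Jupyter terminal.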
ashleyk•6mo ago
It's not an issue with the template; many people are using it, so it must be some other issue.
cauldron_cake💚
Hey Ashley! Thanks for the quick reply on GitHub. So far I've tried running it on an RTX 6000 Ada and an RTX 4090, with the server set to Any, but I can try a specific server to isolate the problem. I did notice that RunPod is no longer giving me the choice between Secure Cloud (which I normally use) and GPU Cloud; currently my only option is GPU Cloud. I wasn't sure whether something changed on my end or it was just a change in RunPod's UI, but it seemed worth mentioning.
ashleyk•5mo ago
You can select Community Cloud or Secure Cloud in the Cloud Type filter at the top of the page. It defaults to Secure Cloud, but you can change it to Community Cloud. Everything is working for me in Secure Cloud as well as Community Cloud. I used an A5000 and an A4000 in the SK region in Community Cloud, and a 4090 in the IS region in Secure Cloud, and it's working perfectly every time.
cauldron_cake💚
So weird. Thanks for checking! I'll keep testing.