Runpod•16mo ago
MokshMalik

Training jobs using script

Hey, can anyone tell me whether Runpod lets me write a training script that I can launch from anywhere, which creates a GPU instance and loads and saves my data to external cloud storage, like AWS SageMaker's training script mode? I need to train multiple models with different architectures this way to see which one performs best.
19 Replies
yhlong00000•16mo ago
Overview | RunPod Documentation
Unlock serverless functionality with RunPod SDKs, enabling developers to create custom logic, simplify deployments, and programmatically manage infrastructure, including Pods, Templates, and Endpoints.
Overview | RunPod Documentation
RunPod CLI (runpodctl) is a command-line interface tool designed to automate and manage GPU pods on RunPod.
Export data | RunPod Documentation
Export RunPod data to various cloud providers, including Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage, Backblaze B2 Cloud Storage, and Dropbox, with secure key and access token management.
MokshMalikOP•16mo ago
I'm fairly new to RunPod. Can you please point me to a tutorial where a remote training job is run on a pod, the model weights are stored on S3, and the pod automatically kills itself once the training is complete?
yhlong00000•16mo ago
You'll probably have to write some code to pull data from S3, and after training you can terminate the pod using our CLI. Btw, ChatGPT is really good at writing code😀 https://docs.runpod.io/cli/overview
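The flow described in the reply above (pull data from S3, train, push the weights back, then terminate the pod) might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not an official Runpod recipe: `boto3` is assumed to be installed with AWS credentials available in the environment, and `run_job` with its `fetch_data`, `train`, `save_weights`, and `terminate` callables are hypothetical placeholders you would supply yourself.

```python
from typing import Callable


def download_from_s3(bucket: str, key: str, local_path: str) -> None:
    """Fetch one object from S3 (assumes boto3 and AWS credentials are available)."""
    import boto3  # imported lazily so this module loads even without boto3

    boto3.client("s3").download_file(bucket, key, local_path)


def upload_to_s3(local_path: str, bucket: str, key: str) -> None:
    """Upload one local file to S3 (same assumptions as download_from_s3)."""
    import boto3

    boto3.client("s3").upload_file(local_path, bucket, key)


def run_job(
    fetch_data: Callable[[], None],
    train: Callable[[], str],
    save_weights: Callable[[str], None],
    terminate: Callable[[], None],
) -> None:
    """Run the steps in order; terminate only after the weights are saved."""
    fetch_data()
    weights_path = train()      # hypothetical: returns the local weights file path
    save_weights(weights_path)  # raises on failure, so terminate() is skipped
    terminate()
```

Terminating only after a successful upload is a deliberate choice here: if the upload raises, the pod stays alive (and keeps billing) so the weights are not lost with it.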
MokshMalikOP•16mo ago
Sorry, it is still unclear. Does Runpod have a tutorial on training a custom model on a GPU instance? I have tried searching for one, but I have not found any.
Marcus•16mo ago
Probably not working anymore, since the Dreambooth endpoint used TheLastBen's code. I recommend using Kohya_ss, EveryDream2Trainer, or OneTrainer.
Marcus•16mo ago
This guy has some videos for training image models: https://www.youtube.com/@SECourses/videos
YouTube
SECourses
Welcome to Software Engineering Courses (SECourses) – the ultimate destination for skillfully curated insights into state-of-the-art technologies and programming paradigms. We demystify the realms of Artificial Intelligence, Stable Diffusion, DreamBooth, LoRA, ControlNet, Textual Inversion, Software Engineering, Programming, C#, .NET, ASP .NET, ...
Marcus•16mo ago
What kind of model are you training?
MokshMalikOP•16mo ago
Well, I'm training different kinds of segmentation models for my tasks, varying from simple U-Net to Attention U-Net, and might also go for transformer-based segmentation models. I'd like to run an instance for each model, so I can compare their performance in as little time as possible.
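One way to picture "an instance for each model" is to generate one pod specification per architecture and hand each to whatever pod-creation call you use. The sketch below is only illustrative: the architecture labels, the `MODEL_ARCH` environment variable, and the spec keys are hypothetical, and `create_pod` is an injected callable rather than a confirmed Runpod API signature.

```python
from typing import Any, Callable, Dict, List

# Hypothetical labels for the architectures mentioned above.
ARCHITECTURES = ["unet", "attention_unet", "transformer_seg"]


def build_pod_specs(
    architectures: List[str], image_name: str, gpu_type_id: str
) -> List[Dict[str, Any]]:
    """Build one pod spec per architecture; the training script reads MODEL_ARCH."""
    return [
        {
            "name": f"train-{arch}",
            "image_name": image_name,
            "gpu_type_id": gpu_type_id,
            "env": {"MODEL_ARCH": arch},  # hypothetical variable name
        }
        for arch in architectures
    ]


def launch_all(
    specs: List[Dict[str, Any]], create_pod: Callable[..., Any]
) -> List[Any]:
    """Call the injected create_pod once per spec and collect the results."""
    return [create_pod(**spec) for spec in specs]
```

If you use the Runpod Python SDK, its pod-creation function could be passed as `create_pod`, assuming its keyword arguments match the keys used here; check the SDK documentation before relying on that.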
MokshMalikOP•16mo ago
A big problem is auto-killing the pod once training is complete, and saving the model weights before that.
MokshMalikOP•16mo ago
Can you please shed some light on how to auto-kill the instance?
Marcus•16mo ago
runpodctl remove pod $RUNPOD_POD_ID
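To issue that command from the end of a Python training script, something like the following could work. It is a sketch that assumes `runpodctl` is on the PATH and configured inside the pod, that `RUNPOD_POD_ID` is set in the pod's environment (as the command above implies), and that the command is accepted when run from inside the pod itself. The helper names `build_remove_command` and `terminate_self` are hypothetical.

```python
import os
import subprocess
from typing import Callable, List, Mapping


def build_remove_command(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the runpodctl command that removes the current pod."""
    pod_id = env.get("RUNPOD_POD_ID")
    if not pod_id:
        raise RuntimeError("RUNPOD_POD_ID is not set; is this running inside a pod?")
    return ["runpodctl", "remove", "pod", pod_id]


def terminate_self(
    run: Callable[..., object] = subprocess.run,
    env: Mapping[str, str] = os.environ,
) -> None:
    """Run the remove command; check=True raises if runpodctl exits non-zero."""
    run(build_remove_command(env), check=True)
```

Call `terminate_self()` as the very last step, after the weights have been saved elsewhere, since removing the pod discards whatever is left on it.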
MokshMalikOP•16mo ago
Okay, thanks! If I just stop my pod and do not remove it, will I still be billed? And once I'm inside the pod, can I stop it from there? Will the command runpodctl remove pod $RUNPOD_POD_ID work from inside the pod?