All workers idle despite many jobs in queue
I have 5 workers sitting idle and hundreds of jobs stuck "in queue" with none being processed.
Anybody at RunPod who can help with this?
@Dj can you help?
@NERDDISCO
Endpoint ID: nmrvk0ftnftb6h
Good morning, I can check.
It looks like you're good by now, but I can see these workers were crashing at the time you reported this. Can you let me know if everything's okay?
It's weird that they're suddenly crashing, since nothing has changed in weeks.
I killed all the workers; they go idle and then become unhealthy. The idle ones never start working.
Solution
OK, looks like Microsoft pulled their model off Hugging Face very unexpectedly: https://github.com/microsoft/TRELLIS/issues/264
GitHub: Model Files deleted from Official HuggingFace of TRELLIS · Issue #...
Hi there, amazing work! I am trying to run the training code, so I can finetune the model on custom dataset, however, I am unable to run the command python dataset_toolkits/encode_ss_latent.py --ou...