Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!
I've been using RunPod endpoints for the past month or so with no issues; everything has been working wonderfully. Over the past week, a handful of my jobs have been failing, and I'm not entirely sure why. I haven't made any code changes or changed the Docker image from my template. I do notice that it seems to wait for GPUs to become available, but I'm not sure why this error is thrown once it finds them.
Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
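(For context, this error generally means a CUDA-resident weight tensor is being indexed with CPU-resident indices, for example token IDs that were never moved to the GPU. A minimal reproduction, assuming PyTorch with a CUDA device available; this is not code from this thread:)

import torch

# Embedding weights live on cuda:0, but the indices stay on the CPU,
# so the internal index_select sees two devices and raises the error above.
embedding = torch.nn.Embedding(10, 4).to("cuda")
input_ids = torch.tensor([1, 2, 3])  # CPU tensor

out = embedding(input_ids)
# RuntimeError: Expected all tensors to be on the same device, but found
# at least two devices, cpu and cuda:0! (... wrapper_CUDA__index_select)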
This is a code issue. What is the endpoint you are using?
Like my actual endpoint?
I've been calling /run to run my jobs and get back a job ID, and then webhook back to my server once done. @ashleyk
Yes, the actual endpoint.
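(For reference, the /run flow described above looks roughly like this in Python; the endpoint ID, API key, webhook URL, and payload below are placeholders, not values from this thread:)

import requests

ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"prompt": "..."},  # payload passed to the worker's handler
        "webhook": "https://my-server.example/runpod-callback",  # called when the job finishes
    },
)
print(resp.json()["id"])  # job ID comes back immediately; the result arrives via the webhook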
That makes no difference; there is something wrong with the code that your endpoint is running.
No good, I need to know the code you are running.
Oh, sorry, I'm not sure what you're looking for. The Docker image tag?
I think when I set up my endpoint and got everything working, it was around this tag of yours: ashleykza/runpod-worker-a1111:2.2.1
Yeah, did you build your own image or use someone else's image?
Then all I've done is add LoRAs and models to my network volume. Everything was working great.
I used yours but built mine on DigitalOcean, tagged it, and have just been using that one.
I suggest updating to
ashleykza/runpod-worker-a1111:2.3.4
Ok, so just change my template to that?
When did you build your own image?
Nov 21, 2023 at 10:32 pm
Sorry, I am not reading the messages fast enough. I thought you were on
ashleykza/runpod-worker-a1111:2.2.1, so if you built it around that time, it's probably a good idea to pull the changes and build the image again to get the updates.
No need to apologize, you've been very helpful and I appreciate your guidance.
Do I just use your tag in my template, or do I go back through the process of deploying to DigitalOcean, tagging to my Docker Hub, and using that?
There isn't really a need to build your own image unless you customized something, such as adding an additional extension; otherwise you can just use mine.
Ok, no, I didn't make any changes. The only changes I've made have been to my network volume.
I will try your 2.3.4 tag.
Seems to be working much better! Do your tagged versions ever get removed? Also, do you have a release log or somewhere you post updates to keep up with changes?
Only inactive versions that haven't been used in a month or more, not active ones that are being pulled by workers. The release log is here:
https://github.com/ashleykleynhans/runpod-worker-a1111/releases
Hello!
I'm making a Serverless NLLB Endpoint on RunPod.
I have built and pushed a Docker image that works perfectly well locally, but when I deploy it on RunPod it fails with the same error (Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)).
Does anybody have some ideas about what's going wrong?
Usually I see this error when the checkpoint file is corrupted.
Sorry, but I don't have any checkpoint file in my project... or I don't know about it 🙂 I'm a newbie in this field.
What is NLLB? Doesn't it use some kind of model?
Yes, I use facebook/nllb-200-distilled-600M from Hugging Face.
This is what @Papa Madiator means by "checkpoint"
Ok, then what's the difference between running the model locally and serverless (on RunPod)? It works locally...
Yup, because on RunPod you use RunPod GPUs instead of your local one.
I understand this, of course 🙂 But what does this change? I have a device choice in my code, so the GPU is used when possible.
Unknown User: [message not public]
I mean, what's the difference if I run on CPU or GPU with exactly the same code? Why is it OK on CPU but not OK on GPU?
Sorry if I'm writing something strange 🙂
Unknown User: [message not public]
It works in another environment (Beam, if you've ever heard of it), so the problem is that it doesn't work on RunPod... This conversation has given me some interesting debugging ideas, so I'll try them and report back :))
Unknown User: [message not public]
I'm simply running the facebook/nllb-200-distilled-600M model from Hugging Face. That's all)
Unknown User: [message not public]
Running it how? oobabooga or something else?
It's in the name of the thread: "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!"
Serverless running
Unknown User: [message not public]
I'm building the Docker image myself, and I check the device (DEVICE: str = 'cuda' if torch.cuda.is_available() else 'cpu')
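(For reference, checking the device is only half of the usual pattern: both the model weights and every input tensor have to end up on that device. A minimal sketch using the model named in this thread; everything else is illustrative:)

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M").to(DEVICE)

# BatchEncoding.to() moves all input tensors at once to match the model.
# (Target-language forcing via forced_bos_token_id is omitted for brevity.)
encoded = tokenizer("Hello world", return_tensors="pt").to(DEVICE)
translated = model.generate(**encoded)
print(tokenizer.batch_decode(translated, skip_special_tokens=True))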
Unknown User: [message not public]
What causes the error is exactly my question 🙂 I have just found out that the problem is in a function that directly uses the transformers library. I can post its code, if that's OK here)
Unknown User: [message not public]
Here's the function:
def translate(self, collection, tgt_lang):
    texts = [t.strip() for t in collection]
    # split the texts into roughly BATCH_SIZE-sized batches
    batch_count = len(texts) / BATCH_SIZE
    if batch_count < 1:
        batch_count = 1
    texts_split = np.split(np.array(texts), batch_count)
    result = []
    with torch.inference_mode():
        for batch in texts_split:
            encoded = self.tokenizer.batch_encode_plus(
                list(batch), max_length=self.max_length, truncation=True,
                return_tensors="pt", padding=True)
            # move the input tensors to the chosen device
            for key in encoded.keys():
                encoded[key] = encoded[key].to(self.device)
            translated = self.model.generate(
                **encoded, max_length=self.max_length,
                forced_bos_token_id=self.tokenizer.lang_code_to_id[tgt_lang])
            decoded = self.tokenizer.batch_decode(translated, skip_special_tokens=True)
            result.extend(decoded)
    return [tr if len(tx) > 0 else tx for tr, tx in zip(result, texts)]
This function is a method of my model class.
Unknown User: [message not public]
You can read about inference mode here: https://pytorch.org/cppdocs/notes/inference_mode.html 🙂 It's complicated to explain in two words)
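(In short, torch.inference_mode() is a stricter, slightly faster cousin of torch.no_grad(): tensors created inside it carry no autograd state at all. A toy example, not from the thread:)

import torch

x = torch.ones(3, requires_grad=True)
with torch.inference_mode():
    y = x * 2  # no graph is recorded; less bookkeeping than no_grad
print(y.requires_grad)  # False; y is an "inference tensor" and can never enter autograd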
I'll try! Thanks!
Unknown User: [message not public]
I have changed the line
...
translated = self.model.to(self.device).generate( ...
...
i.e. added .to(self.device), as you advised me. All the other lines are without .to(self.device).
And it's working now!!! ☺️ Many thanks!!
And if I write
encoded[key] = encoded[key]
without .to(self.device), it also works! (which is logical)
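(Putting the fix together: an arguably cleaner variant is to move the model to the device once at construction time instead of on every generate() call. A sketch, assuming the class holds the model and tokenizer as in the snippet above:)

import torch

class Translator:
    def __init__(self, model, tokenizer, max_length=512):
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = model.to(self.device)  # move the weights once at startup
        self.tokenizer = tokenizer
        self.max_length = max_length

    # translate() can then keep moving encoded[key] to self.device and call
    # self.model.generate(**encoded) without a per-call .to()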
Unknown User: [message not public]