Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

I've been using RunPod endpoints for the past month or so with no issues; everything's been working wonderfully. In the past week, a handful of my jobs have been failing and I'm not entirely sure why. I have not made any code changes and have not changed the Docker image that my template uses. I do notice that it seems to be waiting for GPUs to be available, but I'm not sure why; when it finds them, this error is thrown: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
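For context on what the message means: the index_select in the traceback is typically an embedding lookup, and the error fires when the input IDs and the embedding weights sit on different devices. A minimal reproduction, assuming a plain PyTorch setup with a CUDA GPU available (the tensor names are illustrative, not taken from the poster's code):

import torch

embedding = torch.nn.Embedding(100, 16).to('cuda:0')  # weights on the GPU
input_ids = torch.tensor([[1, 2, 3]])                  # still on the CPU
out = embedding(input_ids)                             # raises the device-mismatch error above
# Moving the inputs (or the module) so both sides match fixes it:
out = embedding(input_ids.to('cuda:0'))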
ashleyk
ashleyk5mo ago
This is a code issue, what is the endpoint you are using?
ricopella
ricopella5mo ago
Like my actual endpoint? I've been calling /run to run my jobs and get back a job ID, and then webhook back to my server once done. @ashleyk
ashleyk
ashleyk5mo ago
Yes, the actual endpoint. Actually, this makes no difference; there is something wrong with the code that your endpoint is running, so I need to know the code you are running.
ricopella
ricopella5mo ago
Oh, sorry, I'm not sure what you're looking for. The Docker image tag? I think when I set up my endpoint and got everything working, it was around this tag version of yours: ashleykza/runpod-worker-a1111:2.2.1
ashleyk
ashleyk5mo ago
Yeah, did you build your own image or use someone else's image?
ricopella
ricopella5mo ago
Then all I've done is add LoRAs and models to my network volume. Everything was working great. I used yours, but built mine on DigitalOcean, tagged it, and have just been using that one.
ashleyk
ashleyk5mo ago
I suggest updating to ashleykza/runpod-worker-a1111:2.3.4.
ricopella
ricopella5mo ago
OK, so I just change my template to that?
ashleyk
ashleyk5mo ago
When did you build your own image?
ricopella
ricopella5mo ago
Nov 21, 2023 at 10:32 pm
ashleyk
ashleyk5mo ago
Sorry I am not reading the messages fast enough, I thought you were on ashleykza/runpod-worker-a1111:2.2.1, so if you built it around that time, probably a good idea to pull the changes and build it again to get the updates.
ricopella
ricopella5mo ago
No need to apologize, you've been very helpful and I appreciate your guidance. Do I just use your tag in my template, or do I go back through the process of deploying to DigitalOcean, tagging to my Docker Hub, and using that?
ashleyk
ashleyk5mo ago
There isn't really a need to build your own image unless you customized something such as adding an additional extension or something like that, otherwise you can just use mine.
ricopella
ricopella5mo ago
OK, no, I didn't make any changes. The only changes I've made have been to my network volume. I will try the 2.3.4 tag of yours. It seems to be working much better! Do your tagged versions ever get removed? Also, do you have a release log or somewhere that you post updates so I can keep up with changes?
ashleyk
ashleyk5mo ago
Only inactive versions that haven't been used in a month or more, not active ones that are being pulled by workers. The release log is here: https://github.com/ashleykleynhans/runpod-worker-a1111/releases
Irina
Irina4w ago
Hello! I'm making a serverless NLLB endpoint on RunPod. I have built and pushed a Docker image that works perfectly well locally, but when I deploy it on RunPod, it fails with the same error (Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)). Does anybody have some ideas about what's going wrong?
Madiator2011
Madiator20114w ago
Usually I see this error when a checkpoint file is corrupted.
Irina
Irina4w ago
Sorry, but I don't have any checkpoint file in my project... or I don't know about it 🙂 I'm a newbie in this field..
digigoblin
digigoblin4w ago
What is NLLB? Doesn't it use some kind of model?
Irina
Irina4w ago
Yes, I use facebook/nllb-200-distilled-600M from huggingface
digigoblin
digigoblin4w ago
This is what @Papa Madiator means by "checkpoint"
Irina
Irina4w ago
OK, then what's the difference between local and serverless (in RunPod) model running? It works locally...
Madiator2011
Madiator20114w ago
Because on RunPod you use RunPod GPUs instead of your local one.
Irina
Irina4w ago
I understand this, of course 🙂 But what does this change? I have a device check in the code, so the GPU is used when possible.
nerdylive
nerdylive4w ago
What do you mean "by what does this change"?
Irina
Irina4w ago
I mean, what's the difference if I run totally the same code on CPU or GPU? Why is it OK with CPU but not OK with GPU? Sorry if I'm writing something strange 🙂
nerdylive
nerdylive4w ago
It's fine, I just needed to clear things up so maybe I can give you a better answer. Hmm, what exactly isn't working on the GPU? Some code is specifically designed to run on the CPU, or the GPU, or both, but it needs access to that hardware. Maybe if it works on CPU and not GPU, it doesn't have GPU support yet.
Irina
Irina4w ago
It's working in another environment (Beam, if you've ever heard of it), so the problem is that it doesn't work in RunPod... This conversation has given me some interesting debugging ideas, so I'll try and will return :))
nerdylive
nerdylive4w ago
What's working in Beam, sorry? I don't know what type of software or model you are running.
Irina
Irina4w ago
I'm simply running facebook/nllb-200-distilled-600M model from Huggingface. That's all)
nerdylive
nerdylive4w ago
Hmm, okay, then what's the error on RunPod?
digigoblin
digigoblin4w ago
Running it how? oobabooga or something else?
Irina
Irina4w ago
It's in the name of the thread: "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!" It's running serverless.
nerdylive
nerdylive4w ago
Using HF transformers? Which library did you use to run it? It might be a CUDA driver version or template problem. Which Docker image are you using? This error is caused by the model and the inputs not being moved to the same device before calling the model. To resolve it, move both the model and the inputs to the same device before running the model:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)
inputs = inputs.to(device)
Found this on Stack Overflow.
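A minimal sketch of how that advice could look inside a RunPod serverless handler for NLLB, assuming the runpod and transformers packages; the handler name and input fields below are illustrative, not the poster's actual code:

import runpod
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Load once at startup and move the weights to the chosen device.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M").to(device)

def handler(job):
    text = job["input"]["text"]
    tgt_lang = job["input"].get("tgt_lang", "fra_Latn")  # illustrative default
    # Move the input tensors to the same device as the model.
    encoded = tokenizer(text, return_tensors="pt").to(device)
    with torch.inference_mode():
        generated = model.generate(
            **encoded,
            # lang_code_to_id exists on older transformers releases; newer ones
            # may need tokenizer.convert_tokens_to_ids(tgt_lang) instead.
            forced_bos_token_id=tokenizer.lang_code_to_id[tgt_lang],
        )
    return {"translation": tokenizer.batch_decode(generated, skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})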
Irina
Irina4w ago
I'm building the Docker image myself, and I do check the device (DEVICE: str = 'cuda' if torch.cuda.is_available() else 'cpu')
nerdylive
nerdylive4w ago
What's your code to run it (like in the handler) that makes that error?
Irina
Irina4w ago
What makes the error is my question 🙂 I have just found out that the problem is in a function that directly uses the transformers library. I can paste its code, if that's OK here)
nerdylive
nerdylive4w ago
Well, maybe your code or the library you use. Mostly, yeah, that's what I asked. Sure.
Irina
Irina4w ago
Here is the function:

def translate(self, collection, tgt_lang):
    texts = [t.strip() for t in collection]
    batch_count = len(texts) / BATCH_SIZE
    if batch_count < 1:
        batch_count = 1
    texts_split = np.split(np.array(texts), batch_count)
    result = []
    with torch.inference_mode():
        for batch in texts_split:
            encoded = self.tokenizer.batch_encode_plus(
                list(batch), max_length=self.max_length, truncation=True,
                return_tensors="pt", padding=True)
            for key in encoded.keys():
                encoded[key] = encoded[key].to(self.device)
            translated = self.model.generate(
                **encoded, max_length=self.max_length,
                forced_bos_token_id=self.tokenizer.lang_code_to_id[tgt_lang])
            decoded = self.tokenizer.batch_decode(translated, skip_special_tokens=True)
            result.extend(decoded)
    return [tr if len(tx) > 0 else tx for tr, tx in zip(result, texts)]

This function is a method of my model class.
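Reading that function, the encoded input tensors are moved to self.device, but the model itself is never moved there; if the weights are still on the CPU while the inputs are on cuda:0 (or vice versa), generate() fails with exactly this index_select device mismatch. A minimal sketch of one way to fix it at load time, assuming the model is created in the class's __init__ (the attribute names mirror the snippet above, the rest is illustrative):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def __init__(self, model_name="facebook/nllb-200-distilled-600M"):
    self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
    self.max_length = 512  # illustrative value
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Moving the weights once here keeps model.generate() and the encoded
    # inputs (which translate() already sends to self.device) on one device.
    self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)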
nerdylive
nerdylive4w ago
What is torch.inference_mode() for, btw? Try using .to(self.device) on your self.model or the self.tokenizer
Irina
Irina4w ago
https://pytorch.org/cppdocs/notes/inference_mode.html — you can read about inference mode here 🙂 it's complicated to explain in two words) I'll try! Thanks!
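For readers following along, a tiny illustration of what torch.inference_mode() does (it is unrelated to the device error itself):

import torch

x = torch.randn(3, requires_grad=True)
with torch.inference_mode():
    y = x * 2             # no autograd graph is recorded
print(y.requires_grad)    # False
print(y.is_inference())   # True: y can never participate in autograd later
# inference_mode() is a stricter, slightly faster variant of torch.no_grad(),
# intended for code that will only ever do inference.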
nerdylive
nerdylive4w ago
right thanks
Irina
Irina4w ago
I have changed the line ... translated = self.model.to(self.device).generate( ... ... i.e. added .to(self.device), as you advised. All other lines are without .to(self.device). And it's working now!!! ☺️ Great thanks!! And if I write encoded[key] = encoded[key] without .to(self.device), it also works! (which is logical)
nerdylive
nerdylive4w ago
Yep. Hope this device points to the GPU, if not then it will use the CPU 😂