Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

I've been using RunPod endpoints for the past month or so with no issues; everything's been working wonderfully. In the past week, a handful of my jobs have been failing and I'm not entirely sure why. I have not made any code changes and have not changed the Docker image that my template uses. I do notice that it seems to be waiting for GPUs to be available, but I'm not sure why; when it finds them, this error is thrown: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
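For context on what the message means: the index_select in the traceback is typically an embedding lookup, and the error fires when the input IDs and the embedding weights sit on different devices. A minimal reproduction, assuming a plain PyTorch setup with a CUDA GPU available (the tensor names are illustrative, not taken from the poster's code):

import torch

embedding = torch.nn.Embedding(100, 16).to('cuda:0')  # weights on the GPU
input_ids = torch.tensor([[1, 2, 3]])                  # still on the CPU
out = embedding(input_ids)                             # raises the device-mismatch error above
# Moving the inputs (or the module) so both sides match fixes it:
out = embedding(input_ids.to('cuda:0'))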
ashleyk
ashleyk5mo ago
This is a code issue, what is the endpoint you are using?
ricopella
ricopella5mo ago
Like my actual endpoint? I've been calling /run to run my jobs and get back a job ID, and then webhook back to my server once done. @ashleyk
ashleyk
ashleyk5mo ago
Yes, the actual endpoint. Actually, this makes no difference; there is something wrong with the code that your endpoint is running, so I need to know the code you are running.
ricopella
ricopella5mo ago
Oh, sorry, I'm not sure what you're looking for. The Docker image tag? I think when I set up my endpoint and got everything working, it was around this tag version of yours: ashleykza/runpod-worker-a1111:2.2.1
ashleyk
ashleyk5mo ago
Yeah, did you build your own image or use someone else's image?
ricopella
ricopella5mo ago
Then all I've done is add LoRAs and models to my network volume. Everything was working great. I used yours, but built mine on DigitalOcean, tagged it, and have just been using that one.
ashleyk
ashleyk5mo ago
I suggest updating to ashleykza/runpod-worker-a1111:2.3.4.
ricopella
ricopella5mo ago
OK, so I just change my template to that?
ashleyk
ashleyk5mo ago
When did you build your own image?
ricopella
ricopella5mo ago
Nov 21, 2023 at 10:32 pm
ashleyk
ashleyk5mo ago
Sorry I am not reading the messages fast enough, I thought you were on ashleykza/runpod-worker-a1111:2.2.1, so if you built it around that time, probably a good idea to pull the changes and build it again to get the updates.
ricopella
ricopella5mo ago
No need to apologize, you've been very helpful and I appreciate your guidance. Do I just use your tag in my template, or do I go back through the process of deploying to DigitalOcean, tagging to my Docker Hub, and using that?
ashleyk
ashleyk5mo ago
There isn't really a need to build your own image unless you customized something such as adding an additional extension or something like that, otherwise you can just use mine.
ricopella
ricopella5mo ago
OK, no, I didn't make any changes. The only changes I've made have been to my network volume. I will try the 2.3.4 tag of yours. It seems to be working much better! Do your tagged versions ever get removed? Also, do you have a release log or somewhere that you post updates so I can keep up with changes?
ashleyk
ashleyk5mo ago
Only inactive versions that haven't been used in a month or more, not active ones that are being pulled by workers. The release log is here: https://github.com/ashleykleynhans/runpod-worker-a1111/releases
Irina
Irina4w ago
Hello! I'm making a serverless NLLB endpoint on RunPod. I have built and pushed a Docker image that works perfectly well locally, but when I deploy it on RunPod, it fails with the same error (Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)). Does anybody have some ideas about what's going wrong?
Madiator2011
Madiator20114w ago
Usually I see this error when a checkpoint file is corrupted.
Irina
Irina4w ago
Sorry, but I don't have any checkpoint file in my project... or I don't know about it 🙂 I'm a newbie in this field..
digigoblin
digigoblin4w ago
What is NLLB? Doesn't it use some kind of model?
Irina
Irina4w ago
Yes, I use facebook/nllb-200-distilled-600M from huggingface
digigoblin
digigoblin4w ago
This is what @Papa Madiator means by "checkpoint"
Irina
Irina4w ago
OK, then what's the difference between local and serverless (in RunPod) model running? It works locally...
Madiator2011
Madiator20114w ago
Because on RunPod you use RunPod GPUs instead of your local one.
Irina
Irina4w ago
I understand this, of course 🙂 But what does this change? I have a device check in the code, so the GPU is used when possible.
nerdylive
nerdylive4w ago
What do you mean "by what does this change"?
Irina
Irina4w ago
I mean, what's the difference if I run totally the same code on CPU or GPU? Why is it OK with CPU but not OK with GPU? Sorry if I'm writing something strange 🙂
nerdylive
nerdylive4w ago
It's fine, I just needed to clear things up so maybe I can give you a better answer. Hmm, what exactly isn't working on the GPU? Some code is specifically designed to run on the CPU, or the GPU, or both, but it needs access to that hardware. Maybe if it works on CPU and not GPU, it doesn't have GPU support yet.
Irina
Irina4w ago
It's working in another environment (Beam, if you've ever heard of it), so the problem is that it doesn't work in RunPod... This conversation has given me some interesting debugging ideas, so I'll try and will return :))
nerdylive
nerdylive4w ago
What's working in Beam, sorry? I don't know what type of software or model you are running.
Irina
Irina4w ago
I'm simply running facebook/nllb-200-distilled-600M model from Huggingface. That's all)
nerdylive
nerdylive4w ago
Hmm, okay, then what's the error on RunPod?
digigoblin
digigoblin4w ago
Running it how? oobabooga or something else?
Irina
Irina4w ago
It's in the name of the thread: "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!" It's running serverless.
nerdylive
nerdylive4w ago
Using HF transformers? Which library did you use to run it? It might be a CUDA driver version or template problem. Which Docker image are you using? This error is caused by the model and the inputs not being moved to the same device before calling the model. To resolve it, move both the model and the inputs to the same device before running the model:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)
inputs = inputs.to(device)
Found this on Stack Overflow.
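A minimal sketch of how that advice could look inside a RunPod serverless handler for NLLB, assuming the runpod and transformers packages; the handler name and input fields below are illustrative, not the poster's actual code:

import runpod
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Load once at startup and move the weights to the chosen device.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-600M")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-600M").to(device)

def handler(job):
    text = job["input"]["text"]
    tgt_lang = job["input"].get("tgt_lang", "fra_Latn")  # illustrative default
    # Move the input tensors to the same device as the model.
    encoded = tokenizer(text, return_tensors="pt").to(device)
    with torch.inference_mode():
        generated = model.generate(
            **encoded,
            # lang_code_to_id exists on older transformers releases; newer ones
            # may need tokenizer.convert_tokens_to_ids(tgt_lang) instead.
            forced_bos_token_id=tokenizer.lang_code_to_id[tgt_lang],
        )
    return {"translation": tokenizer.batch_decode(generated, skip_special_tokens=True)}

runpod.serverless.start({"handler": handler})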
Irina
Irina4w ago
I'm building the Docker image myself, and I do check the device (DEVICE: str = 'cuda' if torch.cuda.is_available() else 'cpu')
nerdylive
nerdylive4w ago
What's your code to run it (like in the handler) that makes that error?
Irina
Irina4w ago
What makes the error is my question 🙂 I have just found out that the problem is in a function that directly uses the transformers library. I can paste its code, if that's OK here)
nerdylive
nerdylive4w ago
Well, maybe your code or the library you use. Mostly, yeah, that's what I asked. Sure.
Irina
Irina4w ago
Here is the function:

def translate(self, collection, tgt_lang):
    texts = [t.strip() for t in collection]
    batch_count = len(texts) / BATCH_SIZE
    if batch_count < 1:
        batch_count = 1
    texts_split = np.split(np.array(texts), batch_count)
    result = []
    with torch.inference_mode():
        for batch in texts_split:
            encoded = self.tokenizer.batch_encode_plus(
                list(batch), max_length=self.max_length, truncation=True,
                return_tensors="pt", padding=True)
            for key in encoded.keys():
                encoded[key] = encoded[key].to(self.device)
            translated = self.model.generate(
                **encoded, max_length=self.max_length,
                forced_bos_token_id=self.tokenizer.lang_code_to_id[tgt_lang])
            decoded = self.tokenizer.batch_decode(translated, skip_special_tokens=True)
            result.extend(decoded)
    return [tr if len(tx) > 0 else tx for tr, tx in zip(result, texts)]

This function is a method of my model class.
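Reading that function, the encoded input tensors are moved to self.device, but the model itself is never moved there; if the weights are still on the CPU while the inputs are on cuda:0 (or vice versa), generate() fails with exactly this index_select device mismatch. A minimal sketch of one way to fix it at load time, assuming the model is created in the class's __init__ (the attribute names mirror the snippet above, the rest is illustrative):

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def __init__(self, model_name="facebook/nllb-200-distilled-600M"):
    self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
    self.max_length = 512  # illustrative value
    self.tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Moving the weights once here keeps model.generate() and the encoded
    # inputs (which translate() already sends to self.device) on one device.
    self.model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(self.device)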
nerdylive
nerdylive4w ago
What is torch.inference_mode() for, btw? Try using .to(self.device) on your self.model or the self.tokenizer
Irina
Irina4w ago
https://pytorch.org/cppdocs/notes/inference_mode.html — you can read about inference mode here 🙂 it's complicated to explain in two words) I'll try! Thanks!
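For readers following along, a tiny illustration of what torch.inference_mode() does (it is unrelated to the device error itself):

import torch

x = torch.randn(3, requires_grad=True)
with torch.inference_mode():
    y = x * 2             # no autograd graph is recorded
print(y.requires_grad)    # False
print(y.is_inference())   # True: y can never participate in autograd later
# inference_mode() is a stricter, slightly faster variant of torch.no_grad(),
# intended for code that will only ever do inference.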
nerdylive
nerdylive4w ago
right thanks
Irina
Irina4w ago
I have changed the line ... translated = self.model.to(self.device).generate( ... ... i.e. added .to(self.device), as you advised. All other lines are without .to(self.device). And it's working now!!! ☺️ Great thanks!! And if I write encoded[key] = encoded[key] without .to(self.device), it also works! (which is logical)
nerdylive
nerdylive4w ago
Yep. Hope this device points to the GPU, if not then it will use the CPU 😂