error creating container

error creating container: nvidia-smi: parsing output of line 0: failed to parse (pcie.link.gen.max) into int: strconv.Atoi: parsing "": invalid syntax
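The failing step in the error above can be reproduced in miniature. This is a hedged Go sketch (the helper name `parsePCIeGen` is hypothetical, not the actual agent code): something queries nvidia-smi for `pcie.link.gen.max`, and when the GPU or driver is in a bad state that field comes back empty, so `strconv.Atoi("")` fails exactly as reported.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePCIeGen mimics the parse the error message points at: the text
// output of `nvidia-smi --query-gpu=pcie.link.gen.max` is converted to
// an int. An empty field is what produces the strconv.Atoi error seen
// in this thread.
func parsePCIeGen(line string) (int, error) {
	v := strings.TrimSpace(line)
	if v == "" || v == "[N/A]" {
		return 0, fmt.Errorf("pcie.link.gen.max not reported (empty output)")
	}
	return strconv.Atoi(v)
}

func main() {
	if gen, err := parsePCIeGen("4"); err == nil {
		fmt.Println("pcie gen:", gen)
	}
	_, err := parsePCIeGen("")
	fmt.Println(err) // the empty-field failure mode reported above
}
```

The takeaway is that the error indicates nvidia-smi returned nothing for that field on that host, which points at a driver/hardware problem on the specific machine rather than at the user's template.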
60 Replies
riverfog7 · 6mo ago
hi, what GPU are you using?
Anmol Sharma (OP) · 6mo ago
3090
riverfog7 · 6mo ago
what image are you using?
Anmol Sharma (OP) · 6mo ago
[image attachment]
Anmol Sharma (OP) · 6mo ago
I also got the same error with 2.4.0.
riverfog7 · 6mo ago
Community Cloud or Secure Cloud?
Anmol Sharma (OP) · 6mo ago
On-demand
riverfog7 · 6mo ago
mine works fine
Anmol Sharma (OP) · 6mo ago
I changed the container disk volume to 200.
riverfog7 · 6mo ago
any changes to the CMD?
Anmol Sharma (OP) · 6mo ago
so it changes from spot to on-demand, no?
riverfog7 · 6mo ago
can you try recreating it?
Anmol Sharma (OP) · 6mo ago
can you come on VC and help me out?
riverfog7 · 6mo ago
mine works with a 3090 and that template. I can't speak on VC right now.
Anmol Sharma (OP) · 6mo ago
on the lounge VC?
riverfog7 · 6mo ago
I'm in a library, so...
Anmol Sharma (OP) · 6mo ago
you can just stay muted, watch, and type if anything looks wrong, please
riverfog7 · 6mo ago
okay, but why? mine starts fine
Anmol Sharma (OP) · 6mo ago
I tried a different template too. I want to train a CV model, that's why I'm using it.
riverfog7 · 6mo ago
Other GPUs too? Try an A40; maybe that specific 3090 has a problem.
Anmol Sharma (OP) · 6mo ago
yeah, so which is the best GPU
riverfog7 · 6mo ago
depends on the model
Anmol Sharma (OP) · 6mo ago
that I can use for training my CV model?
riverfog7 · 6mo ago
What's your model like, big or small? If it's small and you want to be cost-effective, go with an A40 / 3090 / 4090, something like that. If it's large and doesn't fit on those GPUs, or you want large batch sizes, go with an A100 / H100 / H200 / B100 (that one is overkill though).
@Anmol Sharma did you solve it?
@Dj sorry for the ping, but I think that nvidia-smi: failed to parse pcie.link.gen.max is a HW error. Can you check that?
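The sizing advice above can be made concrete with a rough rule of thumb (this is a common estimate, not RunPod guidance): mixed-precision Adam training needs about 16 bytes per parameter for weights, gradients, and optimizer state, with activation memory on top that grows with batch size.

```go
package main

import "fmt"

// trainStateGiB estimates GPU memory for model state during training with
// mixed-precision Adam: fp16 weights (2 B) + fp16 grads (2 B) + fp32 master
// weights (4 B) + Adam m and v (4 B each) ≈ 16 bytes per parameter.
// Activations are extra and scale with batch size, so treat this as a floor.
func trainStateGiB(params float64) float64 {
	return params * 16 / (1 << 30)
}

func main() {
	// a typical CV backbone (~100M params) vs a 7B-param model
	fmt.Printf("100M params: %.1f GiB of state\n", trainStateGiB(100e6)) // easily fits a 24 GB 3090
	fmt.Printf("7B params:   %.1f GiB of state\n", trainStateGiB(7e9))   // A100/H100-class territory
}
```

By this estimate a typical CV model sits well inside a 24 GB card, which matches the suggestion to stay on an A40 / 3090 / 4090 unless the model or batch size is unusually large.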
Anmol Sharma (OP) · 6mo ago
ya, sorry, something came up and I had to go out of my room. I'll use an A40 then.
Poddy · 6mo ago
@Anmol Sharma
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #16220
Unknown User · 6mo ago
[message not public]
Anmol Sharma (OP) · 6mo ago
okay, I have one more problem: when I upload my dataset, it crashes
riverfog7 · 6mo ago
let's just talk here
Anmol Sharma (OP) · 6mo ago
ok
riverfog7 · 6mo ago
how large is the dataset, in gigabytes?
Anmol Sharma (OP) · 6mo ago
yes, I want to use 3 datasets
riverfog7 · 6mo ago
and what are you uploading with?
Anmol Sharma (OP) · 6mo ago
one is 32 GB, one 14 GB, and one 5 GB
riverfog7 · 6mo ago
hmm... did you provision enough storage? Where are you uploading to? /workspace?
Anmol Sharma (OP) · 6mo ago
[image attachment]
Anmol Sharma (OP) · 6mo ago
have a look at this
riverfog7 · 6mo ago
how many images are in your dataset?
Dj · 6mo ago
@Anmol Sharma Can you share the pod ID of the pod that gave you the error at the top of this thread?
Anmol Sharma (OP) · 6mo ago
umm, I don't have a note of that
riverfog7 · 6mo ago
you can check the audit logs
Anmol Sharma (OP) · 6mo ago
give me a sec, sorry for the typo
Dj · 6mo ago
I can grab it too, it's just easier if you already know the ID :p If you can't track it down, let me know and I'll go find it in your account history.
Anmol Sharma (OP) · 6mo ago
[image attachment]
Anmol Sharma (OP) · 6mo ago
yeah, please help me out
riverfog7 · 6mo ago
where are you uploading?
Anmol Sharma (OP) · 6mo ago
in a folder I am creating
riverfog7 · 6mo ago
and maybe use a network volume, as it's persistent and not subject to data loss; it needs to be in /workspace
Anmol Sharma (OP) · 6mo ago
ya, ya
riverfog7 · 6mo ago
and make the volume disk 200 GB and the container disk 20 GB
Anmol Sharma (OP) · 6mo ago
in that I am creating a subfolder
riverfog7 · 6mo ago
volume disk -> /workspace; container disk -> everything else
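Before starting a multi-GB upload, the split above (volume disk mounted at /workspace, container disk for everything else) can be sanity-checked from inside the pod. A minimal Go sketch, Linux-only; the /workspace mount point is taken from the thread:

```go
package main

import (
	"fmt"
	"syscall"
)

// freeGiB reports the free space on the filesystem containing path.
// Run it against /workspace to confirm the volume disk actually has
// room for the dataset before kicking off a long upload.
func freeGiB(path string) (float64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, err
	}
	return float64(st.Bavail) * float64(st.Bsize) / (1 << 30), nil
}

func main() {
	// "/workspace" is where the volume disk is mounted on a RunPod pod;
	// "/" is included so the sketch also runs outside a pod.
	for _, p := range []string{"/workspace", "/"} {
		if gib, err := freeGiB(p); err == nil {
			fmt.Printf("%s: %.1f GiB free\n", p, gib)
		}
	}
}
```

The same check with `df -h /workspace` is quicker interactively; the point is that a 32 GB + 14 GB + 5 GB upload needs the ~200 GB volume disk, not the small container disk.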
Anmol Sharma (OP) · 6mo ago
let me try and I will get back to you. I'm getting so little speed that a turtle would be faster: the pod has been running for the last hour and the 5 GB dataset is still uploading, only 517.00 MB so far.
Anmol Sharma (OP) · 6mo ago
[image attachment]
Anmol Sharma (OP) · 6mo ago
and it exited
riverfog7 · 6mo ago
it's a spot pod; use on-demand if you want it to never exit
Anmol Sharma (OP) · 6mo ago
let me try it
KaSuTeRaMAX · 6mo ago
I am also suffering from this phenomenon. The image is a self-made image based on nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04. The information I have:
- It occurs in serverless
- RTX 3090
- With the same image, this problem sometimes occurs and sometimes does not (when it does, it goes into an infinite loop, so I end it with "Terminate")
- It seems to have started occurring recently in EU-CZ-1
- I think it has been occurring frequently in the US for some time now
yhlong00000 · 6mo ago
I have a feeling this might relate to the CUDA version; you want to filter for machines with CUDA 12.6+.
KaSuTeRaMAX · 6mo ago
Thank you. Since the versions are specified in the worker settings, I will narrow it down to 12.6 and above. I will contact you again if the issue persists.
