Pods suddenly start with CUDA 12.4 when 12.8 is requested

I have been starting pods via the Python SDK, specifying the CUDA version I need (12.8). This worked fine until a day or two ago. Today, when I start pods, I see this in the logs:
==========
== CUDA ==
==========

CUDA Version 12.4.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
I also can't SSH into the pods, and the entrypoint script doesn't run correctly. Example pod id: fnb9ae7p2wvgwo
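For context, here is roughly how I'm starting the pods. This is a minimal sketch rather than my exact script: it assumes the runpod SDK's create_pod call and its allowed_cuda_versions parameter (names may differ slightly between SDK versions), and the image name and GPU type below are placeholders for my real values.

```python
import os

import runpod

# Assumes the API key is available in the environment.
runpod.api_key = os.environ["RUNPOD_API_KEY"]

pod = runpod.create_pod(
    name="cuda-12-8-pod",                                 # placeholder pod name
    image_name="nvidia/cuda:12.8.0-devel-ubuntu22.04",    # placeholder; I use my own image
    gpu_type_id="NVIDIA GeForce RTX 4090",                # example GPU type id
    gpu_count=1,
    allowed_cuda_versions=["12.8"],  # assumed SDK parameter for pinning the host CUDA version
)

# Print the pod id so I can check its logs afterwards.
print(pod["id"])
```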
nielsrolf (OP) · 3w ago
The problem seems to be solved now. I'm not 100% sure whether I was pulling the wrong version of the Docker image and that was the cause, or whether it was an actual bug on RunPod's side.
