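Problem: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver
After rebooting Ubuntu, running nvidia-smi to check GPU usage sometimes fails with "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver". The most likely cause is a kernel update: the system booted into a newer kernel that no longer matches the installed GPU driver.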
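One way to test this diagnosis is to check whether an NVIDIA kernel module exists for the kernel that is running. The function below is a minimal sketch. The name has_nvidia_module is hypothetical, and it assumes the modules live under /lib/modules/<release> and are named nvidia*.ko (possibly compressed), which may not hold for every driver installation method.

```shell
# Succeed if an NVIDIA kernel module file exists under a module tree.
# Assumption: the tree is /lib/modules/<release> and the files are named
# nvidia*.ko, nvidia*.ko.xz, etc. Adjust the pattern if your setup differs.
has_nvidia_module() {
    # Default to the module tree of the running kernel.
    local modules_dir="${1:-/lib/modules/$(uname -r)}"
    [ -d "$modules_dir" ] || return 1
    # grep -q . succeeds only if find printed at least one path.
    find "$modules_dir" -type f -name 'nvidia*.ko*' | grep -q .
}
```

Calling has_nvidia_module with no argument checks the running kernel. If it fails while has_nvidia_module /lib/modules/5.15.0-52-generic succeeds, the driver was probably built only for the older kernel.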
Linux Kernel
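The kernel is the lowest level of easily replaceable software that interfaces with the computer's hardware. It connects every application running in "user mode" to the physical hardware, and it lets processes, known as servers, exchange information with each other through inter-process communication (IPC).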
Solution: switch back to the previous kernel version
1. List the installed kernels
sudo dpkg --get-selections |grep linux-image
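The list also contains packages that were removed but not purged; dpkg marks those with the state deinstall. As a small sketch, the filter below keeps only the kernel images that are still selected for installation. The function name installed_kernel_images is hypothetical.

```shell
# Read `dpkg --get-selections` output on stdin and print only the
# linux-image packages whose selection state is "install".
installed_kernel_images() {
    awk '$1 ~ /^linux-image-/ && $2 == "install" { print $1 }'
}
```

Usage: dpkg --get-selections | installed_kernel_images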
2. Check which kernel is currently running
->uname
uname -r
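Or:
->/proc/version
The /proc directory contains virtual files with information about system memory, CPU cores, mounted filesystems, and more. Information about the running kernel is stored in the virtual file /proc/version.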
cat /proc/version
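The two methods should agree. As a rough sketch, the release string can be pulled out of /proc/version and compared with uname -r. The function name is hypothetical, and the sketch assumes the usual "Linux version <release> ..." layout of that file.

```shell
# Read the contents of /proc/version on stdin and print the kernel release.
# Assumption: the first line starts with "Linux version <release>".
kernel_release_from_proc_version() {
    awk 'NR == 1 && $1 == "Linux" && $2 == "version" { print $3 }'
}
```

Usage: kernel_release_from_proc_version < /proc/version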
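3. Remove the kernel
Tip: if you remove the kernel you are currently running, the next reboot uses the next-lower installed kernel. If you remove the last installed kernel, the reboot ends up in the BIOS screen.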
sudo apt-get remove linux-image-5.15.0-52-generic
Then clean up automatically with:
sudo apt autoremove
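If you list the kernels again, linux-image-5.15.0-52-generic now shows the state deinstall.
Note (an additional case):
Sometimes you also have to remove the matching unsigned package at the same time. Otherwise apt swaps one in for the other and you never get to boot the older kernel you want: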
sudo apt remove linux-image-5.15.0-53-generic linux-image-unsigned-5.15.0-53-generic
The session below shows what happens (here the version I want to keep is linux-image-5.15.0-52-generic, unlike the example above):
mulan@mulan-PowerEdge-R7525:~$ sudo dpkg --get-selections |grep linux-image
linux-image-5.15.0-52-generic install
linux-image-5.15.0-53-generic install
linux-image-5.8.0-43-generic deinstall
linux-image-unsigned-5.15.0-53-generic deinstall
mulan@mulan-PowerEdge-R7525:~$ sudo apt remove linux-image-5.15.0-53-generic
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
linux-image-unsigned-5.15.0-53-generic
Suggested packages:
fdutils linux-doc | linux-hwe-5.15-source-5.15.0 linux-hwe-5.15-tools linux-modules-extra-5.15.0-53-generic
The following packages will be REMOVED:
linux-image-5.15.0-53-generic
The following NEW packages will be installed:
linux-image-unsigned-5.15.0-53-generic
0 upgraded, 1 newly installed, 1 to remove and 185 not upgraded.
Need to get 0 B/11.6 MB of archives.
After this operation, 447 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
dpkg: linux-image-5.15.0-53-generic: dependency problems, but removing anyway as you requested:
linux-modules-5.15.0-53-generic depends on linux-image-5.15.0-53-generic | linux-image-unsigned-5.15.0-53-generic; however:
Package linux-image-5.15.0-53-generic is to be removed.
Package linux-image-unsigned-5.15.0-53-generic is not installed.
(Reading database ... 165303 files and directories currently installed.)
Removing linux-image-5.15.0-53-generic (5.15.0-53.59~20.04.1) ...
W: Removing the running kernel
I: /boot/vmlinuz is now a symlink to vmlinuz-5.15.0-52-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.15.0-52-generic
/etc/kernel/postrm.d/initramfs-tools:
update-initramfs: Deleting /boot/initrd.img-5.15.0-53-generic
/etc/kernel/postrm.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-52-generic
Found initrd image: /boot/initrd.img-5.15.0-52-generic
done
Selecting previously unselected package linux-image-unsigned-5.15.0-53-generic.
(Reading database ... 165299 files and directories currently installed.)
Preparing to unpack .../linux-image-unsigned-5.15.0-53-generic_5.15.0-53.59~20.04.1_amd64.deb ...
Unpacking linux-image-unsigned-5.15.0-53-generic (5.15.0-53.59~20.04.1) ...
Setting up linux-image-unsigned-5.15.0-53-generic (5.15.0-53.59~20.04.1) ...
I: /boot/vmlinuz is now a symlink to vmlinuz-5.15.0-53-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.15.0-53-generic
Processing triggers for linux-image-unsigned-5.15.0-53-generic (5.15.0-53.59~20.04.1) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.15.0-53-generic
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-53-generic
Found initrd image: /boot/initrd.img-5.15.0-53-generic
Found linux image: /boot/vmlinuz-5.15.0-52-generic
Found initrd image: /boot/initrd.img-5.15.0-52-generic
done
mulan@mulan-PowerEdge-R7525:~$ sudo dpkg --get-selections |grep linux-image
linux-image-5.15.0-52-generic install
linux-image-5.15.0-53-generic deinstall
linux-image-5.8.0-43-generic deinstall
linux-image-unsigned-5.15.0-53-generic install
mulan@mulan-PowerEdge-R7525:~$ vim /etc/default/grub.d/init-select.cfg
mulan@mulan-PowerEdge-R7525:~$ sudo apt remove linux-image-5.15.0-53-generic linux-image-unsigned-5.15.0-53-generic
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package 'linux-image-5.15.0-53-generic' is not installed, so not removed
The following packages will be REMOVED:
linux-image-unsigned-5.15.0-53-generic linux-modules-5.15.0-53-generic
0 upgraded, 0 newly installed, 2 to remove and 185 not upgraded.
After this operation, 130 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 165303 files and directories currently installed.)
Removing linux-image-unsigned-5.15.0-53-generic (5.15.0-53.59~20.04.1) ...
W: Removing the running kernel
I: /boot/vmlinuz is now a symlink to vmlinuz-5.15.0-52-generic
I: /boot/initrd.img is now a symlink to initrd.img-5.15.0-52-generic
/etc/kernel/postrm.d/initramfs-tools:
update-initramfs: Deleting /boot/initrd.img-5.15.0-53-generic
/etc/kernel/postrm.d/zz-update-grub:
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-52-generic
Found initrd image: /boot/initrd.img-5.15.0-52-generic
done
Removing linux-modules-5.15.0-53-generic (5.15.0-53.59~20.04.1) ...
mulan@mulan-PowerEdge-R7525:~$ sudo dpkg --get-selections |grep linux-image
linux-image-5.15.0-52-generic install
linux-image-5.15.0-53-generic deinstall
linux-image-5.8.0-43-generic deinstall
linux-image-unsigned-5.15.0-53-generic deinstall
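In the session above, removing only the signed image made apt install the unsigned one in its place, and naming both packages in one command fixed it. A small helper can build that command for any version. This is only a sketch; the function name is hypothetical, and it prints the command instead of running it so you can review it first.

```shell
# Print the apt command that removes the signed and the unsigned image
# package for one kernel version together. Nothing is executed here.
kernel_remove_command() {
    local version="$1"
    if [ -z "$version" ]; then
        echo "usage: kernel_remove_command <kernel-version>" >&2
        return 1
    fi
    printf 'sudo apt remove linux-image-%s linux-image-unsigned-%s\n' \
        "$version" "$version"
}
```

Usage: kernel_remove_command 5.15.0-53-generic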
4. Check the kernel boot order
grep menuentry /boot/grub/grub.cfg
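The raw grep output includes GRUB options and IDs on every line. As a sketch, the filter below prints only the menu titles, in boot-menu order. The function name is hypothetical, and it assumes each title is the first single-quoted string after menuentry or submenu, as in a grub.cfg generated by update-grub.

```shell
# Read grub.cfg on stdin and print the title of every menuentry and submenu.
# Assumption: the title is the first single-quoted string on the line.
grub_menu_titles() {
    sed -nE "s/^[[:space:]]*(menuentry|submenu) '([^']*)'.*/\2/p"
}
```

Usage: grub_menu_titles < /boot/grub/grub.cfg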
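Note: if you break the configuration by mistake, the machine reboots into the "Minimal BASH-like line editing" prompt. Enter the following command there to bring up the graphical boot menu:
grub> normal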
5. Set the default boot kernel on Ubuntu
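One common way to do this, sketched here as an assumption rather than as the author's exact method, is to set GRUB_DEFAULT in /etc/default/grub to the path "submenu title>entry title". The helper below builds that line for a given kernel version. The function name is hypothetical, and the default titles assume a stock Ubuntu menu; check them against the output of grep menuentry /boot/grub/grub.cfg.

```shell
# Print a GRUB_DEFAULT line that selects one kernel inside a submenu.
# Assumption: the titles follow Ubuntu's defaults, "Advanced options for
# Ubuntu" and "Ubuntu, with Linux <version>".
grub_default_line() {
    local version="$1"
    local submenu="${2:-Advanced options for Ubuntu}"
    printf 'GRUB_DEFAULT="%s>Ubuntu, with Linux %s"\n' "$submenu" "$version"
}
```

Usage: grub_default_line 5.15.0-52-generic prints the line to put in /etc/default/grub. After editing the file, run sudo update-grub and reboot.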
6. Disable automatic updates on Ubuntu
1) From the command line: open the file below in an editor, change every "1" inside the double quotes to "0", and save:
mulan@mulan-PowerEdge-R7525:~$ sudo vim /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
APT::Periodic::Unattended-Upgrade "0";
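2) From the GUI: open Software & Updates and turn off automatic updates there.
3) Ubuntu updates the kernel automatically by default. To avoid rebooting into a system that fails to come up, you can additionally hold the kernel package so the current kernel stays in use: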
sudo apt-mark hold linux-image-5.15.0-48-generic
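To re-enable kernel updates later, run the matching unhold command.
The second method alone sometimes does not take effect, so to be safe, apply all three of the settings above.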
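The command-line parts of this section can be scripted. The two functions below are a minimal sketch with hypothetical names: the first rewrites every APT::Periodic value to "0", the second checks whether a package shows up in the output of apt-mark showhold. Both only read stdin, so nothing on the system changes until you redirect the result yourself.

```shell
# Read an apt periodic config on stdin and print it with every
# APT::Periodic value set to "0". Other lines pass through unchanged.
disable_apt_periodic() {
    sed -E 's/^([[:space:]]*APT::Periodic::[A-Za-z-]+[[:space:]]+)"[^"]*";/\1"0";/'
}

# Succeed if the named package appears in `apt-mark showhold` output on stdin.
is_held() {
    grep -qxF -- "$1"
}
```

Usage: disable_apt_periodic < /etc/apt/apt.conf.d/10periodic previews the edited file, and apt-mark showhold | is_held linux-image-5.15.0-48-generic confirms that the hold is in place.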
Reference: Fixing "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver"