First, find out which GPU model the server has.

[kfk@bigdata-pro01 ~]$ lshw -C display
WARNING: you should run this program as super-user.
*-display
description: VGA compatible controller
product: SVGA II Adapter
vendor: VMware
physical id: f
bus info: pci@0000:00:0f.0
version: 00
width: 32 bits
clock: 33MHz
capabilities: vga_controller bus_master cap_list rom
configuration: driver=vmwgfx latency=64
resources: irq:16 ioport:1070(size=16) memory:e8000000-efffffff memory:fe000000-fe7fffff memory:c0400000-c0407fff
WARNING: output may be incomplete or inaccurate, you should run this program as super-user.
[kfk@bigdata-pro01 ~]$
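
When a machine has several display adapters, it can help to filter the device list down to NVIDIA entries. The sketch below is one way to do that, assuming the pciutils package (which provides lspci) is installed; the function name filter_nvidia_gpus is made up for this example.

```shell
# Print only the display-class PCI devices made by NVIDIA.
# Reads lspci-style text on stdin, so it also works on saved output.
filter_nvidia_gpus() {
    grep -Ei 'vga compatible controller|3d controller' | grep -i 'nvidia'
}

# On a real machine (assumes pciutils is installed):
#   lspci | filter_nvidia_gpus
```

If this prints nothing, as it would for the VMware adapter shown above, there is no NVIDIA card for the driver to bind to.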

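Based on the GPU model, download the matching driver from the official NVIDIA website.

Pick the driver package that corresponds to your specific model.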


Disable nouveau

Edit dist-blacklist.conf

vim /usr/lib/modprobe.d/dist-blacklist.conf
Append the following at the end of the file:
blacklist nouveau
options nouveau modeset=0
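
If you prefer not to edit the file by hand, the same two lines can be appended from a script. This is a minimal sketch, not part of the original procedure; ensure_line is a hypothetical helper that only appends a line when it is not already present, so running it twice is harmless.

```shell
# Append LINE to FILE only if an identical line is not already there.
ensure_line() {
    file=$1
    line=$2
    # Nothing to do if an identical line already exists.
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0
    fi
    # If the file does not end in a newline, add one first so the
    # new line is not glued onto the last existing line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Usage as root, against the file edited above:
#   conf=/usr/lib/modprobe.d/dist-blacklist.conf
#   ensure_line "$conf" 'blacklist nouveau'
#   ensure_line "$conf" 'options nouveau modeset=0'
```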

The edited file then looks like this:

# mode tools can also control driver binding.
#
# Syntax: see modprobe.conf(5).

# watchdog drivers
blacklist i8xx_tco

# framebuffer drivers
blacklist aty128fb
blacklist atyfb
blacklist radeonfb
blacklist i810fb
blacklist cirrusfb
blacklist intelfb
blacklist kyrofb
blacklist i2c-matroxfb
blacklist hgafb
#blacklist nvidiafb
blacklist rivafb
blacklist savagefb
blacklist sstfb
blacklist neofb
blacklist tridentfb
blacklist tdfxfb
blacklist virgefb
blacklist vga16fb
blacklist viafb

# ISDN - see bugs 154799, 159068
blacklist hisax
blacklist hisax_fcpcipnp

# sound drivers
blacklist snd-pcsp

# I/O dynamic configuration support for s390x (bz #563228)
blacklist chsc_sch

# crypto algorithms
blacklist sha1-mb

# see bz #1562114
blacklist sha256-mb
blacklist sha512-mb
blacklist nouveau
options nouveau modeset=0

Comment out blacklist nvidiafb:
#blacklist nvidiafb
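Edit blacklist.conf

Add blacklist nouveau to blacklist.conf under /etc/modprobe.d as well. Create the directory first if it does not exist:
mkdir -p /etc/modprobe.d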


Rebuild initramfs-3.10.0-957.el7.x86_64.img

Here 3.10.0-957.el7 is the kernel version, so the file name differs slightly from one kernel to another.
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r)-nouveau.img
dracut /boot/initramfs-$(uname -r).img $(uname -r)
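
The backup-and-rebuild step can also be wrapped in a small function that refuses to run if the image is missing. This is only a sketch: rebuild_initramfs and the DRACUT override variable are names invented here so that the logic can be dry-run against a scratch directory before touching /boot.

```shell
# Back up the current initramfs, then regenerate it.
# $1 = boot directory (default /boot), $2 = kernel version (default: running kernel).
# DRACUT is a hypothetical override for the command used to build the image.
rebuild_initramfs() {
    boot_dir=${1:-/boot}
    kver=${2:-$(uname -r)}
    img="$boot_dir/initramfs-$kver.img"
    if [ ! -f "$img" ]; then
        echo "missing $img" >&2
        return 1
    fi
    # Keep the old image under a -nouveau suffix so it can be restored.
    mv "$img" "$boot_dir/initramfs-$kver-nouveau.img"
    ${DRACUT:-dracut} "$img" "$kver"
}

# Usage as root:
#   rebuild_initramfs
```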

Install kernel-devel

This step is also critical: if kernel-devel is not installed, install it now. Otherwise the NVIDIA driver installer fails and returns error code 256.
yum install kernel-devel kernel-headers -y
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Package kernel-headers-3.10.0-957.el7.x86_64 already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package kernel-devel.x86_64 0:3.10.0-957.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package            Arch         Version                  Repository       Size
================================================================================
Installing:
 kernel-devel       x86_64       3.10.0-957.el7           base             17 M

Transaction Summary
================================================================================
Install  1 Package

Total download size: 17 M
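
The driver installer builds a kernel module against the running kernel, so the installed kernel-devel version has to match the output of uname -r. The check below is a sketch of how that could be verified; has_matching_kernel_devel is a hypothetical helper that reads the output of rpm -q kernel-devel on stdin.

```shell
# Succeed only if one of the installed kernel-devel packages (one per line
# on stdin) matches the given kernel version, default: the running kernel.
has_matching_kernel_devel() {
    kver=${1:-$(uname -r)}
    grep -qxF "kernel-devel-$kver"
}

# Usage:
#   rpm -q kernel-devel | has_matching_kernel_devel \
#       || echo "kernel-devel does not match $(uname -r)" >&2
```

If the versions differ, updating the kernel and rebooting, or installing the kernel-devel package for the exact running version, brings them back in line.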

reboot

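The nouveau settings only take effect after a reboot. After rebooting, run lsmod | grep nouveau to make sure the nouveau driver is disabled; the command should print nothing.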
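
For use in a script, the same check can be turned into an exit status. A minimal sketch, with nouveau_loaded as an invented name: it reads lsmod output on stdin and compares only the module-name column, skipping the header row.

```shell
# Exit 0 if a module named exactly "nouveau" appears in lsmod output on stdin.
nouveau_loaded() {
    awk 'NR > 1 && $1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded; recheck the blacklist and initramfs" >&2
#   fi
```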

Install the driver

Use init 3

Run init 3 to switch to text mode, without the graphical interface.

Run the CUDA .run file

chmod +x cuda_10.2.89_440.33.01_linux.run
./cuda_10.2.89_440.33.01_linux.run
Driver: Installed
Toolkit: Installed in /usr/local/cuda-10.2/
Samples: Installed in /root/, but missing recommended libraries

Please make sure that

  • PATH includes /usr/local/cuda-10.2/bin
  • LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.2/doc/pdf for detailed information on setting up CUDA.
Logfile is /var/log/cuda-installer.log
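
The installer's reminder about PATH and LD_LIBRARY_PATH can be addressed with two exports, for example in ~/.bashrc. This is a sketch that assumes the default install location reported above; CUDA_HOME is just a convenience variable chosen for this example.

```shell
# Install prefix reported by the installer above.
CUDA_HOME=/usr/local/cuda-10.2

# Prepend the CUDA directories, keeping any existing value intact.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The ${VAR:+:$VAR} form avoids leaving a stray colon when the variable was empty beforehand.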

Use the nvidia-smi command to confirm that the driver is installed correctly

[root@ASR1 asr]# nvidia-smi
Mon Dec 5 22:48:04 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:31:00.0 Off |                    0 |
| N/A   65C    P0    32W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:B1:00.0 Off |                    0 |
| N/A   63C    P0    24W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process
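
For an automated check, nvidia-smi can print the same information as CSV instead of a table. The sketch below assumes that output format; all_gpus_on_driver is a hypothetical helper that succeeds only when at least one GPU is listed and every row reports the expected driver version.

```shell
# Read "name, driver_version, memory.total" CSV rows on stdin and
# exit 0 only if there is at least one row and all rows match $1.
all_gpus_on_driver() {
    expected=$1
    awk -F', ' -v want="$expected" \
        '$2 != want { bad = 1 } END { exit (NR == 0 || bad) }'
}

# Usage:
#   nvidia-smi --query-gpu=name,driver_version,memory.total \
#              --format=csv,noheader | all_gpus_on_driver 440.33.01
```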

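Summary

Identify the GPU model and the operating system, then download the driver from the NVIDIA website. Next, blacklist nouveau and install kernel-devel. Once that is done, run init 3 to enter text mode and execute the NVIDIA .run installer: type accept when prompted, then choose Install. Finally, run nvidia-smi, which ships with the driver, to verify that the installation succeeded.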