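Step 1: Configure the host
Virtualization uses the IOMMU to pass a physical device through to a VM; on Linux the device is handed to the hypervisor through the VFIO framework (the vfio-pci driver).
OS: Ubuntu 20.04 LTS
GPU: NVIDIA Corporation TU104
VT-d must be enabled in the BIOS. The host must give up the GPU, and every device in the GPU's IOMMU group must be bound to the vfio-pci driver together.
- Install packages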
apt install qemu-kvm qemu-utils libvirt-clients bridge-utils ovmf -y
- Edit /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX=""
GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on iommu=pt kvm.ignore_msrs=1 vfio-pci.ids=10de:1e84,10de:10f8,10de:1ad8,10de:1ad9"
vfio-pci.ids takes vendor:device IDs (the bracketed pairs such as [10de:1e84], not bus addresses such as 01:00.0). They come from the following command:
lspci -nnv |grep -i nvidia
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] [10de:1e84] (rev a1) (prog-if 00 [VGA controller])
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
01:00.1 Audio device [0403]: NVIDIA Corporation TU104 HD Audio Controller [10de:10f8] (rev a1)
01:00.2 USB controller [0c03]: NVIDIA Corporation TU104 USB 3.1 Host Controller [10de:1ad8] (rev a1) (prog-if 30 [XHCI])
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU104 USB Type-C UCSI Controller [10de:1ad9] (rev a1)
Kernel modules: i2c_nvidia_gpu
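The ID list can also be pulled out of the lspci output instead of being copied by hand. The following is a minimal sketch; the function name nvidia_vfio_ids is a hypothetical helper, and it assumes the `lspci -nn` output format shown above, where each device line ends with a bracketed vendor:device pair.

```shell
# Read `lspci -nn` output on stdin and print the vendor:device IDs of all
# NVIDIA functions as one comma-separated list.
nvidia_vfio_ids() {
    grep -i 'nvidia' \
        | grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' \
        | tr -d '[]' \
        | paste -sd, -
}

# Usage on the host (not run here):
#   lspci -nn | nvidia_vfio_ids
```

The class codes such as [0300] contain no colon, so only the vendor:device pairs match the pattern.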
Check whether these devices all belong to the same IOMMU group:
bash iommu.sh
The contents of iommu.sh:
#!/bin/bash
# change the 17 if needed
shopt -s nullglob
for d in /sys/kernel/iommu_groups/{0..17}/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
How many IOMMU groups are there? Check the kernel log:
dmesg -T|grep -i iommu
(venv) root@openstack-ubuntu:~# dmesg -T|grep -i iommu
[二 11月 23 12:47:16 2021] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-40-generic root=/dev/mapper/vgubuntu-root ro intel_iommu=on iommu=pt kvm.ignore_msrs=1 vfio-pci.ids=01:00.0,01:00.1
[二 11月 23 12:47:16 2021] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11.0-40-generic root=/dev/mapper/vgubuntu-root ro intel_iommu=on iommu=pt kvm.ignore_msrs=1 vfio-pci.ids=01:00.0,01:00.1
[二 11月 23 12:47:16 2021] DMAR: IOMMU enabled
[二 11月 23 12:47:16 2021] DMAR-IR: IOAPIC id 2 under DRHD base 0xfed91000 IOMMU 1
[二 11月 23 12:47:16 2021] iommu: Default domain type: Passthrough (set via kernel command line)
[二 11月 23 12:47:16 2021] pci 0000:00:00.0: Adding to iommu group 0
[二 11月 23 12:47:16 2021] pci 0000:00:01.0: Adding to iommu group 1
[二 11月 23 12:47:16 2021] pci 0000:00:02.0: Adding to iommu group 2
[二 11月 23 12:47:16 2021] pci 0000:00:14.0: Adding to iommu group 3
[二 11月 23 12:47:16 2021] pci 0000:00:14.2: Adding to iommu group 3
[二 11月 23 12:47:16 2021] pci 0000:00:15.0: Adding to iommu group 4
[二 11月 23 12:47:16 2021] pci 0000:00:15.1: Adding to iommu group 4
[二 11月 23 12:47:16 2021] pci 0000:00:16.0: Adding to iommu group 5
[二 11月 23 12:47:16 2021] pci 0000:00:17.0: Adding to iommu group 6
[二 11月 23 12:47:16 2021] pci 0000:00:1b.0: Adding to iommu group 7
[二 11月 23 12:47:16 2021] pci 0000:00:1c.0: Adding to iommu group 8
[二 11月 23 12:47:16 2021] pci 0000:00:1c.2: Adding to iommu group 9
[二 11月 23 12:47:16 2021] pci 0000:00:1c.3: Adding to iommu group 10
[二 11月 23 12:47:16 2021] pci 0000:00:1c.4: Adding to iommu group 11
[二 11月 23 12:47:16 2021] pci 0000:00:1d.0: Adding to iommu group 12
[二 11月 23 12:47:16 2021] pci 0000:00:1f.0: Adding to iommu group 13
[二 11月 23 12:47:16 2021] pci 0000:00:1f.3: Adding to iommu group 13
[二 11月 23 12:47:16 2021] pci 0000:00:1f.4: Adding to iommu group 13
[二 11月 23 12:47:16 2021] pci 0000:00:1f.5: Adding to iommu group 13
[二 11月 23 12:47:16 2021] pci 0000:01:00.0: Adding to iommu group 1
[二 11月 23 12:47:16 2021] pci 0000:01:00.1: Adding to iommu group 1
[二 11月 23 12:47:16 2021] pci 0000:01:00.2: Adding to iommu group 1
[二 11月 23 12:47:16 2021] pci 0000:01:00.3: Adding to iommu group 1
[二 11月 23 12:47:16 2021] pci 0000:02:00.0: Adding to iommu group 14
[二 11月 23 12:47:16 2021] pci 0000:04:00.0: Adding to iommu group 15
[二 11月 23 12:47:16 2021] pci 0000:05:00.0: Adding to iommu group 16
[二 11月 23 12:47:16 2021] pci 0000:06:00.0: Adding to iommu group 17
[二 11月 23 12:47:17 2021] intel_iommu=on
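In the log above, group 1 contains 0000:00:01.0 as well as the four 01:00.x functions of the GPU. A small filter makes that membership easier to read. This is only a sketch: devices_in_iommu_group is a hypothetical helper name, and it assumes the "Adding to iommu group N" message format shown in this log.

```shell
# Read kernel log lines on stdin and print the PCI addresses that were
# added to the given IOMMU group, one per line.
devices_in_iommu_group() {
    awk -v group="$1" '
        /Adding to iommu group/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "Adding") {
                    dev = $(i - 1)
                    sub(/:$/, "", dev)   # drop the colon after the address
                    if ($NF == group) print dev
                }
            }
        }'
}

# Usage on the host (not run here):
#   dmesg | devices_in_iommu_group 1
```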
root@openstack-ubuntu:/opt/images/packer_tutorial/centos-vanilla# lspci -nnv -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] [10de:1e84] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd TU104 [GeForce RTX 2070 SUPER] [1458:4001]
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at a4000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 5000 [size=128]
Expansion ROM at a5000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Resizable BAR <?>
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
- Update grub
update-grub
- reboot
- Bind the vfio-pci driver to the devices by PCI bus ID (vfio.sh has to be created; see other sites for reference)
vim /etc/initramfs-tools/scripts/init-top/vfio.sh
#!/bin/sh
PREREQ=""
prereqs()
{
    echo "$PREREQ"
}
case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac
for dev in 0000:01:00.0 0000:01:00.1 0000:01:00.2 0000:01:00.3
do
    echo "vfio-pci" > /sys/bus/pci/devices/$dev/driver_override
    echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
done
exit 0
- Make vfio.sh executable
chmod +x /etc/initramfs-tools/scripts/init-top/vfio.sh
- Add the following to /etc/initramfs-tools/modules
options kvm ignore_msrs=1
- Blacklist the host drivers in /etc/modprobe.d/blacklist.conf
blacklist snd_hda_intel
blacklist vga16fb
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
- Edit /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:1e84,10de:10f8,10de:1ad9,10de:1ad8
The IDs above are the bracketed vendor:device pairs in the output of:
lspci -nnv |grep -i nvidia
- Rebuild the initramfs
update-initramfs -u -k all
- Reboot the operating system
- Verify that the devices are now using the vfio-pci driver
lspci -nnv
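The corresponding devices also appear under /sys/bus/pci/drivers/vfio-pci/, as shown in the figure below:
Two device nodes also appear under /dev/vfio, as shown in the figure below: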
Troubleshooting
Problem 1:
- The GPU is in use
Symptom: nvidia-smi shows running processes, and the unbind command below hangs:
echo "0000:01:00.0" >/sys/bus/pci/drivers/nvidia/unbind
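Fix: kill the processes that are holding the GPU.
If the GPU is bound to the nvidia driver, use the following command to check whether other processes are using it: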
nvidia-smi
Tue Nov 23 13:17:53 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.00 Driver Version: 470.82.00 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| 30% 36C P0 43W / 235W | 0MiB / 7982MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
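The output shows that no other process is using the GPU; if any are listed, kill them. Then unbind the device from the nvidia driver: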
echo -n "0000:01:00.0" >/sys/bus/pci/drivers/nvidia/unbind
Set the driver override of device 01:00.0 to vfio-pci:
echo "vfio-pci" >/sys/bus/pci/devices/0000\:01\:00.0/driver_override
Bind the device to the vfio-pci driver:
echo -n "0000:01:00.0" >/sys/bus/pci/drivers/vfio-pci/bind
Verify that it worked:
lspci -nnv -s 01:00.0
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] [10de:1e84] (rev a1) (prog-if 00 [VGA controller])
Subsystem: Gigabyte Technology Co., Ltd TU104 [GeForce RTX 2070 SUPER] [1458:4001]
Flags: fast devsel, IRQ 16
Memory at a4000000 (32-bit, non-prefetchable) [size=16M]
Memory at 90000000 (64-bit, prefetchable) [size=256M]
Memory at a0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 5000 [size=128]
Expansion ROM at a5000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Resizable BAR <?>
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
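The same check can be done without lspci by reading the driver symlink in sysfs. This is a minimal sketch: pci_driver_of is a hypothetical helper name, and its optional second argument (an alternative sysfs root) exists only so the function can be tried against a fake directory tree.

```shell
# Print the name of the driver bound to a PCI device, or "none".
#   $1  full PCI address, e.g. 0000:01:00.0
#   $2  sysfs root (optional, defaults to /sys)
pci_driver_of() (
    dev=$1
    sysfs=${2:-/sys}
    link="$sysfs/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<driver name>
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
)

# Usage on the host (not run here):
#   pci_driver_of 0000:01:00.0
```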
Problem 2:
vfio 0000:01:00.0: group 1 is not viable
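The cause is that other devices in group 1 (01:00.*) are not bound to vfio-pci. To fix it, run
lspci -nnv -s 01:00.1
to find the driver in use (snd_hda_intel here), unbind it, and bind the device to vfio-pci instead. Note that sysfs expects the full address including the 0000: domain:
echo "0000:01:00.1" >/sys/bus/pci/drivers/snd_hda_intel/unbind
echo "0000:01:00.1" >/sys/bus/pci/drivers/vfio-pci/bind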
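Since every function in the group has to be moved, the unbind and bind steps can be wrapped in one helper and looped over the addresses. The sketch below is an assumption-laden illustration, not the method used above: rebind_to_vfio is a hypothetical name, the optional second argument (sysfs root) is there only for dry runs against a fake tree, and the helper sets driver_override before binding.

```shell
# Move one PCI function from its current driver to vfio-pci.
#   $1  full PCI address, e.g. 0000:01:00.1
#   $2  sysfs root (optional, defaults to /sys)
rebind_to_vfio() (
    dev=$1
    sysfs=${2:-/sys}
    devdir="$sysfs/bus/pci/devices/$dev"
    # Tell the PCI core that only vfio-pci may claim this device.
    echo "vfio-pci" > "$devdir/driver_override"
    # Detach the current driver, if one is bound.
    if [ -e "$devdir/driver" ]; then
        echo "$dev" > "$devdir/driver/unbind"
    fi
    # Ask vfio-pci to take the device.
    echo "$dev" > "$sysfs/bus/pci/drivers/vfio-pci/bind"
)

# Usage on the host, as root (not run here):
#   for dev in 0000:01:00.0 0000:01:00.1 0000:01:00.2 0000:01:00.3; do
#       rebind_to_vfio "$dev"
#   done
```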
Step 2: OpenStack configuration
- Add the following to /etc/kolla/config/nova.conf:
[filter_scheduler]
enabled_filters = AvailabilityZoneFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
available_filters = nova.scheduler.filters.all_filters
[pci]
alias = { "vendor_id":"10de", "product_id":"1e84", "name":"a1" }
passthrough_whitelist = { "vendor_id":"10de", "product_id":"1e84" }
The name parameter is user-defined; vendor_id and product_id are the bracketed vendor:device pair in the output of lspci -nnv -s 01:00.0 (here 10de:1e84).
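To avoid typing the IDs twice, the [pci] section can be generated from the vendor:device pair. This is a sketch only: nova_pci_section is a hypothetical helper, and it simply reproduces the two option lines shown above with the given values filled in.

```shell
# Print a [pci] section for nova.conf.
#   $1  vendor:device pair as printed by lspci, e.g. 10de:1e84
#   $2  alias name of your choice, e.g. a1
nova_pci_section() (
    id=$1
    name=$2
    vendor=${id%%:*}    # text before the colon
    product=${id##*:}   # text after the colon
    printf '[pci]\n'
    printf 'alias = { "vendor_id":"%s", "product_id":"%s", "name":"%s" }\n' \
        "$vendor" "$product" "$name"
    printf 'passthrough_whitelist = { "vendor_id":"%s", "product_id":"%s" }\n' \
        "$vendor" "$product"
)

# Usage (not run here):
#   nova_pci_section 10de:1e84 a1
```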
- Apply the nova.conf change
kolla-ansible -i all-in-one reconfigure -t nova
- Create a flavor that carries the metadata pci_passthrough:alias='a1:1'
openstack flavor create --ram 2048 --disk 10 --vcpus 2 gpu
openstack flavor set gpu --property pci_passthrough:alias='a1:1'
- Boot a VM to verify that passthrough works (using the official CentOS 7 image)
Log in to the VM and check with the lspci command.
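Inside the guest, the check comes down to looking for the GPU's vendor:device pair in the lspci output. A minimal sketch, assuming lspci (the pciutils package) is installed in the guest image; guest_has_pci_device is a hypothetical helper name.

```shell
# Read `lspci -nn` output on stdin; succeed if a device with the given
# vendor:device pair is listed.
guest_has_pci_device() {
    grep -qiF "[$1]"
}

# Usage inside the VM (not run here):
#   lspci -nn | guest_has_pci_device 10de:1e84 && echo "GPU visible in guest"
```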