First, you need an Intel host that supports VT-d, with nested virtualization enabled. By default, nested virtualization is disabled:

[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/nested
N
  1. To enable nested virtualization:
[root@kvm-hypervisor ~]# vi /etc/modprobe.d/kvm-nested.conf
options kvm-intel nested=1
options kvm-intel enable_shadow_vmcs=1
options kvm-intel enable_apicv=1
options kvm-intel ept=1

Save & exit the file
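
Unloading kvm_intel will fail if any guest is still running, so check first (a quick check; virsh is assumed to be available on the host):

[root@kvm-hypervisor ~]# virsh list --name    # should print nothing; shut down any guests it lists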

[root@kvm-hypervisor ~]# modprobe -r kvm_intel
[root@kvm-hypervisor ~]# modprobe -a kvm_intel

Now verify whether the nested virtualization feature is enabled:

[root@kvm-hypervisor ~]# cat /sys/module/kvm_intel/parameters/nested
Y
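
Optionally, virt-host-validate (part of the libvirt client tools) gives a quick summary of whether hardware virtualization and the IOMMU are usable on this host (a rough check; the exact tests vary by version):

[root@kvm-hypervisor ~]# virt-host-validate qemu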
  2. The host must have VT-d enabled in the firmware (BIOS/UEFI) and "iommu=pt intel_iommu=on" on the kernel command line.
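
On a RHEL/CentOS host this can be done with grubby, for example (a sketch; adjust for your distro and bootloader), followed by a host reboot:

[root@kvm-hypervisor ~]# grubby --update-kernel=ALL --args="iommu=pt intel_iommu=on"
[root@kvm-hypervisor ~]# reboot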

  3. To allow the L2 guest to use PCI passthrough, configure the L1 guest as below:

<domain>
  ......
  <os>
    <type arch='x86_64' machine='pc-q35-rhelxxx'>hvm</type>
    ......
  </os>
  <features>
    ......
    <ioapic driver='qemu'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <feature policy='require' name='vmx'/>
  </cpu>
  ......
  <devices>
    ......
    <iommu model='intel'>
      <driver intremap='on' caching_mode='on' iotlb='on'/>
    </iommu>
    ...
    <controller type='pci' index='8' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='254'>
        <node>1</node>
      </target>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </controller>
    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:19:78:c6'/>
      <driver name='vfio'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x82' slot='0x0a' function='0x3'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </interface>
    ...
  </devices>
</domain>
  • vmx is needed for nested virtualization (the L1 guest should use the 'host-model' or 'host-passthrough' CPU mode, or explicitly require the vmx feature as above);
  • The guest vIOMMU is an emulated device in QEMU. Currently only the Q35 machine type supports a guest vIOMMU;
  • intremap=[on|off] controls whether the guest vIOMMU supports interrupt remapping. To fully enable vIOMMU functionality, set intremap=on here. Interrupt remapping currently does not work with the full kernel irqchip; only "split" and "off" are supported, which is why <ioapic driver='qemu'/> is set in <features> above;
  • Most fully emulated devices (e.g. e1000) should work seamlessly with the Intel vIOMMU. However, some devices need extra care: assigned devices (e.g. vfio-pci) and virtio devices (e.g. virtio-net-pci);
  • caching-mode=on is required when assigned devices are used together with the intel-iommu device. The example above assigns the host PCI device 0000:82:0a.3 to the guest;
  • These settings result in a QEMU command line like: ......kernel_irqchip=split .... -device intel-iommu,intremap=on,caching-mode=on (see the check after this list);
  • virtio devices need "iommu_platform=on,ats=on" set on the device (for example on a memballoon or virtio-net device), and "device-iotlb=on" on the intel-iommu device, which corresponds to iotlb='on' in the <driver> element above.
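
To confirm that libvirt generated those options, you can grep the QEMU command line recorded in the guest's log on the host after starting the L1 guest (a sketch; 'l1-guest' is a placeholder domain name):

[root@kvm-hypervisor ~]# grep -E 'kernel_irqchip=split|intel-iommu' /var/log/libvirt/qemu/l1-guest.log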
  4. On the L1 guest, add "iommu=pt intel_iommu=on" to the kernel command line.
# vim /etc/default/grub  (append "iommu=pt intel_iommu=on" to GRUB_CMDLINE_LINUX)

If you use SeaBIOS:

# grub2-mkconfig -o /boot/grub2/grub.cfg

If you use OVMF:

# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

Reboot the L1 guest, then check on the L1 guest that the environment is ready: 1) The kvm device exists; if it does not, go back to the 'enable nested virtualization' step (step 1).

# ls -al /dev/kvm
crw-rw-rw-. 1 root kvm 10, 232 Jul  3 14:30 /dev/kvm
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                10
On-line CPU(s) list:   0-9
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             10
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               2397.222
BogoMIPS:              4794.44
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
......
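
Also confirm that the kernel command line change from step 4 took effect; the output should contain "iommu=pt intel_iommu=on":

# cat /proc/cmdline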

2) Check that the vIOMMU is enabled

# dmesg  | grep -i DMAR
[    0.000000] ACPI: DMAR 0x000000007FFE2541 000048 (v01 BOCHS  BXPCDMAR 00000001 BXPC 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.203737] DMAR: Host address width 39
[    0.203739] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.203776] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 12008c22260206 ecap f02
[    2.910862] DMAR: No RMRR found
[    2.910863] DMAR: No ATSR found
[    2.914870] DMAR: dmar0: Using Queued invalidation
[    2.914924] DMAR: Setting RMRR:
[    2.914926] DMAR: Prepare 0-16MiB unity mapping for LPC
[    2.915039] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    2.915140] DMAR: Intel(R) Virtualization Technology for Directed I/O

Make sure the line "DMAR: Intel(R) Virtualization Technology for Directed I/O" is present; if it is missing, something went wrong. Do not be misled by the earlier "DMAR: IOMMU enabled" line, which merely says the kernel saw the "intel_iommu=on" command line option.

3) The IOMMU should also have registered the PCI devices into various IOMMU groups:

# dmesg  | grep -i iommu  |grep device
[    2.915212] iommu: Adding device 0000:00:00.0 to group 0
[    2.915226] iommu: Adding device 0000:00:01.0 to group 1
...snip...
[    5.588723] iommu: Adding device 0000:b5:00.0 to group 14
[    5.588737] iommu: Adding device 0000:b6:00.0 to group 15
[    5.588751] iommu: Adding device 0000:b7:00.0 to group 16
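
The same information can also be read from sysfs, which is handy if these dmesg lines have already rotated out of the buffer (a quick sketch):

# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
...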

Now you can assign the three interfaces (0000:b5:00.0, 0000:b6:00.0 and 0000:b7:00.0) to the L2 guest.
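
For example, one of them could be attached to the L2 guest with virsh (a sketch; 'l2-guest' is a placeholder name and the PCI address is taken from the IOMMU group listing above):

# cat > hostdev-net.xml <<'EOF'
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0xb5' slot='0x00' function='0x0'/>
  </source>
</interface>
EOF
# virsh attach-device l2-guest hostdev-net.xml --live --config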

The steps above are expected to work, but in practice some devices may end up sharing the same IOMMU group. How can such devices be placed into separate IOMMU groups?
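
To see exactly which devices ended up in the same group, list the members of each group (a small sketch):

# for g in /sys/kernel/iommu_groups/*; do echo "group ${g##*/}: $(ls $g/devices)"; done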

References:
https://www.linuxtechi.com/enable-nested-virtualization-kvm-centos-7-rhel-7/
https://www.linux-kvm.org/page/Nested_Guests
https://www.redhat.com/en/blog/inception-how-usable-are-nested-kvm-guests
https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/
https://wiki.qemu.org/Features/VT-d