GNU/Linux Ubuntu 20.04 - GPU pass-through AMD Radeon RX5700 - Qemu - KVM - VFIO
Post added on Nov. 16, 2020
My setup:
- Gigabyte Aorus X570 Master - Flashed with BIOS version F30
- AMD Ryzen R9 3950X
- Asrock Radeon RX5700 (reference design) for my host machine
- Asus Radeon RX5700 (reference design) for my VM guest
BIOS setup
- CSM: disabled
- UEFI boot: enabled
Host machine OS
- GNU/Linux Xubuntu 20.04.1
Installation of the necessary software
sudo apt-get install bridge-utils ovmf libvirt-clients libvirt-daemon-system qemu-kvm qemu-utils virt-manager uml-utilities vim git build-essential linux-headers-`uname -r` libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf kernel-package fakeroot libncurses5-dev
Modification of GRUB configuration
sudo vim /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_enforce_resources=lax amd_iommu=on iommu=pt pcie_acs_override=downstream vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 rd.driver.pre=vfio-pci amdgpu.ppfeaturemask=0xffffffff"
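Once the new configuration is active (after the update-grub and reboot steps further down), one way to confirm that the parameters reached the kernel is to look for them in /proc/cmdline. The following is only a minimal sketch; has_kernel_param is a hypothetical helper name, not part of GRUB or any other tool.

```shell
# Hypothetical helper: succeed if parameter $1 appears as a whole word in the
# kernel command line given as $2 (defaults to the content of /proc/cmdline).
has_kernel_param() {
    param=$1
    cmdline=${2-$(cat /proc/cmdline 2>/dev/null || true)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the two IOMMU parameters set in GRUB_CMDLINE_LINUX_DEFAULT.
for p in amd_iommu=on iommu=pt; do
    if has_kernel_param "$p"; then
        echo "$p: present"
    else
        echo "$p: missing"
    fi
done
```

The same helper can be used for any of the other parameters on that line.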
Isolation of the GPU pass-through device
In my case, I needed to isolate several devices. Run the following command to list your IOMMU groups and pick out your own devices:
for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do echo "IOMMU group $(basename "$iommu_group")"; for device in $(ls -1 "$iommu_group"/devices/); do echo -n $'\t'; lspci -nns "$device"; done; done
Guest GPU and its HDMI audio device
IOMMU group 38
13:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU group 39
13:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
PCIE NVME SSD
IOMMU group 26
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
Sound card
IOMMU group 31
09:00.0 PCI bridge [0604]: Tundra Semiconductor Corp. Tsi381 PCIe to PCI Bridge [10e3:8111] (rev 02)
0a:00.0 Multimedia audio controller [0401]: Creative Labs CA0108/CA10300 [Sound Blaster Audigy Series] [1102:0008]
USB controller
IOMMU group 43
15:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
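The listing shows short addresses (bus:device.function), while sysfs paths use the full form with a PCI domain in front. The sketch below builds the full addresses for the devices listed above, assuming they all live in PCI domain 0000 (lspci -D prints the domain if you want to check). to_sysfs_addr is a hypothetical helper name.

```shell
# Hypothetical helper: prefix an lspci-style address (bus:device.function)
# with PCI domain 0000 unless a domain is already present.
to_sysfs_addr() {
    case $1 in
        *:*:*) echo "$1" ;;        # already domain:bus:device.function
        *)     echo "0000:$1" ;;   # assumption: PCI domain 0000
    esac
}

# Addresses taken from the listing above (sound card, NVMe, GPU, GPU audio, USB).
DEVS=""
for addr in 0a:00.0 04:00.0 13:00.0 13:00.1 15:00.3; do
    DEVS="$DEVS $(to_sysfs_addr "$addr")"
done
DEVS=${DEVS# }   # drop the leading space
echo "DEVS=\"$DEVS\""
```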
In order to isolate the devices, you will need to create the following script:
sudo vim /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
#!/bin/sh
PREREQ=""
prereqs()
{
echo "$PREREQ"
}
case $1 in
prereqs)
prereqs
exit 0
;;
esac
DEVS="0000:0a:00.0 0000:04:00.0 0000:13:00.0 0000:13:00.1 0000:15:00.3"
for DEV in $DEVS;
do
echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
echo "$DEV" > /sys/bus/pci/drivers/vfio-pci/bind
done
exit 0
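Before rebooting, it can be worth checking which other devices share an IOMMU group with each entry in DEVS, since devices in the same group are generally handed over together. The sketch below reads the group membership from sysfs; iommu_group_members is a hypothetical helper name, and the optional second argument (an alternative sysfs root) exists only to make the function easy to try out.

```shell
# Hypothetical helper: print the PCI addresses that share an IOMMU group
# with device $1. $2 is the sysfs root (defaults to /sys).
iommu_group_members() {
    dev=$1
    sysfs=${2:-/sys}
    group_dir="$sysfs/bus/pci/devices/$dev/iommu_group/devices"
    [ -d "$group_dir" ] || return 1
    ls -1 "$group_dir"
}

# Example with the sound card from the listing above. Prints nothing if the
# device or IOMMU support is absent on this machine.
iommu_group_members 0000:0a:00.0 || true
```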
Then make the script executable and owned by root, and regenerate the GRUB configuration and the initramfs:
sudo chmod 755 /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
sudo chown root:root /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
sudo update-grub
sudo update-initramfs -u
Reboot the computer. All the selected devices should now be isolated. To check, run the following command:
lspci -nnv
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808] (prog-if 02 [NVM Express])
Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a801]
Flags: fast devsel, IRQ 41
Memory at fc700000 (64-bit, non-prefetchable) [size=16K]
Capabilities:
Kernel driver in use: vfio-pci
Kernel modules: nvme
All your selected devices should now show "Kernel driver in use: vfio-pci".
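Instead of reading through the full lspci output, the bound driver can also be read from sysfs for each device. This is a minimal sketch under the assumption that the device list is the one used in bind_vfio.sh; pci_driver_of is a hypothetical helper name.

```shell
# Hypothetical helper: print the driver bound to PCI device $1, or "none".
# $2 is the sysfs root (defaults to /sys).
pci_driver_of() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Same device list as in bind_vfio.sh.
DEVS="0000:0a:00.0 0000:04:00.0 0000:13:00.0 0000:13:00.1 0000:15:00.3"
for DEV in $DEVS; do
    echo "$DEV: $(pci_driver_of "$DEV")"
done
```

Each line should end with vfio-pci if the isolation worked. A device that does not exist on the machine is reported as "none".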
Unfortunately, the first generation of Navi GPUs (RX 5000 series) has a faulty Function Level Reset (FLR): the card does not reset properly when a virtual machine reboots or shuts down, which is commonly called the reset bug. Fortunately, two workarounds are available.
Old method, less stable in my case: Patch your current GNU/Linux kernel
Create the following file:
vim ~/linux-fix_navi_reset.patch
Paste into this file the patch posted on this forum: https://forum.level1techs.com/t/navi-reset-bug-kernel-patch-v2/163103/9.
Clone the kernel source tree matching the Ubuntu release installed on the host machine (focal for 20.04):
git clone https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal
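The clone contains many kernel versions, so you may want to check out the one matching the running kernel before patching. The sketch below derives a tag name from /proc/version_signature, which Ubuntu kernels expose. Both the helper name ubuntu_kernel_tag and the tag naming scheme (Ubuntu-<version>) are assumptions, and the signature in the comment is only an illustrative value; confirm the real tag with git tag -l inside the cloned tree.

```shell
# Hypothetical helper: turn a version signature such as
# "Ubuntu 5.4.0-53.59-generic 5.4.65" (illustrative value) into the tag name
# "Ubuntu-5.4.0-53.59". The tag naming scheme is an assumption.
ubuntu_kernel_tag() {
    set -- $1                 # split the signature into words
    echo "$1-${2%-*}"         # distro name, then version without the flavour
}

# Print the suggested checkout command when running on an Ubuntu kernel.
if [ -r /proc/version_signature ]; then
    echo "git checkout $(ubuntu_kernel_tag "$(cat /proc/version_signature)")"
fi
```

The printed command is meant to be run from inside the focal directory, before applying the patch.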
Apply the patch from inside the cloned tree:
cd focal
patch -p1 < ~/linux-fix_navi_reset.patch
Proceed with the kernel compilation:
cp /boot/config-`uname -r` .config
yes '' | make oldconfig
Optionally, if you want to change other kernel options, run the following command. Otherwise, skip it.
make menuconfig
We are now in the endgame:
make clean
make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
sudo dpkg -i ../*.deb
sudo update-grub
Reboot the computer in order to use the newly installed patched GNU/Linux kernel.
New method, more stable in my case: Vendor Reset module
This method has proven to be more stable in my situation than the kernel patch. More information about the Vendor reset module can be found here: https://github.com/gnif/vendor-reset
Execute the following command-lines:
git clone https://github.com/gnif/vendor-reset.git
cd vendor-reset
sudo dkms install .
Edit the file /etc/modules and add the following line:
vendor-reset
Do not forget to update your initramfs with the following command:
sudo update-initramfs -u -k all
Reboot the computer in order to use the newly installed module.
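After the reboot, you can check that the module was actually loaded by looking it up in /proc/modules. This is a minimal sketch; module_loaded is a hypothetical helper name. Note that the kernel lists the module as vendor_reset, with an underscore.

```shell
# Hypothetical helper: succeed if module $1 is listed in the modules file $2
# (defaults to /proc/modules). Hyphens are mapped to underscores because the
# kernel reports module names that way.
module_loaded() {
    name=$(echo "$1" | tr '-' '_')
    file=${2:-/proc/modules}
    [ -r "$file" ] || return 1
    grep -q "^$name " "$file"
}

if module_loaded vendor-reset; then
    echo "vendor-reset is loaded"
else
    echo "vendor-reset is not loaded"
fi
```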
Video card driver virtualization detection
Make sure to follow the important recommendations from this website: https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Video_card_driver_virtualisation_detection.
Congratulations, you have now configured your AMD Radeon RX5700 for GPU pass-through.
About the author
Cédric CRISPIN (Super administrator)