
GNU/Linux Ubuntu 20.04 - GPU pass-through AMD Radeon RX5700 - Qemu - KVM - VFIO

Post added on Nov. 16, 2020


My setup:

  • Gigabyte Aorus X570 Master, flashed with BIOS version F30
  • AMD Ryzen 9 3950X
  • ASRock Radeon RX5700 (reference design) for my host machine
  • ASUS Radeon RX5700 (reference design) for my VM guest

BIOS setup

  • CSM: disabled
  • UEFI boot: enabled

Host machine OS

  • GNU/Linux Xubuntu 20.04.1

Installation of the necessary software

sudo apt-get install bridge-utils ovmf libvirt-clients libvirt-daemon-system \
    qemu-kvm qemu-utils virt-manager uml-utilities vim git build-essential \
    linux-headers-$(uname -r) libncurses-dev flex bison openssl libssl-dev dkms \
    libelf-dev libudev-dev libpci-dev libiberty-dev autoconf kernel-package \
    fakeroot libncurses5-dev

Modification of GRUB configuration

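sudo vim /etc/default/grub

Set the GRUB_CMDLINE_LINUX_DEFAULT line as follows. The change takes effect once sudo update-grub has been run, which is done further below: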

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_enforce_resources=lax amd_iommu=on iommu=pt pcie_acs_override=downstream vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 rd.driver.pre=vfio-pci amdgpu.ppfeaturemask=0xffffffff"
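After the computer has been rebooted following the GRUB update, you can check that these parameters actually reached the kernel. The sketch below reads /proc/cmdline and looks for exact parameter tokens. The function name check_cmdline and the CMDLINE_FILE override are hypothetical choices of mine; the override only exists so the function can be tested against a sample file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each given kernel parameter is present
# as an exact token of the kernel command line.
# CMDLINE_FILE defaults to /proc/cmdline; override it only for testing.
check_cmdline() {
    local cmdline_file="${CMDLINE_FILE:-/proc/cmdline}"
    local -a tokens
    local param token found missing=0
    # /proc/cmdline is a single line of space-separated parameters
    read -r -a tokens < "$cmdline_file" || true
    for param in "$@"; do
        found=no
        for token in "${tokens[@]}"; do
            [ "$token" = "$param" ] && found=yes
        done
        echo "$param: $found"
        [ "$found" = yes ] || missing=1
    done
    # Non-zero exit status if at least one parameter is missing
    return "$missing"
}
```

Example: check_cmdline amd_iommu=on iommu=pt pcie_acs_override=downstream prints one "yes" or "no" line per parameter.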

Isolation of the GPU pass-through device

In my case, I needed to isolate several devices. Run the following command to list every IOMMU group with the devices it contains, then pick your own devices:

for iommu_group in $(find /sys/kernel/iommu_groups/ -maxdepth 1 -mindepth 1 -type d); do echo "IOMMU group $(basename "$iommu_group")"; for device in $(ls -1 "$iommu_group"/devices/); do echo -n $'\t'; lspci -nns "$device"; done; done

Guest GPU and its HDMI audio device

IOMMU group 38
13:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU group 39
13:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]

PCIe NVMe SSD

IOMMU group 26
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]

Sound card

IOMMU group 31
09:00.0 PCI bridge [0604]: Tundra Semiconductor Corp. Tsi381 PCIe to PCI Bridge [10e3:8111] (rev 02)
0a:00.0 Multimedia audio controller [0401]: Creative Labs CA0108/CA10300 [Sound Blaster Audigy Series] [1102:0008]

USB controller

IOMMU group 43
15:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
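VFIO hands devices to a guest one IOMMU group at a time, so it is worth checking which other devices share a group with each device you plan to isolate. As I understand VFIO, PCI bridges such as the Tsi381 in group 31 can stay with the host, but every other device in the group has to be bound to vfio-pci as well. The following sketch reads the same sysfs tree as the command above. The name iommu_group_peers and the SYSFS_ROOT override, which is used only for testing against a fake tree, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every PCI address that shares an IOMMU group
# with the given device (the device itself included).
# SYSFS_ROOT defaults to /sys; override it only for testing.
iommu_group_peers() {
    local sysfs="${SYSFS_ROOT:-/sys}"
    local dev="$1"
    # iommu_group is a symlink to /sys/kernel/iommu_groups/<N>
    local group_dir="$sysfs/bus/pci/devices/$dev/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $dev" >&2
        return 1
    fi
    local entry
    for entry in "$group_dir"/*; do
        basename "$entry"
    done
}
```

Example: iommu_group_peers 0000:0a:00.0 should list both the bridge and the sound card of group 31.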

To bind these devices to the vfio-pci driver at boot, create the following initramfs script:

sudo vim /etc/initramfs-tools/scripts/init-top/bind_vfio.sh

#!/bin/sh

PREREQ=""

prereqs()
{
    echo "$PREREQ"
}

case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac

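# Full PCI addresses (domain:bus:slot.function) of the devices reserved for the VM guest
DEVS="0000:0a:00.0 0000:04:00.0 0000:13:00.0 0000:13:00.1 0000:15:00.3"
for DEV in $DEVS;
do
    # driver_override prevents any other driver (amdgpu, nvme, ...) from claiming the device
    echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
    # Bind the device to vfio-pci right away
    echo "$DEV" > /sys/bus/pci/drivers/vfio-pci/bind
done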

exit 0
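The script expects full PCI addresses that include the 0000: domain prefix, which is the form lspci -D prints. A typo in DEVS is easy to miss until the next reboot. The sketch below is a hypothetical sanity check (validate_devs is my own name) that tests each entry against the lowercase domain:bus:slot.function pattern before you rebuild the initramfs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that every entry of a DEVS-style list is a full
# PCI address in lowercase hex, e.g. 0000:13:00.0.
validate_devs() {
    local dev bad=0
    for dev in $1; do   # intentionally unquoted: split the list on whitespace
        # 4-digit domain, 2-digit bus, slot 00-1f, function 0-7
        if [[ "$dev" =~ ^[0-9a-f]{4}:[0-9a-f]{2}:[01][0-9a-f]\.[0-7]$ ]]; then
            echo "ok      $dev"
        else
            echo "invalid $dev" >&2
            bad=1
        fi
    done
    return "$bad"
}
```

Example: validate_devs "0000:0a:00.0 0000:04:00.0 0000:13:00.0 0000:13:00.1 0000:15:00.3"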

Then make the script executable and owned by root, and regenerate the GRUB configuration and the initramfs:

sudo chmod 755 /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
sudo chown root:root /etc/initramfs-tools/scripts/init-top/bind_vfio.sh
sudo update-grub
sudo update-initramfs -u

Reboot the computer. All the selected devices should now be isolated. To check, run the following command:

lspci -nnv

04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808] (prog-if 02 [NVM Express])
    Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a801]
    Flags: fast devsel, IRQ 41
    Memory at fc700000 (64-bit, non-prefetchable) [size=16K]
    Capabilities:
    Kernel driver in use: vfio-pci
    Kernel modules: nvme

Each of your selected devices should now show "Kernel driver in use: vfio-pci".
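Instead of scanning the whole lspci -nnv output, you can query only the isolated devices. The sketch below reads the driver symlink that sysfs exposes for each PCI device, which holds the same information lspci reports as "Kernel driver in use". The function name check_vfio_binding and the SYSFS_ROOT override, used only for testing against a fake tree, are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the driver bound to each given PCI device
# ("none" if unbound) and return non-zero if any device is not on vfio-pci.
# SYSFS_ROOT defaults to /sys; override it only for testing.
check_vfio_binding() {
    local sysfs="${SYSFS_ROOT:-/sys}"
    local dev driver status=0
    for dev in "$@"; do
        if [ -e "$sysfs/bus/pci/devices/$dev/driver" ]; then
            # The driver symlink points to /sys/bus/pci/drivers/<name>
            driver=$(basename "$(readlink -f "$sysfs/bus/pci/devices/$dev/driver")")
        else
            driver="none"
        fi
        echo "$dev $driver"
        [ "$driver" = "vfio-pci" ] || status=1
    done
    return "$status"
}
```

Example: check_vfio_binding 0000:0a:00.0 0000:04:00.0 0000:13:00.0 0000:13:00.1 0000:15:00.3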

Unfortunately, the first generation of Navi GPUs (RX 5000 series) has a function level reset (FLR) bug. The GPU does not reset properly when the virtual machine is rebooted or shut down, which can leave it unusable until the host is rebooted. Fortunately, two workarounds are available.

Old method, less stable in my case: Patch your current GNU/Linux kernel

Create the following file:

vim ~/linux-fix_navi_reset.patch

Paste into this file the patch posted in this Level1Techs forum thread: https://forum.level1techs.com/t/navi-reset-bug-kernel-patch-v2/163103/9

Clone the kernel source tree matching the Ubuntu release installed on the host machine (focal for Ubuntu 20.04):

git clone https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal
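To match the source tree to the running kernel, you can use /proc/version_signature. On Ubuntu, it holds a line shaped like "Ubuntu 5.4.0-54.60-generic 5.4.65". The sketch below turns that line into a tag name. It assumes the Ubuntu kernel repository tags releases as Ubuntu-<version>-<abi>.<upload> (e.g. Ubuntu-5.4.0-54.60), so confirm the tag exists with git -C focal tag -l 'Ubuntu-5.4.0-*' before relying on it. ubuntu_kernel_tag is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an Ubuntu version signature into the
# (assumed) matching git tag of the Ubuntu kernel tree.
# With no argument, it reads /proc/version_signature.
ubuntu_kernel_tag() {
    local signature="${1:-$(cat /proc/version_signature)}"
    local version
    # Second field, e.g. "5.4.0-54.60-generic"
    version=$(echo "$signature" | awk '{print $2}')
    # Strip the flavour suffix ("-generic", "-lowlatency", ...)
    echo "Ubuntu-${version%-*}"
}
```

Example: git -C focal checkout "$(ubuntu_kernel_tag)"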

Apply the patch with the following commands:

cd focal
patch -p1 < ~/linux-fix_navi_reset.patch
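If the patch was written against a different kernel version, patch can fail partway through and leave some files modified. A more cautious alternative to the command above is to do a dry run first. This hypothetical wrapper (apply_patch_checked) uses GNU patch's --dry-run option and only applies the patch when the dry run succeeds. The --forward option makes patch skip, rather than ask about, a patch that looks already applied.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a -p1 patch in the current directory only if a
# dry run shows that every hunk applies cleanly.
apply_patch_checked() {
    local patch_file="$1"
    # --dry-run reports failures without modifying any file
    if ! patch -p1 --forward --dry-run --silent < "$patch_file"; then
        echo "patch does not apply cleanly; check the kernel branch" >&2
        return 1
    fi
    patch -p1 --forward --silent < "$patch_file"
}
```

Example, from inside the focal directory: apply_patch_checked ~/linux-fix_navi_reset.patch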

Proceed with the kernel compilation, starting from the configuration of the running kernel:

cp /boot/config-`uname -r` .config
yes '' | make oldconfig

If you want to modify the kernel configuration, run the following optional command; otherwise, skip it.

make menuconfig

Finally, build the kernel as Debian packages and install them:

make clean
make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
sudo dpkg -i ../*.deb
sudo update-grub

Reboot the computer in order to use the newly installed patched GNU/Linux kernel.

New method, more stable in my case: Vendor Reset module

This method has proven more stable than the kernel patch in my situation. More information about the vendor-reset module is available here: https://github.com/gnif/vendor-reset

Execute the following commands:

git clone https://github.com/gnif/vendor-reset.git
cd vendor-reset
sudo dkms install .

Edit the file /etc/modules and add the following line:

vendor-reset
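If you would rather not open an editor, or might run these steps more than once, a small idempotent helper avoids adding the line twice. add_line_once is a hypothetical name; the helper only uses grep and a shell append.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is
# not already present (the file is created if it does not exist).
add_line_once() {
    local line="$1" file="$2"
    # -x: whole-line match, -F: fixed string (no regex), -q: no output
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "$line" >> "$file"
    fi
}
```

Since /etc/modules belongs to root, run it from a root shell (for example after sudo -i): add_line_once vendor-reset /etc/modules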

Do not forget to update your initrd with the following command-line:

sudo update-initramfs -u -k all

Reboot the computer in order to use the newly installed module.
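After the reboot, you can confirm that the module was actually loaded at boot. lsmod reads /proc/modules, where module names use underscores, so vendor-reset appears there as vendor_reset. module_loaded and its MODULES_FILE override are hypothetical names for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given kernel module is listed in
# /proc/modules (dashes in the name are converted to underscores).
# MODULES_FILE defaults to /proc/modules; override it only for testing.
module_loaded() {
    local name="${1//-/_}"
    local modules_file="${MODULES_FILE:-/proc/modules}"
    # The first field of each /proc/modules line is the module name
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}
```

Example: module_loaded vendor-reset && echo "vendor-reset is loaded"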

Video card driver virtualization detection

Make sure to follow the important recommendations from the Arch Wiki: https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Video_card_driver_virtualisation_detection

Congratulations, you have now configured your AMD Radeon RX5700 for GPU pass-through.

About the author

Cédric CRISPIN (Super administrator)
