| Intel Graphics Device (IGD) assignment with vfio-pci |
| ==================================================== |
| |
| With vfio-pci, an Intel Graphics Device (IGD) can be passed through to a guest, |
| either as the primary and exclusive graphics adapter or in combination with an |
| emulated primary graphics device, depending on the configuration and guest |
| driver support. However, IGD devices are not "clean" PCI devices: they use |
| extra memory regions besides their BARs. Special handling is required to make |
| them work properly, including: |
| |
| * OpRegion for accessing the Virtual BIOS Table (VBT), which contains display |
|   output information. |
| * Data Stolen Memory (DSM) region, used as VRAM at an early stage (BIOS/UEFI). |
| |
| Certain guest software also depends on the following conditions to work |
| (* = required by): |
| |
| | Condition | Linux | Windows | VBIOS | EFI GOP | |
| |---------------------------------------------|-------|---------|-------|---------| |
| | #1 IGD has a valid OpRegion containing VBT | * ^1 | * | * | * | |
| | #2 VID/DID of LPC bridge at 00:1f.0 matches | | | * | * | |
| | #3 IGD is assigned to BDF 00:02.0 | | | * | * | |
| | #4 IGD has VGA controller device class | | | * | * | |
| | #5 Host's VGA ranges are mapped to IGD | | | * | | |
| | #6 Guest has valid VBIOS or UEFI Option ROM | | | * | * | |
| |
| ^1 Though the i915 driver is able to mock an OpRegion, it is still recommended |
| to use the VBT copied from the host OpRegion to prevent incorrect configuration. |
| |
| For #1, the "x-igd-opregion=on" option exposes a copy of the host IGD OpRegion |
| to the guest via fw_cfg, which guest firmware can use to set up the guest |
| OpRegion. |
| |
| For #2, the "x-igd-lpc=on" option copies the IDs of the host LPC bridge and |
| host bridge to the guest. Currently this is only supported on i440fx machines: |
| q35 machines already have an ICH9 LPC bridge, and overwriting its IDs may lead |
| to unexpected behavior. |
| |
| For #3, "addr=2.0" assigns IGD to 00:02.0. |
| |
| For #4, the primary display must be set to IGD in host BIOS. |
| |
| For #5, "x-vga=on" enables guest access to standard VGA IO/MMIO ranges. |
| |
| For #6, a ROM provided either via the ROM BAR or the romfile= option is needed. |
| The Intel document at [1] shows how to dump the VBIOS to a file. For the UEFI |
| Option ROM, see the "Guest firmware" section. |
| |
| QEMU also provides a "Legacy" mode that implicitly enables full functionality |
| on IGD. It is automatically enabled when all of the following hold: |
| * IGD generation is 6 to 9 (Sandy Bridge to Comet Lake) |
| * Machine type is i440fx |
| * IGD is assigned to guest BDF 00:02.0 |
| * ROM BAR or romfile is present |
| |
| In "Legacy" mode, QEMU automatically sets up the OpRegion, LPC bridge IDs and |
| VGA range access, which is equivalent to: |
| x-igd-opregion=on,x-igd-lpc=on,x-vga=on |
| |
| By default, an error while setting up "Legacy" mode is not fatal and QEMU |
| continues. Users can set "x-igd-legacy-mode=on" to force legacy mode; this also |
| checks whether the conditions above are met, and QEMU fails immediately if any |
| error occurs. Users can also set "x-igd-legacy-mode=off" to disable legacy |
| mode. |
| |
| In legacy mode, the guest VGA ranges are assigned to the IGD device, so all |
| other graphics devices should be removed. This can be done using "-nographic", |
| "-vga none" or "-nodefaults", along with adding the device using vfio-pci. |
| |
| For either mode, depending on the host kernel, the i915 driver in the host |
| may generate faults and errors upon re-binding to an IGD device after it |
| has been assigned to a VM. It is therefore generally recommended to prevent |
| such driver binding unless the host driver is known to work well for this. |
| There are numerous ways to do this, for example: i915 can be blacklisted on |
| the host; the driver_override option can be used to ensure that only vfio-pci |
| can bind to the device on the host [2]; or virsh nodedev-detach can be used to |
| bind the device to vfio drivers, with managed='no' set in the VM XML to prevent |
| re-binding to i915. Note also that IGD is typically the primary graphics in |
| the host, so special options may be required beyond simply blacklisting i915 |
| or using pci-stub/vfio-pci to take ownership of IGD as a PCI class device, |
| because lower level drivers exist that may still claim the device. It may |
| therefore be necessary to use the kernel boot options video=vesafb:off or |
| video=efifb:off (depending on host BIOS/UEFI), or to combine them into a |
| catch-all, video=vesafb:off,efifb:off. Error messages such as: |
| |
| Failed to mmap 0000:00:02.0 BAR <>. Performance may be slow |
| |
| are a good indicator that such a problem exists. The host files /proc/iomem |
| and /proc/ioports are often useful for identifying drivers that consume ranges |
| of the device and cause such conflicts. |
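| |
| The driver_override approach mentioned above can be sketched as follows. This |
| is a minimal, hedged example rather than a complete procedure: the function |
| name bind_to_vfio and the SYSFS_ROOT variable are hypothetical (SYSFS_ROOT only |
| exists so the logic can be tried against a scratch directory), and the sketch |
| assumes the vfio-pci module is already loaded and that it runs as root. |
| |
```shell
#!/bin/sh
# Rebind a PCI device to vfio-pci through driver_override.
# SYSFS_ROOT is a hypothetical override; on a real host it stays at /sys.
bind_to_vfio() {
    bdf=${1:-0000:00:02.0}
    root=${SYSFS_ROOT:-/sys}
    dev=$root/bus/pci/devices/$bdf
    # Detach the current driver (for example i915), if one is bound.
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s' "$bdf" > "$dev/driver/unbind"
    fi
    # From now on only vfio-pci is allowed to claim the device.
    printf '%s' vfio-pci > "$dev/driver_override"
    # Ask the PCI core to probe the device again so vfio-pci picks it up.
    printf '%s' "$bdf" > "$root/bus/pci/drivers_probe"
}
```
| |
| On a real host this would be invoked as "bind_to_vfio 0000:00:02.0". |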
| |
| Additionally, IGD devices are known to generate small numbers of DMAR faults |
| when initially assigned. It is believed that this is simply the IGD attempting |
| to access the reserved GTT space after reset, which it no longer has access to |
| when accessed from userspace. So long as the DMAR faults are small in number |
| and, most importantly, not ongoing, they are not an indication of an error. |
| |
| Finally, analog VGA output (as opposed to digital outputs like HDMI, DVI, or |
| DisplayPort) may be unsupported in some use cases. In the author's experience, |
| even DP to VGA adapters can be troublesome, while adapters between digital |
| formats work well. |
| |
| |
| Options |
| ======= |
| * x-igd-opregion=[*on*|off] |
| Copy host IGD OpRegion and expose it to guest with fw_cfg |
| |
| * x-igd-lpc=[on|*off*] |
| Create a dummy LPC bridge at 00:1f.0 with host VID/DID (i440fx only) |
| |
| * x-igd-legacy-mode=[on|off|*auto*] |
| Enable/Disable legacy mode |
| |
| * x-igd-gms=[hex, default 0] |
| Override the DSM region size in the GGC register; 0 means use the host value. |
| Use this only when the DSM size cannot be changed through the |
| 'DVMT Pre-Allocated' option in host BIOS. |
| |
| |
| Examples |
| ======== |
| * Adding IGD with automatic legacy mode support |
| -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0 |
| |
| * Adding IGD with OpRegion and LPC ID hack, but without VGA ranges |
| (For UEFI guests) |
| -device vfio-pci,host=00:02.0,id=hostdev0,addr=2.0,x-igd-legacy-mode=off,x-igd-lpc=on,romfile=efi_oprom.rom |
| |
| |
| Guest firmware |
| ============== |
| Guest firmware is responsible for setting up the OpRegion and the Base of Data |
| Stolen Memory (BDSM) in guest address space. IGD passthrough support imposes |
| two fw_cfg requirements on the VM firmware: |
| |
| 1) "etc/igd-opregion" |
| |
| This fw_cfg file exposes the OpRegion for the IGD device. A reserved |
| region should be created below 4GB (4KB alignment recommended), large |
| enough for the fw_cfg file, and the content of this file copied to it. |
| The dword based address of this reserved memory region must also be |
| written to the ASLS register at offset 0xFC on the IGD device. It is |
| recommended that firmware make use of this fw_cfg entry for any PCI |
| class VGA device with Intel vendor ID. Behavior with multiple such |
| devices within a VM is undefined. |
| |
| 2) "etc/igd-bdsm-size" |
| |
| This fw_cfg file contains an 8-byte, little endian integer indicating |
| the size of the reserved memory region required for IGD stolen memory. |
| Firmware must allocate a reserved memory region of this size below 4GB, |
| aligned to 1MB. Additionally, the base address of this reserved region |
| must be written to the dword BDSM register in PCI config space of the |
| IGD device at offset 0x5C (or 0xC0 for Gen 11+ devices, which use a |
| 64-bit BDSM). As this support is related to running the IGD ROM, which |
| has other dependencies on the device appearing at guest address 00:02.0, |
| it's expected that this fw_cfg file is only relevant to a single PCI |
| class VGA device with Intel vendor ID, appearing at PCI bus address 00:02.0. |
| |
| Starting from Meteor Lake, IGD devices access stolen memory via their MMIO |
| BAR2 (LMEMBAR), and the BDSM register has been removed from config space. |
| Guest firmware therefore does not need to allocate data stolen memory in |
| guest address space or write its address to the BDSM register. The value |
| of this fw_cfg file is 0 in that case. |
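| |
| For debugging, the value of "etc/igd-bdsm-size" can be inspected from a running |
| Linux guest. The sketch below is not part of the firmware flow described above. |
| It assumes the guest has the qemu_fw_cfg driver loaded, which exposes fw_cfg |
| files under /sys/firmware/qemu_fw_cfg/by_name/; the helper name le64_to_dec is |
| hypothetical. |
| |
```shell
#!/bin/sh
# Decode an 8-byte little-endian integer stored at the start of a file.
le64_to_dec() {
    value=0
    shift_bits=0
    # od prints the first 8 bytes as unsigned decimals, lowest address first.
    for byte in $(od -An -v -t u1 -N 8 "$1"); do
        value=$(( value | (byte << shift_bits) ))
        shift_bits=$(( shift_bits + 8 ))
    done
    printf '%s\n' "$value"
}

# Assumed sysfs location of the fw_cfg file inside a Linux guest.
bdsm_file=/sys/firmware/qemu_fw_cfg/by_name/etc/igd-bdsm-size/raw
if [ -r "$bdsm_file" ]; then
    echo "DSM size: $(le64_to_dec "$bdsm_file") bytes"
fi
```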
| |
| Upstream SeaBIOS has OpRegion and BDSM (pre-Gen11 devices only) support. |
| However, this support has not been accepted by upstream EDK2/OVMF. A |
| recommended solution is to create a virtual OpRom with the following DXE |
| drivers: |
| |
| * IgdAssignmentDxe: Set up OpRegion and BDSM according to fw_cfg (required) |
| * IntelGopDriver: Closed-source Intel GOP driver |
| * PlatformGopPolicy: Protocol required by IntelGopDriver |
| |
| IntelGopDriver and PlatformGopPolicy are only required when enabling GOP on IGD. |
| |
| The original IgdAssignmentDxe can be found at [3]. An Intel-maintained version |
| with PlatformGopPolicy for industrial computing is at [4]. There is also an |
| unofficially maintained version with newer Gen11+ device support at [5]. |
| All of them need to be built with EDK2. |
| |
| Intel has never released the IntelGopDriver to the public. If you are an Intel |
| Premier Support customer, you may contact Intel support to get one, as [4] |
| says; otherwise you can try extracting it from your host firmware using |
| "UEFI BIOS Updater" [6]. |
| |
| Once you have all the required DXE drivers, an Option ROM can be generated with |
| the EfiRom utility in EDK2, using |
| EfiRom -f 0x8086 -i <Device ID of your IGD> -o output.rom \ |
| -e IgdAssignmentDxe.efi PlatformGOPPolicy.efi IntelGopDriver.efi |
| |
| |
| Known issues |
| ============ |
| When using OVMF as guest firmware, you may encounter the following warning: |
| warning: vfio_container_dma_map(0x55fab36ce610, 0x380010000000, 0x108000, 0x7fd336000000) = -22 (Invalid argument) |
| |
| Solution: |
| Set the host physical address bits to IOMMU address width using |
| -cpu host,host-phys-bits-limit=<IOMMU address width> |
| Or in libvirt XML with |
| <cpu> |
| <maxphysaddr mode='passthrough' limit='<IOMMU address width>'/> |
| </cpu> |
| The IOMMU address width can be determined with |
| echo $(( ((0x$(cat /sys/devices/virtual/iommu/dmar0/intel-iommu/cap) & 0x3F0000) >> 16) + 1 )) |
| Refer to https://edk2.groups.io/g/devel/topic/patch_v1/102359124 for more details. |
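| |
| The one-liner above extracts bits 21:16 of the VT-d capability register (the |
| MGAW field, which holds the width minus one). A minimal sketch of the same |
| arithmetic as a reusable function follows; the function name and the sample |
| capability value are illustrative assumptions, not values read from any |
| particular host. |
| |
```shell
#!/bin/sh
# Derive the IOMMU address width from a VT-d capability register value.
# $1 is the register as hex digits without "0x", as printed by the sysfs cap file.
iommu_addr_width() {
    cap=$(( 0x$1 ))
    # Bits 21:16 hold the width minus one.
    echo $(( ((cap & 0x3F0000) >> 16) + 1 ))
}

# Illustrative value only: bits 21:16 are 0x26 (38), so this prints 39.
iommu_addr_width d2008c40660462
```
| |
| The printed number is what would be passed as host-phys-bits-limit. |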
| |
| |
| Memory View |
| =========== |
| IGD has its own address space. To use system RAM as VRAM, a single-level page |
| table named the Global Graphics Translation Table (GTT) is used for address |
| translation. Each page table entry points to a 4KB page. The illustration below |
| shows the translation flow on IGD with 64-bit GTT PTEs. |
| |
|     (PTE_SIZE == 8)               +-------------+---+ |
|                                   |   Address   | V | V: Valid Bit |
|                                   +-------------+---+ |
|                                   | ...         |   | |
| IGD:0x01ae9010              0xd740| 0x70ffc000  | 1 |  Mem:0x42ba3e010^ |
| ----------------------->    0xd748| 0x42ba3e000 | 1 +------------------> |
| (addr >> 12) * PTE_SIZE     0xd750| 0x42ba3f000 | 1 | |
|                                   | ...         |   | |
|                                   +-------------+---+ |
| ^ The address may be remapped by IOMMU |
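| |
| The lookup in the illustration can be reproduced with plain shell arithmetic. |
| This is a worked example using the numbers from the figure; the function names |
| are hypothetical, and a PTE is modeled as just its page address, ignoring the |
| valid bit. |
| |
```shell
#!/bin/sh
PTE_SIZE=8   # bytes per GTT entry (64-bit PTEs)

# Byte offset of the PTE that covers a given IGD address.
gtt_pte_offset() {
    printf '0x%x\n' $(( ($1 >> 12) * PTE_SIZE ))
}

# Combine the page address from the PTE with the offset inside the 4KB page.
gtt_translate() {
    printf '0x%x\n' $(( $2 | ($1 & 0xfff) ))
}

gtt_pte_offset 0x01ae9010              # prints 0xd748
gtt_translate 0x01ae9010 0x42ba3e000   # prints 0x42ba3e010
```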
| |
| The memory region that stores the GTT is called GTT Stolen Memory (GSM). It is |
| located right below the Data Stolen Memory (DSM). Accessing this region |
| directly is not allowed; any access will immediately freeze the whole system. |
| The only way to access it is through the second half of MMIO BAR0. |
| |
| The Data Stolen Memory is reserved by firmware and acts as the VRAM in pre-OS |
| environments. In QEMU, guest firmware (SeaBIOS/OVMF) is responsible for |
| reserving a contiguous region and programming its base address into the BDSM |
| register, then letting the VBIOS/GOP driver initialize this region. The |
| illustration below shows how DSM is mapped. |
| |
|        IGD Addr Space                 Host Addr Space         Guest Addr Space |
|        +-------------+                +-------------+         +-------------+ |
|        |             |                |             |         |             | |
|        |             |                |             |         |             | |
|        |             |                +-------------+         +-------------+ |
|        |             |                | Data Stolen |         | Data Stolen | |
|        |             |                |   (Guest)   |         |   (Guest)   | |
|        |             |  +------------>+-------------+<------->+-------------+<--Guest BDSM |
|        |             |  | Passthrough |             |   EPT   |             |   Emulated by QEMU |
| DSMSIZE+-------------+  | with IOMMU  |             | Mapping |             |   Programmed by guest FW |
|        |             |  |             |             |         |             | |
|        |             |  |             |             |         |             | |
|       0+-------------+--+             |             |         |             | |
|                         |             +-------------+         |             | |
|                         |             | Data Stolen |         +-------------+ |
|                         |             |   (Host)    | |
|                         +------------>+-------------+<--Host BDSM |
|                           Non-        |             |   "real" one in HW |
|                           Passthrough |             |   Programmed by host FW |
|                                       +-------------+ |
| |
| Footnotes |
| ========= |
| [1] https://www.intel.com/content/www/us/en/docs/graphics-for-linux/developer-reference/1-0/dump-video-bios.html |
| [2] # echo "vfio-pci" > /sys/bus/pci/devices/0000:00:02.0/driver_override |
| [3] https://web.archive.org/web/20240827012422/https://bugzilla.tianocore.org/show_bug.cgi?id=935 |
| (Tianocore bugzilla has been down since Jan 2025.) |
| [4] https://eci.intel.com/docs/3.3/components/kvm-hypervisor.html, Patch 0001-0004 |
| [5] https://github.com/tomitamoeko/VfioIgdPkg |
| [6] https://winraid.level1techs.com/t/tool-guide-news-uefi-bios-updater-ubu/30357 |