| =============================== |
| IOMMUFD BACKEND usage with VFIO |
| =============================== |
| |
| (Same meaning for backend/container/BE) |
| |
| With the introduction of iommufd, the Linux kernel provides a generic |
| interface for user space drivers to propagate their DMA mappings to kernel |
| for assigned devices. While the legacy kernel interface is group-centric, |
| the new iommufd interface is device-centric, relying on device fd and iommufd. |
| |
| To support both interfaces in the QEMU VFIO device, introduce a base container |
| to abstract the common part of VFIO legacy and iommufd container. So that the |
| generic VFIO code can use either container. |
| |
| The base container implements generic functions such as memory_listener and |
| address space management whereas the derived container implements callbacks |
| specific to either legacy or iommufd. Each container has its own way to setup |
| secure context and dma management interface. The below diagram shows how it |
| looks like with both containers. |
| |
| :: |
| |
| VFIO AddressSpace/Memory |
| +-------+ +----------+ +-----+ +-----+ |
| | pci | | platform | | ap | | ccw | |
| +---+---+ +----+-----+ +--+--+ +--+--+ +----------------------+ |
| | | | | | AddressSpace | |
| | | | | +------------+---------+ |
| +---V-----------V-----------V--------V----+ / |
| | VFIOAddressSpace | <------------+ |
| | | | MemoryListener |
| | VFIOContainerBase list | |
| +-------+----------------------------+----+ |
| | | |
| | | |
| +-------V------+ +--------V----------+ |
| | iommufd | | vfio legacy | |
| | container | | container | |
| +-------+------+ +--------+----------+ |
| | | |
| | /dev/iommu | /dev/vfio/vfio |
| | /dev/vfio/devices/vfioX | /dev/vfio/$group_id |
| Userspace | | |
| ============+============================+=========================== |
| Kernel | device fd | |
| +---------------+ | group/container fd |
| | (BIND_IOMMUFD | | (SET_CONTAINER/SET_IOMMU) |
| | ATTACH_IOAS) | | device fd |
| | | | |
| | +-------V------------V-----------------+ |
| iommufd | | vfio | |
| (map/unmap | +---------+--------------------+-------+ |
| ioas_copy) | | | map/unmap |
| | | | |
| +------V------+ +-----V------+ +------V--------+ |
| | iommfd core | | device | | vfio iommu | |
| +-------------+ +------------+ +---------------+ |
| |
| * Secure Context setup |
| |
| - iommufd BE: uses device fd and iommufd to setup secure context |
| (bind_iommufd, attach_ioas) |
| - vfio legacy BE: uses group fd and container fd to setup secure context |
| (set_container, set_iommu) |
| |
| * Device access |
| |
| - iommufd BE: device fd is opened through ``/dev/vfio/devices/vfioX`` |
| - vfio legacy BE: device fd is retrieved from group fd ioctl |
| |
| * DMA Mapping flow |
| |
| 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener |
| 2. VFIO populates DMA map/unmap via the container BEs |
| * iommufd BE: uses iommufd |
| * vfio legacy BE: uses container fd |
| |
| Example configuration |
| ===================== |
| |
| Step 1: configure the host device |
| --------------------------------- |
| |
| It's exactly same as the VFIO device with legacy VFIO container. |
| |
| Step 2: configure QEMU |
| ---------------------- |
| |
| Interactions with the ``/dev/iommu`` are abstracted by a new iommufd |
| object (compiled in with the ``CONFIG_IOMMUFD`` option). |
| |
| Any QEMU device (e.g. VFIO device) wishing to use ``/dev/iommu`` must |
| be linked with an iommufd object. It gets a new optional property |
| named iommufd which allows to pass an iommufd object. Take ``vfio-pci`` |
| device for example: |
| |
| .. code-block:: bash |
| |
| -object iommufd,id=iommufd0 |
| -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 |
| |
| Note the ``/dev/iommu`` and VFIO cdev can be externally opened by a |
| management layer. In such a case the fd is passed, the fd supports a |
| string naming the fd or a number, for example: |
| |
| .. code-block:: bash |
| |
| -object iommufd,id=iommufd0,fd=22 |
| -device vfio-pci,iommufd=iommufd0,fd=23 |
| |
| If the ``fd`` property is not passed, the fd is opened by QEMU. |
| |
| If no ``iommufd`` object is passed to the ``vfio-pci`` device, iommufd |
| is not used and the user gets the behavior based on the legacy VFIO |
| container: |
| |
| .. code-block:: bash |
| |
| -device vfio-pci,host=0000:02:00.0 |
| |
| Supported platform |
| ================== |
| |
| Supports x86, ARM and s390x currently. |
| |
| Caveats |
| ======= |
| |
| Dirty page sync |
| --------------- |
| |
| Dirty page sync with iommufd backend is unsupported yet, live migration is |
| disabled by default. But it can be force enabled like below, low efficient |
| though. |
| |
| .. code-block:: bash |
| |
| -object iommufd,id=iommufd0 |
| -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0,enable-migration=on |
| |
| P2P DMA |
| ------- |
| |
| PCI p2p DMA is unsupported as IOMMUFD doesn't support mapping hardware PCI |
| BAR region yet. Below warning shows for assigned PCI device, it's not a bug. |
| |
| .. code-block:: none |
| |
| qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR? |
| qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address) |
| |
| FD passing with mdev |
| -------------------- |
| |
| ``vfio-pci`` device checks sysfsdev property to decide if backend is a mdev. |
| If FD passing is used, there is no way to know that and the mdev is treated |
| like a real PCI device. There is an error as below if user wants to enable |
| RAM discarding for mdev. |
| |
| .. code-block:: none |
| |
| qemu-system-x86_64: -device vfio-pci,iommufd=iommufd0,x-balloon-allowed=on,fd=9: vfio VFIO_FD9: x-balloon-allowed only potentially compatible with mdev devices |
| |
| ``vfio-ap`` and ``vfio-ccw`` devices don't have same issue as their backend |
| devices are always mdev and RAM discarding is force enabled. |