| .. _vhost_user_proto: | 
 |  | 
 | =================== | 
 | Vhost-user Protocol | 
 | =================== | 
 |  | 
 | .. | 
 |   Copyright 2014 Virtual Open Systems Sarl. | 
 |   Copyright 2019 Intel Corporation | 
 |   Licence: This work is licensed under the terms of the GNU GPL, | 
 |            version 2 or later. See the COPYING file in the top-level | 
 |            directory. | 
 |  | 
 | .. contents:: Table of Contents | 
 |  | 
 | Introduction | 
 | ============ | 
 |  | 
 | This protocol is aiming to complement the ``ioctl`` interface used to | 
 | control the vhost implementation in the Linux kernel. It implements | 
 | the control plane needed to establish virtqueue sharing with a user | 
 | space process on the same host. It uses communication over a Unix | 
 | domain socket to share file descriptors in the ancillary data of the | 
 | message. | 
 |  | 
 | The protocol defines 2 sides of the communication, *front-end* and | 
 | *back-end*. The *front-end* is the application that shares its virtqueues, in | 
 | our case QEMU. The *back-end* is the consumer of the virtqueues. | 
 |  | 
 | In the current implementation QEMU is the *front-end*, and the *back-end* | 
 | is the external process consuming the virtio queues, for example a | 
 | software Ethernet switch running in user space, such as Snabbswitch, | 
 | or a block device back-end processing read & write to a virtual | 
 | disk. In order to facilitate interoperability between various back-end | 
 | implementations, it is recommended to follow the :ref:`Backend program | 
 | conventions <backend_conventions>`. | 
 |  | 
 | The *front-end* and *back-end* can be either a client (i.e. connecting) or | 
 | server (listening) in the socket communication. | 
 |  | 
 | Support for platforms other than Linux | 
 | -------------------------------------- | 
 |  | 
 | While vhost-user was initially developed targeting Linux, nowadays it | 
 | is supported on any platform that provides the following features: | 
 |  | 
 | - A way for requesting shared memory represented by a file descriptor | 
 |   so it can be passed over a UNIX domain socket and then mapped by the | 
 |   other process. | 
 |  | 
 | - AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can | 
 |   exchange messages through it, including ancillary data when needed. | 
 |  | 
 | - Either eventfd or pipe/pipe2. On platforms where eventfd is not | 
 |   available, QEMU will automatically fall back to pipe2 or, as a last | 
 |   resort, pipe. Each file descriptor will be used for receiving or | 
 |   sending events by reading or writing (respectively) an 8-byte value | 
 |   to the corresponding it. The 8-value itself has no meaning and | 
 |   should not be interpreted. | 
 |  | 
 | Message Specification | 
 | ===================== | 
 |  | 
 | .. Note:: All numbers are in the machine native byte order. | 
 |  | 
 | A vhost-user message consists of 3 header fields and a payload. | 
 |  | 
 | +---------+-------+------+---------+ | 
 | | request | flags | size | payload | | 
 | +---------+-------+------+---------+ | 
 |  | 
 | Header | 
 | ------ | 
 |  | 
 | :request: 32-bit type of the request | 
 |  | 
 | :flags: 32-bit bit field | 
 |  | 
 | - Lower 2 bits are the version (currently 0x01) | 
 | - Bit 2 is the reply flag - needs to be sent on each reply from the back-end | 
 | - Bit 3 is the need_reply flag - see :ref:`REPLY_ACK <reply_ack>` for | 
 |   details. | 
 |  | 
 | :size: 32-bit size of the payload | 
 |  | 
 | Payload | 
 | ------- | 
 |  | 
 | Depending on the request type, **payload** can be: | 
 |  | 
 | A single 64-bit integer | 
 | ^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-----+ | 
 | | u64 | | 
 | +-----+ | 
 |  | 
 | :u64: a 64-bit unsigned integer | 
 |  | 
 | A vring state description | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-------+-----+ | 
 | | index | num | | 
 | +-------+-----+ | 
 |  | 
 | :index: a 32-bit index | 
 |  | 
 | :num: a 32-bit number | 
 |  | 
 | A vring descriptor index for split virtqueues | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-------------+---------------------+ | 
 | | vring index | index in avail ring | | 
 | +-------------+---------------------+ | 
 |  | 
 | :vring index: 32-bit index of the respective virtqueue | 
 |  | 
 | :index in avail ring: 32-bit value, of which currently only the lower 16 | 
 |   bits are used: | 
 |  | 
 |   - Bits 0–15: Index of the next *Available Ring* descriptor that the | 
 |     back-end will process.  This is a free-running index that is not | 
 |     wrapped by the ring size. | 
 |   - Bits 16–31: Reserved (set to zero) | 
 |  | 
 | Vring descriptor indices for packed virtqueues | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-------------+--------------------+ | 
 | | vring index | descriptor indices | | 
 | +-------------+--------------------+ | 
 |  | 
 | :vring index: 32-bit index of the respective virtqueue | 
 |  | 
 | :descriptor indices: 32-bit value: | 
 |  | 
 |   - Bits 0–14: Index of the next *Available Ring* descriptor that the | 
 |     back-end will process.  This is a free-running index that is not | 
 |     wrapped by the ring size. | 
 |   - Bit 15: Driver (Available) Ring Wrap Counter | 
 |   - Bits 16–30: Index of the entry in the *Used Ring* where the back-end | 
 |     will place the next descriptor.  This is a free-running index that | 
 |     is not wrapped by the ring size. | 
 |   - Bit 31: Device (Used) Ring Wrap Counter | 
 |  | 
 | A vring address description | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-------+-------+------------+------+-----------+-----+ | 
 | | index | flags | descriptor | used | available | log | | 
 | +-------+-------+------------+------+-----------+-----+ | 
 |  | 
 | :index: a 32-bit vring index | 
 |  | 
 | :flags: a 32-bit vring flags | 
 |  | 
 | :descriptor: a 64-bit ring address of the vring descriptor table | 
 |  | 
 | :used: a 64-bit ring address of the vring used ring | 
 |  | 
 | :available: a 64-bit ring address of the vring available ring | 
 |  | 
 | :log: a 64-bit guest address for logging | 
 |  | 
 | Note that a ring address is an IOVA if ``VIRTIO_F_IOMMU_PLATFORM`` has | 
 | been negotiated. Otherwise it is a user address. | 
 |  | 
 | .. _memory_region_description: | 
 |  | 
 | Memory region description | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +---------------+------+--------------+-------------+ | 
 | | guest address | size | user address | mmap offset | | 
 | +---------------+------+--------------+-------------+ | 
 |  | 
 | :guest address: a 64-bit guest address of the region | 
 |  | 
 | :size: a 64-bit size | 
 |  | 
 | :user address: a 64-bit user address | 
 |  | 
 | :mmap offset: a 64-bit offset where region starts in the mapped memory | 
 |  | 
 | When the ``VHOST_USER_PROTOCOL_F_XEN_MMAP`` protocol feature has been | 
 | successfully negotiated, the memory region description contains two extra | 
 | fields at the end. | 
 |  | 
 | +---------------+------+--------------+-------------+----------------+-------+ | 
 | | guest address | size | user address | mmap offset | xen mmap flags | domid | | 
 | +---------------+------+--------------+-------------+----------------+-------+ | 
 |  | 
 | :xen mmap flags: a 32-bit bit field | 
 |  | 
 | - Bit 0 is set for Xen foreign memory mapping. | 
 | - Bit 1 is set for Xen grant memory mapping. | 
 | - Bit 8 is set if the memory region can not be mapped in advance, and memory | 
 |   areas within this region must be mapped / unmapped only when required by the | 
 |   back-end. The back-end shouldn't try to map the entire region at once, as the | 
 |   front-end may not allow it. The back-end should rather map only the required | 
 |   amount of memory at once and unmap it after it is used. | 
 |  | 
 | :domid: a 32-bit Xen hypervisor specific domain id. | 
 |  | 
 | Single memory region description | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +---------+--------+ | 
 | | padding | region | | 
 | +---------+--------+ | 
 |  | 
 | :padding: 64-bit | 
 |  | 
 | :region: region is represented by :ref:`Memory region description <memory_region_description>`. | 
 |  | 
 | Multiple Memory regions description | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-------------+---------+---------+-----+---------+ | 
 | | num regions | padding | region0 | ... | region7 | | 
 | +-------------+---------+---------+-----+---------+ | 
 |  | 
 | :num regions: a 32-bit number of regions | 
 |  | 
 | :padding: 32-bit | 
 |  | 
 | :regions: regions field contains 8 regions of type :ref:`Memory region description <memory_region_description>`. | 
 |  | 
 | Log description | 
 | ^^^^^^^^^^^^^^^ | 
 |  | 
 | +----------+------------+ | 
 | | log size | log offset | | 
 | +----------+------------+ | 
 |  | 
 | :log size: a 64-bit size of area used for logging | 
 |  | 
 | :log offset: a 64-bit offset from start of supplied file descriptor where | 
 |              logging starts (i.e. where guest address 0 would be | 
 |              logged) | 
 |  | 
 | An IOTLB message | 
 | ^^^^^^^^^^^^^^^^ | 
 |  | 
 | +------+------+--------------+-------------------+------+ | 
 | | iova | size | user address | permissions flags | type | | 
 | +------+------+--------------+-------------------+------+ | 
 |  | 
 | :iova: a 64-bit I/O virtual address programmed by the guest | 
 |  | 
 | :size: a 64-bit size | 
 |  | 
 | :user address: a 64-bit user address | 
 |  | 
 | :permissions flags: an 8-bit value: | 
 |   - 0: No access | 
 |   - 1: Read access | 
 |   - 2: Write access | 
 |   - 3: Read/Write access | 
 |  | 
 | :type: an 8-bit IOTLB message type: | 
 |   - 1: IOTLB miss | 
 |   - 2: IOTLB update | 
 |   - 3: IOTLB invalidate | 
 |   - 4: IOTLB access fail | 
 |  | 
 | Virtio device config space | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +--------+------+-------+---------+ | 
 | | offset | size | flags | payload | | 
 | +--------+------+-------+---------+ | 
 |  | 
 | :offset: a 32-bit offset of virtio device's configuration space | 
 |  | 
 | :size: a 32-bit configuration space access size in bytes | 
 |  | 
 | :flags: a 32-bit value: | 
 |   - 0: Vhost front-end messages used for writable fields | 
 |   - 1: Vhost front-end messages used for live migration | 
 |  | 
 | :payload: Size bytes array holding the contents of the virtio | 
 |           device's configuration space | 
 |  | 
 | Vring area description | 
 | ^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-----+------+--------+ | 
 | | u64 | size | offset | | 
 | +-----+------+--------+ | 
 |  | 
 | :u64: a 64-bit integer contains vring index and flags | 
 |  | 
 | :size: a 64-bit size of this area | 
 |  | 
 | :offset: a 64-bit offset of this area from the start of the | 
 |          supplied file descriptor | 
 |  | 
 | Inflight description | 
 | ^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +-----------+-------------+------------+------------+ | 
 | | mmap size | mmap offset | num queues | queue size | | 
 | +-----------+-------------+------------+------------+ | 
 |  | 
 | :mmap size: a 64-bit size of area to track inflight I/O | 
 |  | 
 | :mmap offset: a 64-bit offset of this area from the start | 
 |               of the supplied file descriptor | 
 |  | 
 | :num queues: a 16-bit number of virtqueues | 
 |  | 
 | :queue size: a 16-bit size of virtqueues | 
 |  | 
 | VhostUserShared | 
 | ^^^^^^^^^^^^^^^ | 
 |  | 
 | +------+ | 
 | | UUID | | 
 | +------+ | 
 |  | 
 | :UUID: 16 bytes UUID, whose first three components (a 32-bit value, then | 
 |   two 16-bit values) are stored in big endian. | 
 |  | 
 | Device state transfer parameters | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | +--------------------+-----------------+ | 
 | | transfer direction | migration phase | | 
 | +--------------------+-----------------+ | 
 |  | 
 | :transfer direction: a 32-bit enum, describing the direction in which | 
 |   the state is transferred: | 
 |  | 
 |   - 0: Save: Transfer the state from the back-end to the front-end, | 
 |     which happens on the source side of migration | 
 |   - 1: Load: Transfer the state from the front-end to the back-end, | 
 |     which happens on the destination side of migration | 
 |  | 
 | :migration phase: a 32-bit enum, describing the state in which the VM | 
 |   guest and devices are: | 
 |  | 
 |   - 0: Stopped (in the period after the transfer of memory-mapped | 
 |     regions before switch-over to the destination): The VM guest is | 
 |     stopped, and the vhost-user device is suspended (see | 
 |     :ref:`Suspended device state <suspended_device_state>`). | 
 |  | 
 |   In the future, additional phases might be added e.g. to allow | 
 |   iterative migration while the device is running. | 
 |  | 
 | C structure | 
 | ----------- | 
 |  | 
 | In QEMU the vhost-user message is implemented with the following struct: | 
 |  | 
 | .. code:: c | 
 |  | 
 |   typedef struct VhostUserMsg { | 
 |       VhostUserRequest request; | 
 |       uint32_t flags; | 
 |       uint32_t size; | 
 |       union { | 
 |           uint64_t u64; | 
 |           struct vhost_vring_state state; | 
 |           struct vhost_vring_addr addr; | 
 |           VhostUserMemory memory; | 
 |           VhostUserLog log; | 
 |           struct vhost_iotlb_msg iotlb; | 
 |           VhostUserConfig config; | 
 |           VhostUserVringArea area; | 
 |           VhostUserInflight inflight; | 
 |       }; | 
 |   } QEMU_PACKED VhostUserMsg; | 
 |  | 
 | Communication | 
 | ============= | 
 |  | 
 | The protocol for vhost-user is based on the existing implementation of | 
 | vhost for the Linux Kernel. Most messages that can be sent via the | 
 | Unix domain socket implementing vhost-user have an equivalent ioctl to | 
 | the kernel implementation. | 
 |  | 
 | The communication consists of the *front-end* sending message requests and | 
 | the *back-end* sending message replies. Most of the requests don't require | 
 | replies, except for the following requests: | 
 |  | 
 | * ``VHOST_USER_GET_FEATURES`` | 
 | * ``VHOST_USER_GET_PROTOCOL_FEATURES`` | 
 | * ``VHOST_USER_GET_VRING_BASE`` | 
 | * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``) | 
 | * ``VHOST_USER_GET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``) | 
 |  | 
 | .. seealso:: | 
 |  | 
 |    :ref:`REPLY_ACK <reply_ack>` | 
 |        The section on ``REPLY_ACK`` protocol extension. | 
 |  | 
 | There are several messages that the front-end sends with file descriptors passed | 
 | in the ancillary data: | 
 |  | 
 | * ``VHOST_USER_ADD_MEM_REG`` | 
 | * ``VHOST_USER_SET_MEM_TABLE`` | 
 | * ``VHOST_USER_SET_LOG_BASE`` (if ``VHOST_USER_PROTOCOL_F_LOG_SHMFD``) | 
 | * ``VHOST_USER_SET_LOG_FD`` | 
 | * ``VHOST_USER_SET_VRING_KICK`` | 
 | * ``VHOST_USER_SET_VRING_CALL`` | 
 | * ``VHOST_USER_SET_VRING_ERR`` | 
 | * ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``) | 
 | * ``VHOST_USER_SET_INFLIGHT_FD`` (if ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD``) | 
 | * ``VHOST_USER_SET_DEVICE_STATE_FD`` | 
 |  | 
 | If *front-end* is unable to send the full message or receives a wrong | 
 | reply it will close the connection. An optional reconnection mechanism | 
 | can be implemented. | 
 |  | 
 | If *back-end* detects some error such as incompatible features, it may also | 
 | close the connection. This should only happen in exceptional circumstances. | 
 |  | 
 | Any protocol extensions are gated by protocol feature bits, which | 
 | allows full backwards compatibility on both front-end and back-end.  As | 
 | older back-ends don't support negotiating protocol features, a feature | 
 | bit was dedicated for this purpose:: | 
 |  | 
 |   #define VHOST_USER_F_PROTOCOL_FEATURES 30 | 
 |  | 
 | Note that VHOST_USER_F_PROTOCOL_FEATURES is the UNUSED (30) feature | 
 | bit defined in `VIRTIO 1.1 6.3 Legacy Interface: Reserved Feature Bits | 
 | <https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4130003>`_. | 
 | VIRTIO devices do not advertise this feature bit and therefore VIRTIO | 
 | drivers cannot negotiate it. | 
 |  | 
 | This reserved feature bit was reused by the vhost-user protocol to add | 
 | vhost-user protocol feature negotiation in a backwards compatible | 
 | fashion. Old vhost-user front-end and back-end implementations continue to | 
 | work even though they are not aware of vhost-user protocol feature | 
 | negotiation. | 
 |  | 
 | Ring states | 
 | ----------- | 
 |  | 
 | Rings have two independent states: started/stopped, and enabled/disabled. | 
 |  | 
 | * While a ring is stopped, the back-end must not process the ring at | 
 |   all, regardless of whether it is enabled or disabled.  The | 
 |   enabled/disabled state should still be tracked, though, so it can come | 
 |   into effect once the ring is started. | 
 |  | 
 | * started and disabled: The back-end must process the ring without | 
 |   causing any side effects.  For example, for a networking device, | 
 |   in the disabled state the back-end must not supply any new RX packets, | 
 |   but must process and discard any TX packets. | 
 |  | 
 | * started and enabled: The back-end must process the ring normally, i.e. | 
 |   process all requests and execute them. | 
 |  | 
 | Each ring is initialized in a stopped and disabled state.  The back-end | 
 | must start a ring upon receiving a kick (that is, detecting that file | 
 | descriptor is readable) on the descriptor specified by | 
 | ``VHOST_USER_SET_VRING_KICK`` or receiving the in-band message | 
 | ``VHOST_USER_VRING_KICK`` if negotiated, and stop a ring upon receiving | 
 | ``VHOST_USER_GET_VRING_BASE``. | 
 |  | 
 | Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``. | 
 |  | 
 | In addition, upon receiving a ``VHOST_USER_SET_FEATURES`` message from | 
 | the front-end without ``VHOST_USER_F_PROTOCOL_FEATURES`` set, the | 
 | back-end must enable all rings immediately. | 
 |  | 
 | While processing the rings (whether they are enabled or not), the back-end | 
 | must support changing some configuration aspects on the fly. | 
 |  | 
 | .. _suspended_device_state: | 
 |  | 
 | Suspended device state | 
 | ^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | While all vrings are stopped, the device is *suspended*.  In addition to | 
 | not processing any vring (because they are stopped), the device must: | 
 |  | 
 | * not write to any guest memory regions, | 
 | * not send any notifications to the guest, | 
 | * not send any messages to the front-end, | 
 | * still process and reply to messages from the front-end. | 
 |  | 
 | Multiple queue support | 
 | ---------------------- | 
 |  | 
 | Many devices have a fixed number of virtqueues.  In this case the front-end | 
 | already knows the number of available virtqueues without communicating with the | 
 | back-end. | 
 |  | 
 | Some devices do not have a fixed number of virtqueues.  Instead the maximum | 
 | number of virtqueues is chosen by the back-end.  The number can depend on host | 
 | resource availability or back-end implementation details.  Such devices are called | 
 | multiple queue devices. | 
 |  | 
 | Multiple queue support allows the back-end to advertise the maximum number of | 
 | queues.  This is treated as a protocol extension, hence the back-end has to | 
 | implement protocol features first. The multiple queues feature is supported | 
 | only when the protocol feature ``VHOST_USER_PROTOCOL_F_MQ`` (bit 0) is set. | 
 |  | 
 | The max number of queues the back-end supports can be queried with message | 
 | ``VHOST_USER_GET_QUEUE_NUM``. Front-end should stop when the number of requested | 
 | queues is bigger than that. | 
 |  | 
 | As all queues share one connection, the front-end uses a unique index for each | 
 | queue in the sent message to identify a specified queue. | 
 |  | 
 | The front-end enables queues by sending message ``VHOST_USER_SET_VRING_ENABLE``. | 
 | vhost-user-net has historically automatically enabled the first queue pair. | 
 |  | 
 | Back-ends should always implement the ``VHOST_USER_PROTOCOL_F_MQ`` protocol | 
 | feature, even for devices with a fixed number of virtqueues, since it is simple | 
 | to implement and offers a degree of introspection. | 
 |  | 
 | Front-ends must not rely on the ``VHOST_USER_PROTOCOL_F_MQ`` protocol feature for | 
 | devices with a fixed number of virtqueues.  Only true multiqueue devices | 
 | require this protocol feature. | 
 |  | 
 | Migration | 
 | --------- | 
 |  | 
 | During live migration, the front-end may need to track the modifications | 
 | the back-end makes to the memory mapped regions. The front-end should mark | 
 | the dirty pages in a log. Once it complies to this logging, it may | 
 | declare the ``VHOST_F_LOG_ALL`` vhost feature. | 
 |  | 
 | To start/stop logging of data/used ring writes, the front-end may send | 
 | messages ``VHOST_USER_SET_FEATURES`` with ``VHOST_F_LOG_ALL`` and | 
 | ``VHOST_USER_SET_VRING_ADDR`` with ``VHOST_VRING_F_LOG`` in ring's | 
 | flags set to 1/0, respectively. | 
 |  | 
 | All the modifications to memory pointed by vring "descriptor" should | 
 | be marked. Modifications to "used" vring should be marked if | 
 | ``VHOST_VRING_F_LOG`` is part of ring's flags. | 
 |  | 
 | Dirty pages are of size:: | 
 |  | 
 |   #define VHOST_LOG_PAGE 0x1000 | 
 |  | 
 | The log memory fd is provided in the ancillary data of | 
 | ``VHOST_USER_SET_LOG_BASE`` message when the back-end has | 
 | ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature. | 
 |  | 
 | The size of the log is supplied as part of ``VhostUserMsg`` which | 
 | should be large enough to cover all known guest addresses. Log starts | 
 | at the supplied offset in the supplied file descriptor.  The log | 
 | covers from address 0 to the maximum of guest regions. In pseudo-code, | 
 | to mark page at ``addr`` as dirty:: | 
 |  | 
 |   page = addr / VHOST_LOG_PAGE | 
 |   log[page / 8] |= 1 << page % 8 | 
 |  | 
 | Where ``addr`` is the guest physical address. | 
 |  | 
 | Use atomic operations, as the log may be concurrently manipulated. | 
 |  | 
 | Note that when logging modifications to the used ring (when | 
 | ``VHOST_VRING_F_LOG`` is set for this ring), ``log_guest_addr`` should | 
 | be used to calculate the log offset: the write to first byte of the | 
 | used ring is logged at this offset from log start. Also note that this | 
 | value might be outside the legal guest physical address range | 
 | (i.e. does not have to be covered by the ``VhostUserMemory`` table), but | 
 | the bit offset of the last byte of the ring must fall within the size | 
 | supplied by ``VhostUserLog``. | 
 |  | 
 | ``VHOST_USER_SET_LOG_FD`` is an optional message with an eventfd in | 
 | ancillary data, it may be used to inform the front-end that the log has | 
 | been modified. | 
 |  | 
 | Once the source has finished migration, rings will be stopped by the | 
 | source (:ref:`Suspended device state <suspended_device_state>`). No | 
 | further update must be done before rings are restarted. | 
 |  | 
 | In postcopy migration the back-end is started before all the memory has | 
 | been received from the source host, and care must be taken to avoid | 
 | accessing pages that have yet to be received.  The back-end opens a | 
 | 'userfault'-fd and registers the memory with it; this fd is then | 
 | passed back over to the front-end.  The front-end services requests on the | 
 | userfaultfd for pages that are accessed and when the page is available | 
 | it performs WAKE ioctl's on the userfaultfd to wake the stalled | 
 | back-end.  The front-end indicates support for this via the | 
 | ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` feature. | 
 |  | 
 | .. _migrating_backend_state: | 
 |  | 
 | Migrating back-end state | 
 | ^^^^^^^^^^^^^^^^^^^^^^^^ | 
 |  | 
 | Migrating device state involves transferring the state from one | 
 | back-end, called the source, to another back-end, called the | 
 | destination.  After migration, the destination transparently resumes | 
 | operation without requiring the driver to re-initialize the device at | 
 | the VIRTIO level.  If the migration fails, then the source can | 
 | transparently resume operation until another migration attempt is made. | 
 |  | 
 | Generally, the front-end is connected to a virtual machine guest (which | 
 | contains the driver), which has its own state to transfer between source | 
 | and destination, and therefore will have an implementation-specific | 
 | mechanism to do so.  The ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature | 
 | provides functionality to have the front-end include the back-end's | 
 | state in this transfer operation so the back-end does not need to | 
 | implement its own mechanism, and so the virtual machine may have its | 
 | complete state, including vhost-user devices' states, contained within a | 
 | single stream of data. | 
 |  | 
 | To do this, the back-end state is transferred from back-end to front-end | 
 | on the source side, and vice versa on the destination side.  This | 
 | transfer happens over a channel that is negotiated using the | 
 | ``VHOST_USER_SET_DEVICE_STATE_FD`` message.  This message has two | 
 | parameters: | 
 |  | 
 | * Direction of transfer: On the source, the data is saved, transferring | 
 |   it from the back-end to the front-end.  On the destination, the data | 
 |   is loaded, transferring it from the front-end to the back-end. | 
 |  | 
 | * Migration phase: Currently, the only supported phase is the period | 
 |   after the transfer of memory-mapped regions before switch-over to the | 
 |   destination, when both the source and destination devices are | 
 |   suspended (:ref:`Suspended device state <suspended_device_state>`). | 
 |   In the future, additional phases might be supported to allow iterative | 
 |   migration while the device is running. | 
 |  | 
 | The nature of the channel is implementation-defined, but it must | 
 | generally behave like a pipe: The writing end will write all the data it | 
 | has into it, signalling the end of data by closing its end.  The reading | 
 | end must read all of this data (until encountering the end of file) and | 
 | process it. | 
 |  | 
 | * When saving, the writing end is the source back-end, and the reading | 
 |   end is the source front-end.  After reading the state data from the | 
 |   channel, the source front-end must transfer it to the destination | 
 |   front-end through an implementation-defined mechanism. | 
 |  | 
 | * When loading, the writing end is the destination front-end, and the | 
 |   reading end is the destination back-end.  After reading the state data | 
 |   from the channel, the destination back-end must deserialize its | 
 |   internal state from that data and set itself up to allow the driver to | 
 |   seamlessly resume operation on the VIRTIO level. | 
 |  | 
 | Seamlessly resuming operation means that the migration must be | 
 | transparent to the guest driver, which operates on the VIRTIO level. | 
 | This driver will not perform any re-initialization steps, but continue | 
 | to use the device as if no migration had occurred.  The vhost-user | 
 | front-end, however, will re-initialize the vhost state on the | 
 | destination, following the usual protocol for establishing a connection | 
 | to a vhost-user back-end: This includes, for example, setting up memory | 
 | mappings and kick and call FDs as necessary, negotiating protocol | 
 | features, or setting the initial vring base indices (to the same value | 
 | as on the source side, so that operation can resume). | 
 |  | 
 | Both on the source and on the destination side, after the respective | 
 | front-end has seen all data transferred (when the transfer FD has been | 
 | closed), it sends the ``VHOST_USER_CHECK_DEVICE_STATE`` message to | 
 | verify that data transfer was successful in the back-end, too.  The | 
 | back-end responds once it knows whether the transfer and processing was | 
 | successful or not. | 
 |  | 
 | Memory access | 
 | ------------- | 
 |  | 
 | The front-end sends a list of vhost memory regions to the back-end using the | 
 | ``VHOST_USER_SET_MEM_TABLE`` message.  Each region has two base | 
 | addresses: a guest address and a user address. | 
 |  | 
 | Messages contain guest addresses and/or user addresses to reference locations | 
 | within the shared memory.  The mapping of these addresses works as follows. | 
 |  | 
 | User addresses map to the vhost memory region containing that user address. | 
 |  | 
 | When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has not been negotiated: | 
 |  | 
 | * Guest addresses map to the vhost memory region containing that guest | 
 |   address. | 
 |  | 
 | When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated: | 
 |  | 
 | * Guest addresses are also called I/O virtual addresses (IOVAs).  They are | 
 |   translated to user addresses via the IOTLB. | 
 |  | 
 | * The vhost memory region guest address is not used. | 
 |  | 
 | IOMMU support | 
 | ------------- | 
 |  | 
 | When the ``VIRTIO_F_IOMMU_PLATFORM`` feature has been negotiated, the | 
 | front-end sends IOTLB entries update & invalidation by sending | 
 | ``VHOST_USER_IOTLB_MSG`` requests to the back-end with a ``struct | 
 | vhost_iotlb_msg`` as payload. For update events, the ``iotlb`` payload | 
 | has to be filled with the update message type (2), the I/O virtual | 
 | address, the size, the user virtual address, and the permissions | 
 | flags. Addresses and size must be within vhost memory regions set via | 
 | the ``VHOST_USER_SET_MEM_TABLE`` request. For invalidation events, the | 
 | ``iotlb`` payload has to be filled with the invalidation message type | 
 | (3), the I/O virtual address and the size. On success, the back-end is | 
 | expected to reply with a zero payload, non-zero otherwise. | 
 |  | 
 | The back-end relies on the back-end communication channel (see :ref:`Back-end | 
 | communication <backend_communication>` section below) to send IOTLB miss | 
 | and access failure events, by sending ``VHOST_USER_BACKEND_IOTLB_MSG`` | 
 | requests to the front-end with a ``struct vhost_iotlb_msg`` as | 
 | payload. For miss events, the iotlb payload has to be filled with the | 
 | miss message type (1), the I/O virtual address and the permissions | 
 | flags. For access failure event, the iotlb payload has to be filled | 
 | with the access failure message type (4), the I/O virtual address and | 
 | the permissions flags.  For synchronization purpose, the back-end may | 
 | rely on the reply-ack feature, so the front-end may send a reply when | 
 | operation is completed if the reply-ack feature is negotiated and | 
 | back-ends requests a reply. For miss events, completed operation means | 
 | either front-end sent an update message containing the IOTLB entry | 
 | containing requested address and permission, or front-end sent nothing if | 
 | the IOTLB miss message is invalid (invalid IOVA or permission). | 
 |  | 
 | The front-end isn't expected to take the initiative to send IOTLB update | 
 | messages, as the back-end sends IOTLB miss messages for the guest virtual | 
 | memory areas it needs to access. | 
 |  | 
 | .. _backend_communication: | 
 |  | 
 | Back-end communication | 
 | ---------------------- | 
 |  | 
 | An optional communication channel is provided if the back-end declares | 
 | ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` protocol feature, to allow the | 
 | back-end to make requests to the front-end. | 
 |  | 
 | The fd is provided via ``VHOST_USER_SET_BACKEND_REQ_FD`` ancillary data. | 
 |  | 
 | A back-end may then send ``VHOST_USER_BACKEND_*`` messages to the front-end | 
 | using this fd communication channel. | 
 |  | 
 | If ``VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD`` protocol feature is | 
 | negotiated, back-end can send file descriptors (at most 8 descriptors in | 
 | each message) to front-end via ancillary data using this fd communication | 
 | channel. | 
 |  | 
 | Inflight I/O tracking | 
 | --------------------- | 
 |  | 
 | To support reconnecting after restart or crash, back-end may need to | 
 | resubmit inflight I/Os. If virtqueue is processed in order, we can | 
 | easily achieve that by getting the inflight descriptors from | 
 | descriptor table (split virtqueue) or descriptor ring (packed | 
 | virtqueue). However, it can't work when we process descriptors | 
 | out-of-order because some entries which store the information of | 
 | inflight descriptors in available ring (split virtqueue) or descriptor | 
 | ring (packed virtqueue) might be overridden by new entries. To solve | 
 | this problem, the back-end need to allocate an extra buffer to store this | 
 | information of inflight descriptors and share it with front-end for | 
 | persistent. ``VHOST_USER_GET_INFLIGHT_FD`` and | 
 | ``VHOST_USER_SET_INFLIGHT_FD`` are used to transfer this buffer | 
 | between front-end and back-end. And the format of this buffer is described | 
 | below: | 
 |  | 
 | +---------------+---------------+-----+---------------+ | 
 | | queue0 region | queue1 region | ... | queueN region | | 
 | +---------------+---------------+-----+---------------+ | 
 |  | 
 | N is the number of available virtqueues. The back-end could get it from num | 
 | queues field of ``VhostUserInflight``. | 
 |  | 
 | For split virtqueue, queue region can be implemented as: | 
 |  | 
 | .. code:: c | 
 |  | 
 |   typedef struct DescStateSplit { | 
 |       /* Indicate whether this descriptor is inflight or not. | 
 |        * Only available for head-descriptor. */ | 
 |       uint8_t inflight; | 
 |  | 
 |       /* Padding */ | 
 |       uint8_t padding[5]; | 
 |  | 
 |       /* Maintain a list for the last batch of used descriptors. | 
 |        * Only available when batching is used for submitting */ | 
 |       uint16_t next; | 
 |  | 
 |       /* Used to preserve the order of fetching available descriptors. | 
 |        * Only available for head-descriptor. */ | 
 |       uint64_t counter; | 
 |   } DescStateSplit; | 
 |  | 
 |   typedef struct QueueRegionSplit { | 
 |       /* The feature flags of this region. Now it's initialized to 0. */ | 
 |       uint64_t features; | 
 |  | 
 |       /* The version of this region. It's 1 currently. | 
 |        * Zero value indicates an uninitialized buffer */ | 
 |       uint16_t version; | 
 |  | 
 |       /* The size of DescStateSplit array. It's equal to the virtqueue size. | 
 |        * The back-end could get it from queue size field of VhostUserInflight. */ | 
 |       uint16_t desc_num; | 
 |  | 
 |       /* The head of list that track the last batch of used descriptors. */ | 
 |       uint16_t last_batch_head; | 
 |  | 
 |       /* Store the idx value of used ring */ | 
 |       uint16_t used_idx; | 
 |  | 
 |       /* Used to track the state of each descriptor in descriptor table */ | 
 |       DescStateSplit desc[]; | 
 |   } QueueRegionSplit; | 
 |  | 
 | To track inflight I/O, the queue region should be processed as follows: | 
 |  | 
 | When receiving available buffers from the driver: | 
 |  | 
 | #. Get the next available head-descriptor index from available ring, ``i`` | 
 |  | 
 | #. Set ``desc[i].counter`` to the value of global counter | 
 |  | 
 | #. Increase global counter by 1 | 
 |  | 
 | #. Set ``desc[i].inflight`` to 1 | 
 |  | 
 | When supplying used buffers to the driver: | 
 |  | 
 | 1. Get corresponding used head-descriptor index, i | 
 |  | 
 | 2. Set ``desc[i].next`` to ``last_batch_head`` | 
 |  | 
 | 3. Set ``last_batch_head`` to ``i`` | 
 |  | 
 | #. Steps 1,2,3 may be performed repeatedly if batching is possible | 
 |  | 
 | #. Increase the ``idx`` value of used ring by the size of the batch | 
 |  | 
 | #. Set the ``inflight`` field of each ``DescStateSplit`` entry in the batch to 0 | 
 |  | 
 | #. Set ``used_idx`` to the ``idx`` value of used ring | 
 |  | 
 | When reconnecting: | 
 |  | 
 | #. If the value of ``used_idx`` does not match the ``idx`` value of | 
 |    used ring (means the inflight field of ``DescStateSplit`` entries in | 
 |    last batch may be incorrect), | 
 |  | 
 |    a. Subtract the value of ``used_idx`` from the ``idx`` value of | 
 |       used ring to get last batch size of ``DescStateSplit`` entries | 
 |  | 
 |    #. Set the ``inflight`` field of each ``DescStateSplit`` entry to 0 in last batch | 
 |       list which starts from ``last_batch_head`` | 
 |  | 
 |    #. Set ``used_idx`` to the ``idx`` value of used ring | 
 |  | 
 | #. Resubmit inflight ``DescStateSplit`` entries in order of their | 
 |    counter value | 
 |  | 
 | For packed virtqueue, queue region can be implemented as: | 
 |  | 
 | .. code:: c | 
 |  | 
 |   typedef struct DescStatePacked { | 
 |       /* Indicate whether this descriptor is inflight or not. | 
 |        * Only available for head-descriptor. */ | 
 |       uint8_t inflight; | 
 |  | 
 |       /* Padding */ | 
 |       uint8_t padding; | 
 |  | 
 |       /* Link to the next free entry */ | 
 |       uint16_t next; | 
 |  | 
 |       /* Link to the last entry of descriptor list. | 
 |        * Only available for head-descriptor. */ | 
 |       uint16_t last; | 
 |  | 
 |       /* The length of descriptor list. | 
 |        * Only available for head-descriptor. */ | 
 |       uint16_t num; | 
 |  | 
 |       /* Used to preserve the order of fetching available descriptors. | 
 |        * Only available for head-descriptor. */ | 
 |       uint64_t counter; | 
 |  | 
 |       /* The buffer id */ | 
 |       uint16_t id; | 
 |  | 
 |       /* The descriptor flags */ | 
 |       uint16_t flags; | 
 |  | 
 |       /* The buffer length */ | 
 |       uint32_t len; | 
 |  | 
 |       /* The buffer address */ | 
 |       uint64_t addr; | 
 |   } DescStatePacked; | 
 |  | 
 |   typedef struct QueueRegionPacked { | 
 |       /* The feature flags of this region. Now it's initialized to 0. */ | 
 |       uint64_t features; | 
 |  | 
 |       /* The version of this region. It's 1 currently. | 
 |        * Zero value indicates an uninitialized buffer */ | 
 |       uint16_t version; | 
 |  | 
 |       /* The size of DescStatePacked array. It's equal to the virtqueue size. | 
 |        * The back-end could get it from queue size field of VhostUserInflight. */ | 
 |       uint16_t desc_num; | 
 |  | 
 |       /* The head of free DescStatePacked entry list */ | 
 |       uint16_t free_head; | 
 |  | 
 |       /* The old head of free DescStatePacked entry list */ | 
 |       uint16_t old_free_head; | 
 |  | 
 |       /* The used index of descriptor ring */ | 
 |       uint16_t used_idx; | 
 |  | 
 |       /* The old used index of descriptor ring */ | 
 |       uint16_t old_used_idx; | 
 |  | 
 |       /* Device ring wrap counter */ | 
 |       uint8_t used_wrap_counter; | 
 |  | 
 |       /* The old device ring wrap counter */ | 
 |       uint8_t old_used_wrap_counter; | 
 |  | 
 |       /* Padding */ | 
 |       uint8_t padding[7]; | 
 |  | 
 |       /* Used to track the state of each descriptor fetched from descriptor ring */ | 
 |       DescStatePacked desc[]; | 
 |   } QueueRegionPacked; | 
 |  | 
 | To track inflight I/O, the queue region should be processed as follows: | 
 |  | 
 | When receiving available buffers from the driver: | 
 |  | 
 | #. Get the next available descriptor entry from descriptor ring, ``d`` | 
 |  | 
 | #. If ``d`` is head descriptor, | 
 |  | 
 |    a. Set ``desc[old_free_head].num`` to 0 | 
 |  | 
 |    #. Set ``desc[old_free_head].counter`` to the value of global counter | 
 |  | 
 |    #. Increase global counter by 1 | 
 |  | 
 |    #. Set ``desc[old_free_head].inflight`` to 1 | 
 |  | 
 | #. If ``d`` is last descriptor, set ``desc[old_free_head].last`` to | 
 |    ``free_head`` | 
 |  | 
 | #. Increase ``desc[old_free_head].num`` by 1 | 
 |  | 
 | #. Set ``desc[free_head].addr``, ``desc[free_head].len``, | 
 |    ``desc[free_head].flags``, ``desc[free_head].id`` to ``d.addr``, | 
 |    ``d.len``, ``d.flags``, ``d.id`` | 
 |  | 
 | #. Set ``free_head`` to ``desc[free_head].next`` | 
 |  | 
 | #. If ``d`` is last descriptor, set ``old_free_head`` to ``free_head`` | 
 |  | 
 | When supplying used buffers to the driver: | 
 |  | 
 | 1. Get corresponding used head-descriptor entry from descriptor ring, | 
 |    ``d`` | 
 |  | 
 | 2. Get corresponding ``DescStatePacked`` entry, ``e`` | 
 |  | 
 | 3. Set ``desc[e.last].next`` to ``free_head`` | 
 |  | 
 | 4. Set ``free_head`` to the index of ``e`` | 
 |  | 
 | #. Steps 1,2,3,4 may be performed repeatedly if batching is possible | 
 |  | 
 | #. Increase ``used_idx`` by the size of the batch and update | 
 |    ``used_wrap_counter`` if needed | 
 |  | 
 | #. Update ``d.flags`` | 
 |  | 
 | #. Set the ``inflight`` field of each head ``DescStatePacked`` entry | 
 |    in the batch to 0 | 
 |  | 
 | #. Set ``old_free_head``,  ``old_used_idx``, ``old_used_wrap_counter`` | 
 |    to ``free_head``, ``used_idx``, ``used_wrap_counter`` | 
 |  | 
 | When reconnecting: | 
 |  | 
 | #. If ``used_idx`` does not match ``old_used_idx`` (means the | 
 |    ``inflight`` field of ``DescStatePacked`` entries in last batch may | 
 |    be incorrect), | 
 |  | 
 |    a. Get the next descriptor ring entry through ``old_used_idx``, ``d`` | 
 |  | 
 |    #. Use ``old_used_wrap_counter`` to calculate the available flags | 
 |  | 
 |    #. If ``d.flags`` is not equal to the calculated flags value (means | 
 |       back-end has submitted the buffer to guest driver before crash, so | 
 |       it has to commit the in-progress update), set ``old_free_head``, | 
 |       ``old_used_idx``, ``old_used_wrap_counter`` to ``free_head``, | 
 |       ``used_idx``, ``used_wrap_counter`` | 
 |  | 
 | #. Set ``free_head``, ``used_idx``, ``used_wrap_counter`` to | 
 |    ``old_free_head``, ``old_used_idx``, ``old_used_wrap_counter`` | 
 |    (roll back any in-progress update) | 
 |  | 
 | #. Set the ``inflight`` field of each ``DescStatePacked`` entry in | 
 |    free list to 0 | 
 |  | 
 | #. Resubmit inflight ``DescStatePacked`` entries in order of their | 
 |    counter value | 
 |  | 
 | In-band notifications | 
 | --------------------- | 
 |  | 
 | In some limited situations (e.g. for simulation) it is desirable to | 
 | have the kick, call and error (if used) signals done via in-band | 
 | messages instead of asynchronous eventfd notifications. This can be | 
 | done by negotiating the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` | 
 | protocol feature. | 
 |  | 
 | Note that due to the fact that too many messages on the sockets can | 
 | cause the sending application(s) to block, it is not advised to use | 
 | this feature unless absolutely necessary. It is also considered an | 
 | error to negotiate this feature without also negotiating | 
 | ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` and ``VHOST_USER_PROTOCOL_F_REPLY_ACK``, | 
 | the former is necessary for getting a message channel from the back-end | 
 | to the front-end, while the latter needs to be used with the in-band | 
 | notification messages to block until they are processed, both to avoid | 
 | blocking later and for proper processing (at least in the simulation | 
 | use case.) As it has no other way of signalling this error, the back-end | 
 | should close the connection as a response to a | 
 | ``VHOST_USER_SET_PROTOCOL_FEATURES`` message that sets the in-band | 
 | notifications feature flag without the other two. | 
 |  | 
 | Protocol features | 
 | ----------------- | 
 |  | 
 | .. code:: c | 
 |  | 
 |   #define VHOST_USER_PROTOCOL_F_MQ                    0 | 
 |   #define VHOST_USER_PROTOCOL_F_LOG_SHMFD             1 | 
 |   #define VHOST_USER_PROTOCOL_F_RARP                  2 | 
 |   #define VHOST_USER_PROTOCOL_F_REPLY_ACK             3 | 
 |   #define VHOST_USER_PROTOCOL_F_MTU                   4 | 
 |   #define VHOST_USER_PROTOCOL_F_BACKEND_REQ           5 | 
 |   #define VHOST_USER_PROTOCOL_F_CROSS_ENDIAN          6 | 
 |   #define VHOST_USER_PROTOCOL_F_CRYPTO_SESSION        7 | 
 |   #define VHOST_USER_PROTOCOL_F_PAGEFAULT             8 | 
 |   #define VHOST_USER_PROTOCOL_F_CONFIG                9 | 
 |   #define VHOST_USER_PROTOCOL_F_BACKEND_SEND_FD      10 | 
 |   #define VHOST_USER_PROTOCOL_F_HOST_NOTIFIER        11 | 
 |   #define VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD       12 | 
 |   #define VHOST_USER_PROTOCOL_F_RESET_DEVICE         13 | 
 |   #define VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS 14 | 
 |   #define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS  15 | 
 |   #define VHOST_USER_PROTOCOL_F_STATUS               16 | 
 |   #define VHOST_USER_PROTOCOL_F_XEN_MMAP             17 | 
 |   #define VHOST_USER_PROTOCOL_F_SHARED_OBJECT        18 | 
 |   #define VHOST_USER_PROTOCOL_F_DEVICE_STATE         19 | 
 |  | 
 | Front-end message types | 
 | ----------------------- | 
 |  | 
 | ``VHOST_USER_GET_FEATURES`` | 
 |   :id: 1 | 
 |   :equivalent ioctl: ``VHOST_GET_FEATURES`` | 
 |   :request payload: N/A | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   Get from the underlying vhost implementation the features bitmask. | 
 |   Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals back-end support | 
 |   for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and | 
 |   ``VHOST_USER_SET_PROTOCOL_FEATURES``. | 
 |  | 
 | ``VHOST_USER_SET_FEATURES`` | 
 |   :id: 2 | 
 |   :equivalent ioctl: ``VHOST_SET_FEATURES`` | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Enable features in the underlying vhost implementation using a | 
 |   bitmask.  Feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` signals | 
 |   back-end support for ``VHOST_USER_GET_PROTOCOL_FEATURES`` and | 
 |   ``VHOST_USER_SET_PROTOCOL_FEATURES``. | 
 |  | 
 | ``VHOST_USER_GET_PROTOCOL_FEATURES`` | 
 |   :id: 15 | 
 |   :equivalent ioctl: ``VHOST_GET_FEATURES`` | 
 |   :request payload: N/A | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   Get the protocol feature bitmask from the underlying vhost | 
 |   implementation.  Only legal if feature bit | 
 |   ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in | 
 |   ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by | 
 |   ``VHOST_USER_SET_FEATURES``. | 
 |  | 
 | .. Note:: | 
 |    Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must | 
 |    support this message even before ``VHOST_USER_SET_FEATURES`` was | 
 |    called. | 
 |  | 
 | ``VHOST_USER_SET_PROTOCOL_FEATURES`` | 
 |   :id: 16 | 
 |   :equivalent ioctl: ``VHOST_SET_FEATURES`` | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Enable protocol features in the underlying vhost implementation. | 
 |  | 
 |   Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is present in | 
 |   ``VHOST_USER_GET_FEATURES``.  It does not need to be acknowledged by | 
 |   ``VHOST_USER_SET_FEATURES``. | 
 |  | 
 | .. Note:: | 
 |    Back-ends that report ``VHOST_USER_F_PROTOCOL_FEATURES`` must support | 
 |    this message even before ``VHOST_USER_SET_FEATURES`` was called. | 
 |  | 
 | ``VHOST_USER_SET_OWNER`` | 
 |   :id: 3 | 
 |   :equivalent ioctl: ``VHOST_SET_OWNER`` | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   Issued when a new connection is established. It marks the sender | 
 |   as the front-end that owns of the session. This can be used on the *back-end* | 
 |   as a "session start" flag. | 
 |  | 
 | ``VHOST_USER_RESET_OWNER`` | 
 |   :id: 4 | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 | .. admonition:: Deprecated | 
 |  | 
 |    This is no longer used. Used to be sent to request disabling all | 
 |    rings, but some back-ends interpreted it to also discard connection | 
 |    state (this interpretation would lead to bugs).  It is recommended | 
 |    that back-ends either ignore this message, or use it to disable all | 
 |    rings. | 
 |  | 
 | ``VHOST_USER_SET_MEM_TABLE`` | 
 |   :id: 5 | 
 |   :equivalent ioctl: ``VHOST_SET_MEM_TABLE`` | 
 |   :request payload: multiple memory regions description | 
 |   :reply payload: (postcopy only) multiple memory regions description | 
 |  | 
 |   Sets the memory map regions on the back-end so it can translate the | 
 |   vring addresses. In the ancillary data there is an array of file | 
 |   descriptors for each memory mapped region. The size and ordering of | 
 |   the fds matches the number and ordering of memory regions. | 
 |  | 
 |   When ``VHOST_USER_POSTCOPY_LISTEN`` has been received, | 
 |   ``SET_MEM_TABLE`` replies with the bases of the memory mapped | 
 |   regions to the front-end.  The back-end must have mmap'd the regions but | 
 |   not yet accessed them and should not yet generate a userfault | 
 |   event. | 
 |  | 
 | .. Note:: | 
 |    ``NEED_REPLY_MASK`` is not set in this case.  QEMU will then | 
 |    reply back to the list of mappings with an empty | 
 |    ``VHOST_USER_SET_MEM_TABLE`` as an acknowledgement; only upon | 
 |    reception of this message may the guest start accessing the memory | 
 |    and generating faults. | 
 |  | 
 | ``VHOST_USER_SET_LOG_BASE`` | 
 |   :id: 6 | 
 |   :equivalent ioctl: ``VHOST_SET_LOG_BASE`` | 
 |   :request payload: u64 | 
 |   :reply payload: N/A | 
 |  | 
 |   Sets logging shared memory space. | 
 |  | 
 |   When the back-end has ``VHOST_USER_PROTOCOL_F_LOG_SHMFD`` protocol feature, | 
 |   the log memory fd is provided in the ancillary data of | 
 |   ``VHOST_USER_SET_LOG_BASE`` message, the size and offset of shared | 
 |   memory area provided in the message. | 
 |  | 
 | ``VHOST_USER_SET_LOG_FD`` | 
 |   :id: 7 | 
 |   :equivalent ioctl: ``VHOST_SET_LOG_FD`` | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   Sets the logging file descriptor, which is passed as ancillary data. | 
 |  | 
 | ``VHOST_USER_SET_VRING_NUM`` | 
 |   :id: 8 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_NUM`` | 
 |   :request payload: vring state description | 
 |   :reply payload: N/A | 
 |  | 
 |   Set the size of the queue. | 
 |  | 
 | ``VHOST_USER_SET_VRING_ADDR`` | 
 |   :id: 9 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_ADDR`` | 
 |   :request payload: vring address description | 
 |   :reply payload: N/A | 
 |  | 
 |   Sets the addresses of the different aspects of the vring. | 
 |  | 
 | ``VHOST_USER_SET_VRING_BASE`` | 
 |   :id: 10 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_BASE`` | 
 |   :request payload: vring descriptor index/indices | 
 |   :reply payload: N/A | 
 |  | 
 |   Sets the next index to use for descriptors in this vring: | 
 |  | 
 |   * For a split virtqueue, sets only the next descriptor index to | 
 |     process in the *Available Ring*.  The device is supposed to read the | 
 |     next index in the *Used Ring* from the respective vring structure in | 
 |     guest memory. | 
 |  | 
 |   * For a packed virtqueue, both indices are supplied, as they are not | 
 |     explicitly available in memory. | 
 |  | 
 |   Consequently, the payload type is specific to the type of virt queue | 
 |   (*a vring descriptor index for split virtqueues* vs. *vring descriptor | 
 |   indices for packed virtqueues*). | 
 |  | 
 | ``VHOST_USER_GET_VRING_BASE`` | 
 |   :id: 11 | 
 |   :equivalent ioctl: ``VHOST_USER_GET_VRING_BASE`` | 
 |   :request payload: vring state description | 
 |   :reply payload: vring descriptor index/indices | 
 |  | 
 |   Stops the vring and returns the current descriptor index or indices: | 
 |  | 
 |     * For a split virtqueue, returns only the 16-bit next descriptor | 
 |       index to process in the *Available Ring*.  Note that this may | 
 |       differ from the available ring index in the vring structure in | 
 |       memory, which points to where the driver will put new available | 
 |       descriptors.  For the *Used Ring*, the device only needs the next | 
 |       descriptor index at which to put new descriptors, which is the | 
 |       value in the vring structure in memory, so this value is not | 
 |       covered by this message. | 
 |  | 
 |     * For a packed virtqueue, neither index is explicitly available to | 
 |       read from memory, so both indices (as maintained by the device) are | 
 |       returned. | 
 |  | 
 |   Consequently, the payload type is specific to the type of virt queue | 
 |   (*a vring descriptor index for split virtqueues* vs. *vring descriptor | 
 |   indices for packed virtqueues*). | 
 |  | 
 |   When and as long as all of a device's vrings are stopped, it is | 
 |   *suspended*, see :ref:`Suspended device state | 
 |   <suspended_device_state>`. | 
 |  | 
 |   The request payload's *num* field is currently reserved and must be | 
 |   set to 0. | 
 |  | 
 | ``VHOST_USER_SET_VRING_KICK`` | 
 |   :id: 12 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_KICK`` | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Set the event file descriptor for adding buffers to the vring. It is | 
 |   passed in the ancillary data. | 
 |  | 
 |   Bits (0-7) of the payload contain the vring index. Bit 8 is the | 
 |   invalid FD flag. This flag is set when there is no file descriptor | 
 |   in the ancillary data. This signals that polling should be used | 
 |   instead of waiting for the kick. Note that if the protocol feature | 
 |   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` has been negotiated | 
 |   this message isn't necessary as the ring is also started on the | 
 |   ``VHOST_USER_VRING_KICK`` message, it may however still be used to | 
 |   set an event file descriptor (which will be preferred over the | 
 |   message) or to enable polling. | 
 |  | 
 | ``VHOST_USER_SET_VRING_CALL`` | 
 |   :id: 13 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_CALL`` | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Set the event file descriptor to signal when buffers are used. It is | 
 |   passed in the ancillary data. | 
 |  | 
 |   Bits (0-7) of the payload contain the vring index. Bit 8 is the | 
 |   invalid FD flag. This flag is set when there is no file descriptor | 
 |   in the ancillary data. This signals that polling will be used | 
 |   instead of waiting for the call. Note that if the protocol features | 
 |   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and | 
 |   ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message | 
 |   isn't necessary as the ``VHOST_USER_BACKEND_VRING_CALL`` message can be | 
 |   used, it may however still be used to set an event file descriptor | 
 |   or to enable polling. | 
 |  | 
 | ``VHOST_USER_SET_VRING_ERR`` | 
 |   :id: 14 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_ERR`` | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Set the event file descriptor to signal when error occurs. It is | 
 |   passed in the ancillary data. | 
 |  | 
 |   Bits (0-7) of the payload contain the vring index. Bit 8 is the | 
 |   invalid FD flag. This flag is set when there is no file descriptor | 
 |   in the ancillary data. Note that if the protocol features | 
 |   ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` and | 
 |   ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` have been negotiated this message | 
 |   isn't necessary as the ``VHOST_USER_BACKEND_VRING_ERR`` message can be | 
 |   used, it may however still be used to set an event file descriptor | 
 |   (which will be preferred over the message). | 
 |  | 
 | ``VHOST_USER_GET_QUEUE_NUM`` | 
 |   :id: 17 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: u64 | 
 |  | 
 |   Query how many queues the back-end supports. | 
 |  | 
 |   This request should be sent only when ``VHOST_USER_PROTOCOL_F_MQ`` | 
 |   is set in queried protocol features by | 
 |   ``VHOST_USER_GET_PROTOCOL_FEATURES``. | 
 |  | 
 | ``VHOST_USER_SET_VRING_ENABLE`` | 
 |   :id: 18 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: vring state description | 
 |   :reply payload: N/A | 
 |  | 
 |   Signal the back-end to enable or disable corresponding vring. | 
 |  | 
 |   This request should be sent only when | 
 |   ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated. | 
 |  | 
 | ``VHOST_USER_SEND_RARP`` | 
 |   :id: 19 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Ask vhost user back-end to broadcast a fake RARP to notify the migration | 
 |   is terminated for guest that does not support GUEST_ANNOUNCE. | 
 |  | 
 |   Only legal if feature bit ``VHOST_USER_F_PROTOCOL_FEATURES`` is | 
 |   present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit | 
 |   ``VHOST_USER_PROTOCOL_F_RARP`` is present in | 
 |   ``VHOST_USER_GET_PROTOCOL_FEATURES``.  The first 6 bytes of the | 
 |   payload contain the mac address of the guest to allow the vhost user | 
 |   back-end to construct and broadcast the fake RARP. | 
 |  | 
 | ``VHOST_USER_NET_SET_MTU`` | 
 |   :id: 20 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Set host MTU value exposed to the guest. | 
 |  | 
 |   This request should be sent only when ``VIRTIO_NET_F_MTU`` feature | 
 |   has been successfully negotiated, ``VHOST_USER_F_PROTOCOL_FEATURES`` | 
 |   is present in ``VHOST_USER_GET_FEATURES`` and protocol feature bit | 
 |   ``VHOST_USER_PROTOCOL_F_NET_MTU`` is present in | 
 |   ``VHOST_USER_GET_PROTOCOL_FEATURES``. | 
 |  | 
 |   If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must | 
 |   respond with zero in case the specified MTU is valid, or non-zero | 
 |   otherwise. | 
 |  | 
 | ``VHOST_USER_SET_BACKEND_REQ_FD`` (previous name ``VHOST_USER_SET_SLAVE_REQ_FD``) | 
 |   :id: 21 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   Set the socket file descriptor for back-end initiated requests. It is passed | 
 |   in the ancillary data. | 
 |  | 
 |   This request should be sent only when | 
 |   ``VHOST_USER_F_PROTOCOL_FEATURES`` has been negotiated, and protocol | 
 |   feature bit ``VHOST_USER_PROTOCOL_F_BACKEND_REQ`` bit is present in | 
 |   ``VHOST_USER_GET_PROTOCOL_FEATURES``.  If | 
 |   ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, the back-end must | 
 |   respond with zero for success, non-zero otherwise. | 
 |  | 
 | ``VHOST_USER_IOTLB_MSG`` | 
 |   :id: 22 | 
 |   :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type) | 
 |   :request payload: ``struct vhost_iotlb_msg`` | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload. | 
 |  | 
 |   The front-end sends such requests to update and invalidate entries in the | 
 |   device IOTLB. The back-end has to acknowledge the request with sending | 
 |   zero as ``u64`` payload for success, non-zero otherwise. | 
 |  | 
 |   This request should be send only when ``VIRTIO_F_IOMMU_PLATFORM`` | 
 |   feature has been successfully negotiated. | 
 |  | 
 | ``VHOST_USER_SET_VRING_ENDIAN`` | 
 |   :id: 23 | 
 |   :equivalent ioctl: ``VHOST_SET_VRING_ENDIAN`` | 
 |   :request payload: vring state description | 
 |   :reply payload: N/A | 
 |  | 
 |   Set the endianness of a VQ for legacy devices. Little-endian is | 
 |   indicated with state.num set to 0 and big-endian is indicated with | 
 |   state.num set to 1. Other values are invalid. | 
 |  | 
 |   This request should be sent only when | 
 |   ``VHOST_USER_PROTOCOL_F_CROSS_ENDIAN`` has been negotiated. | 
 |   Backends that negotiated this feature should handle both | 
 |   endiannesses and expect this message once (per VQ) during device | 
 |   configuration (ie. before the front-end starts the VQ). | 
 |  | 
 | ``VHOST_USER_GET_CONFIG`` | 
 |   :id: 24 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: virtio device config space | 
 |   :reply payload: virtio device config space | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is | 
 |   submitted by the vhost-user front-end to fetch the contents of the | 
 |   virtio device configuration space, vhost-user back-end's payload size | 
 |   MUST match the front-end's request, vhost-user back-end uses zero length of | 
 |   payload to indicate an error to the vhost-user front-end. The vhost-user | 
 |   front-end may cache the contents to avoid repeated | 
 |   ``VHOST_USER_GET_CONFIG`` calls. | 
 |  | 
 | ``VHOST_USER_SET_CONFIG`` | 
 |   :id: 25 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: virtio device config space | 
 |   :reply payload: N/A | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, this message is | 
 |   submitted by the vhost-user front-end when the Guest changes the virtio | 
 |   device configuration space and also can be used for live migration | 
 |   on the destination host. The vhost-user back-end must check the flags | 
 |   field, and back-ends MUST NOT accept SET_CONFIG for read-only | 
 |   configuration space fields unless the live migration bit is set. | 
 |  | 
 | ``VHOST_USER_CREATE_CRYPTO_SESSION`` | 
 |   :id: 26 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: crypto session description | 
 |   :reply payload: crypto session description | 
 |  | 
 |   Create a session for crypto operation. The back-end must return | 
 |   the session id, 0 or positive for success, negative for failure. | 
 |   This request should be sent only when | 
 |   ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been | 
 |   successfully negotiated.  It's a required feature for crypto | 
 |   devices. | 
 |  | 
 | ``VHOST_USER_CLOSE_CRYPTO_SESSION`` | 
 |   :id: 27 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Close a session for crypto operation which was previously | 
 |   created by ``VHOST_USER_CREATE_CRYPTO_SESSION``. | 
 |  | 
 |   This request should be sent only when | 
 |   ``VHOST_USER_PROTOCOL_F_CRYPTO_SESSION`` feature has been | 
 |   successfully negotiated.  It's a required feature for crypto | 
 |   devices. | 
 |  | 
 | ``VHOST_USER_POSTCOPY_ADVISE`` | 
 |   :id: 28 | 
 |   :request payload: N/A | 
 |   :reply payload: userfault fd | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, the front-end | 
 |   advises back-end that a migration with postcopy enabled is underway, | 
 |   the back-end must open a userfaultfd for later use.  Note that at this | 
 |   stage the migration is still in precopy mode. | 
 |  | 
 | ``VHOST_USER_POSTCOPY_LISTEN`` | 
 |   :id: 29 | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   The front-end advises back-end that a transition to postcopy mode has | 
 |   happened.  The back-end must ensure that shared memory is registered | 
 |   with userfaultfd to cause faulting of non-present pages. | 
 |  | 
 |   This is always sent sometime after a ``VHOST_USER_POSTCOPY_ADVISE``, | 
 |   and thus only when ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported. | 
 |  | 
 | ``VHOST_USER_POSTCOPY_END`` | 
 |   :id: 30 | 
 |   :request payload: N/A | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   The front-end advises that postcopy migration has now completed.  The back-end | 
 |   must disable the userfaultfd. The reply is an acknowledgement | 
 |   only. | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_PAGEFAULT`` is supported, this message | 
 |   is sent at the end of the migration, after | 
 |   ``VHOST_USER_POSTCOPY_LISTEN`` was previously sent. | 
 |  | 
 |   The value returned is an error indication; 0 is success. | 
 |  | 
 | ``VHOST_USER_GET_INFLIGHT_FD`` | 
 |   :id: 31 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: inflight description | 
 |   :reply payload: N/A | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has | 
 |   been successfully negotiated, this message is submitted by the front-end to | 
 |   get a shared buffer from back-end. The shared buffer will be used to | 
 |   track inflight I/O by back-end. QEMU should retrieve a new one when vm | 
 |   reset. | 
 |  | 
 | ``VHOST_USER_SET_INFLIGHT_FD`` | 
 |   :id: 32 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: inflight description | 
 |   :reply payload: N/A | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD`` protocol feature has | 
 |   been successfully negotiated, this message is submitted by the front-end to | 
 |   send the shared inflight buffer back to the back-end so that the back-end | 
 |   could get inflight I/O after a crash or restart. | 
 |  | 
 | ``VHOST_USER_GPU_SET_SOCKET`` | 
 |   :id: 33 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   Sets the GPU protocol socket file descriptor, which is passed as | 
 |   ancillary data. The GPU protocol is used to inform the front-end of | 
 |   rendering state and updates. See vhost-user-gpu.rst for details. | 
 |  | 
 | ``VHOST_USER_RESET_DEVICE`` | 
 |   :id: 34 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   Ask the vhost user back-end to disable all rings and reset all | 
 |   internal device state to the initial state, ready to be | 
 |   reinitialized. The back-end retains ownership of the device | 
 |   throughout the reset operation. | 
 |  | 
 |   Only valid if the ``VHOST_USER_PROTOCOL_F_RESET_DEVICE`` protocol | 
 |   feature is set by the back-end. | 
 |  | 
 | ``VHOST_USER_VRING_KICK`` | 
 |   :id: 35 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: vring state description | 
 |   :reply payload: N/A | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol | 
 |   feature has been successfully negotiated, this message may be | 
 |   submitted by the front-end to indicate that a buffer was added to | 
 |   the vring instead of signalling it using the vring's kick file | 
 |   descriptor or having the back-end rely on polling. | 
 |  | 
 |   The state.num field is currently reserved and must be set to 0. | 
 |  | 
 | ``VHOST_USER_GET_MAX_MEM_SLOTS`` | 
 |   :id: 36 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: u64 | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol | 
 |   feature has been successfully negotiated, this message is submitted | 
 |   by the front-end to the back-end. The back-end should return the message with a | 
 |   u64 payload containing the maximum number of memory slots for | 
 |   QEMU to expose to the guest. The value returned by the back-end | 
 |   will be capped at the maximum number of ram slots which can be | 
 |   supported by the target platform. | 
 |  | 
 | ``VHOST_USER_ADD_MEM_REG`` | 
 |   :id: 37 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: single memory region description | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol | 
 |   feature has been successfully negotiated, this message is submitted | 
 |   by the front-end to the back-end. The message payload contains a memory | 
 |   region descriptor struct, describing a region of guest memory which | 
 |   the back-end device must map in. When the | 
 |   ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has | 
 |   been successfully negotiated, along with the | 
 |   ``VHOST_USER_REM_MEM_REG`` message, this message is used to set and | 
 |   update the memory tables of the back-end device. | 
 |  | 
 |   Exactly one file descriptor from which the memory is mapped is | 
 |   passed in the ancillary data. | 
 |  | 
 |   In postcopy mode (see ``VHOST_USER_POSTCOPY_LISTEN``), the back-end | 
 |   replies with the bases of the memory mapped region to the front-end. | 
 |   For further details on postcopy, see ``VHOST_USER_SET_MEM_TABLE``. | 
 |   They apply to ``VHOST_USER_ADD_MEM_REG`` accordingly. | 
 |  | 
 | ``VHOST_USER_REM_MEM_REG`` | 
 |   :id: 38 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: single memory region description | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol | 
 |   feature has been successfully negotiated, this message is submitted | 
 |   by the front-end to the back-end. The message payload contains a memory | 
 |   region descriptor struct, describing a region of guest memory which | 
 |   the back-end device must unmap. When the | 
 |   ``VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS`` protocol feature has | 
 |   been successfully negotiated, along with the | 
 |   ``VHOST_USER_ADD_MEM_REG`` message, this message is used to set and | 
 |   update the memory tables of the back-end device. | 
 |  | 
 |   The memory region to be removed is identified by its guest address, | 
 |   user address and size. The mmap offset is ignored. | 
 |  | 
 |   No file descriptors SHOULD be passed in the ancillary data. For | 
 |   compatibility with existing incorrect implementations, the back-end MAY | 
 |   accept messages with one file descriptor. If a file descriptor is | 
 |   passed, the back-end MUST close it without using it otherwise. | 
 |  | 
 | ``VHOST_USER_SET_STATUS`` | 
 |   :id: 39 | 
 |   :equivalent ioctl: VHOST_VDPA_SET_STATUS | 
 |   :request payload: ``u64`` | 
 |   :reply payload: N/A | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been | 
 |   successfully negotiated, this message is submitted by the front-end to | 
 |   notify the back-end with updated device status as defined in the Virtio | 
 |   specification. | 
 |  | 
 | ``VHOST_USER_GET_STATUS`` | 
 |   :id: 40 | 
 |   :equivalent ioctl: VHOST_VDPA_GET_STATUS | 
 |   :request payload: N/A | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_STATUS`` protocol feature has been | 
 |   successfully negotiated, this message is submitted by the front-end to | 
 |   query the back-end for its device status as defined in the Virtio | 
 |   specification. | 
 |  | 
 | ``VHOST_USER_GET_SHARED_OBJECT`` | 
 |   :id: 41 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``struct VhostUserShared`` | 
 |   :reply payload: dmabuf fd | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol | 
 |   feature has been successfully negotiated, and the UUID is found | 
 |   in the exporters cache, this message is submitted by the front-end | 
 |   to retrieve a given dma-buf fd from a given back-end, determined by | 
 |   the requested UUID. Back-end will reply passing the fd when the operation | 
 |   is successful, or no fd otherwise. | 
 |  | 
 | ``VHOST_USER_SET_DEVICE_STATE_FD`` | 
 |   :id: 42 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: device state transfer parameters | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   Front-end and back-end negotiate a channel over which to transfer the | 
 |   back-end's internal state during migration.  Either side (front-end or | 
 |   back-end) may create the channel.  The nature of this channel is not | 
 |   restricted or defined in this document, but whichever side creates it | 
 |   must create a file descriptor that is provided to the respectively | 
 |   other side, allowing access to the channel.  This FD must behave as | 
 |   follows: | 
 |  | 
 |   * For the writing end, it must allow writing the whole back-end state | 
 |     sequentially.  Closing the file descriptor signals the end of | 
 |     transfer. | 
 |  | 
 |   * For the reading end, it must allow reading the whole back-end state | 
 |     sequentially.  The end of file signals the end of the transfer. | 
 |  | 
 |   For example, the channel may be a pipe, in which case the two ends of | 
 |   the pipe fulfill these requirements respectively. | 
 |  | 
 |   Initially, the front-end creates a channel along with such an FD.  It | 
 |   passes the FD to the back-end as ancillary data of a | 
 |   ``VHOST_USER_SET_DEVICE_STATE_FD`` message.  The back-end may create a | 
 |   different transfer channel, passing the respective FD back to the | 
 |   front-end as ancillary data of the reply.  If so, the front-end must | 
 |   then discard its channel and use the one provided by the back-end. | 
 |  | 
 |   Whether the back-end should decide to use its own channel is decided | 
 |   based on efficiency: If the channel is a pipe, both ends will most | 
 |   likely need to copy data into and out of it.  Any channel that allows | 
 |   for more efficient processing on at least one end, e.g. through | 
 |   zero-copy, is considered more efficient and thus preferred.  If the | 
 |   back-end can provide such a channel, it should decide to use it. | 
 |  | 
 |   The request payload contains parameters for the subsequent data | 
 |   transfer, as described in the :ref:`Migrating back-end state | 
 |   <migrating_backend_state>` section. | 
 |  | 
 |   The value returned is both an indication for success, and whether a | 
 |   file descriptor for a back-end-provided channel is returned: Bits 0–7 | 
 |   are 0 on success, and non-zero on error.  Bit 8 is the invalid FD | 
 |   flag; this flag is set when there is no file descriptor returned. | 
 |   When this flag is not set, the front-end must use the returned file | 
 |   descriptor as its end of the transfer channel.  The back-end must not | 
 |   both indicate an error and return a file descriptor. | 
 |  | 
 |   Using this function requires prior negotiation of the | 
 |   ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature. | 
 |  | 
 | ``VHOST_USER_CHECK_DEVICE_STATE`` | 
 |   :id: 43 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: ``u64`` | 
 |  | 
 |   After transferring the back-end's internal state during migration (see | 
 |   the :ref:`Migrating back-end state <migrating_backend_state>` | 
 |   section), check whether the back-end was able to successfully fully | 
 |   process the state. | 
 |  | 
 |   The value returned indicates success or error; 0 is success, any | 
 |   non-zero value is an error. | 
 |  | 
 |   Using this function requires prior negotiation of the | 
 |   ``VHOST_USER_PROTOCOL_F_DEVICE_STATE`` feature. | 
 |  | 
 | Back-end message types | 
 | ---------------------- | 
 |  | 
 | For this type of message, the request is sent by the back-end and the reply | 
 | is sent by the front-end. | 
 |  | 
 | ``VHOST_USER_BACKEND_IOTLB_MSG`` (previous name ``VHOST_USER_SLAVE_IOTLB_MSG``) | 
 |   :id: 1 | 
 |   :equivalent ioctl: N/A (equivalent to ``VHOST_IOTLB_MSG`` message type) | 
 |   :request payload: ``struct vhost_iotlb_msg`` | 
 |   :reply payload: N/A | 
 |  | 
 |   Send IOTLB messages with ``struct vhost_iotlb_msg`` as payload. | 
 |   The back-end sends such requests to notify of an IOTLB miss, or an IOTLB | 
 |   access failure. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is | 
 |   negotiated, and back-end set the ``VHOST_USER_NEED_REPLY`` flag, the front-end | 
 |   must respond with zero when operation is successfully completed, or | 
 |   non-zero otherwise.  This request should be send only when | 
 |   ``VIRTIO_F_IOMMU_PLATFORM`` feature has been successfully | 
 |   negotiated. | 
 |  | 
 | ``VHOST_USER_BACKEND_CONFIG_CHANGE_MSG`` (previous name ``VHOST_USER_SLAVE_CONFIG_CHANGE_MSG``) | 
 |   :id: 2 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: N/A | 
 |   :reply payload: N/A | 
 |  | 
 |   When ``VHOST_USER_PROTOCOL_F_CONFIG`` is negotiated, vhost-user | 
 |   back-end sends such messages to notify that the virtio device's | 
 |   configuration space has changed, for those host devices which can | 
 |   support such feature, host driver can send ``VHOST_USER_GET_CONFIG`` | 
 |   message to the back-end to get the latest content. If | 
 |   ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the back-end sets the | 
 |   ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond with zero when | 
 |   operation is successfully completed, or non-zero otherwise. | 
 |  | 
 | ``VHOST_USER_BACKEND_VRING_HOST_NOTIFIER_MSG`` (previous name ``VHOST_USER_SLAVE_VRING_HOST_NOTIFIER_MSG``) | 
 |   :id: 3 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: vring area description | 
 |   :reply payload: N/A | 
 |  | 
 |   Sets host notifier for a specified queue. The queue index is | 
 |   contained in the ``u64`` field of the vring area description. The | 
 |   host notifier is described by the file descriptor (typically it's a | 
 |   VFIO device fd) which is passed as ancillary data and the size | 
 |   (which is mmap size and should be the same as host page size) and | 
 |   offset (which is mmap offset) carried in the vring area | 
 |   description. QEMU can mmap the file descriptor based on the size and | 
 |   offset to get a memory range. Registering a host notifier means | 
 |   mapping this memory range to the VM as the specified queue's notify | 
 |   MMIO region. The back-end sends this request to tell QEMU to de-register | 
 |   the existing notifier if any and register the new notifier if the | 
 |   request is sent with a file descriptor. | 
 |  | 
 |   This request should be sent only when | 
 |   ``VHOST_USER_PROTOCOL_F_HOST_NOTIFIER`` protocol feature has been | 
 |   successfully negotiated. | 
 |  | 
 | ``VHOST_USER_BACKEND_VRING_CALL`` (previous name ``VHOST_USER_SLAVE_VRING_CALL``) | 
 |   :id: 4 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: vring state description | 
 |   :reply payload: N/A | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol | 
 |   feature has been successfully negotiated, this message may be | 
 |   submitted by the back-end to indicate that a buffer was used from | 
 |   the vring instead of signalling this using the vring's call file | 
 |   descriptor or having the front-end relying on polling. | 
 |  | 
 |   The state.num field is currently reserved and must be set to 0. | 
 |  | 
 | ``VHOST_USER_BACKEND_VRING_ERR`` (previous name ``VHOST_USER_SLAVE_VRING_ERR``) | 
 |   :id: 5 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: vring state description | 
 |   :reply payload: N/A | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_INBAND_NOTIFICATIONS`` protocol | 
 |   feature has been successfully negotiated, this message may be | 
 |   submitted by the back-end to indicate that an error occurred on the | 
 |   specific vring, instead of signalling the error file descriptor | 
 |   set by the front-end via ``VHOST_USER_SET_VRING_ERR``. | 
 |  | 
 |   The state.num field is currently reserved and must be set to 0. | 
 |  | 
 | ``VHOST_USER_BACKEND_SHARED_OBJECT_ADD`` | 
 |   :id: 6 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``struct VhostUserShared`` | 
 |   :reply payload: N/A | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol | 
 |   feature has been successfully negotiated, this message can be submitted | 
 |   by the backends to add themselves as exporters to the virtio shared lookup | 
 |   table. The back-end device gets associated with a UUID in the shared table. | 
 |   The back-end is responsible of keeping its own table with exported dma-buf fds. | 
 |   When another back-end tries to import the resource associated with the UUID, | 
 |   it will send a message to the front-end, which will act as a proxy to the | 
 |   exporter back-end. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and | 
 |   the back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must | 
 |   respond with zero when operation is successfully completed, or non-zero | 
 |   otherwise. | 
 |  | 
 | ``VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE`` | 
 |   :id: 7 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``struct VhostUserShared`` | 
 |   :reply payload: N/A | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol | 
 |   feature has been successfully negotiated, this message can be submitted | 
 |   by the backend to remove themselves from to the virtio-dmabuf shared | 
 |   table API. Only the back-end owning the entry (i.e., the one that first added | 
 |   it) will have permission to remove it. Otherwise, the message is ignored. | 
 |   The shared table will remove the back-end device associated with | 
 |   the UUID. If ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` is negotiated, and the | 
 |   back-end sets the ``VHOST_USER_NEED_REPLY`` flag, the front-end must respond | 
 |   with zero when operation is successfully completed, or non-zero otherwise. | 
 |  | 
 | ``VHOST_USER_BACKEND_SHARED_OBJECT_LOOKUP`` | 
 |   :id: 8 | 
 |   :equivalent ioctl: N/A | 
 |   :request payload: ``struct VhostUserShared`` | 
 |   :reply payload: dmabuf fd and ``u64`` | 
 |  | 
 |   When the ``VHOST_USER_PROTOCOL_F_SHARED_OBJECT`` protocol | 
 |   feature has been successfully negotiated, this message can be submitted | 
 |   by the backends to retrieve a given dma-buf fd from the virtio-dmabuf | 
 |   shared table given a UUID. Frontend will reply passing the fd and a zero | 
 |   when the operation is successful, or non-zero otherwise. Note that if the | 
 |   operation fails, no fd is sent to the backend. | 
 |  | 
 | .. _reply_ack: | 
 |  | 
 | VHOST_USER_PROTOCOL_F_REPLY_ACK | 
 | ------------------------------- | 
 |  | 
 | The original vhost-user specification only demands replies for certain | 
 | commands. This differs from the vhost protocol implementation where | 
 | commands are sent over an ``ioctl()`` call and block until the back-end | 
 | has completed. | 
 |  | 
 | With this protocol extension negotiated, the sender (QEMU) can set the | 
 | ``need_reply`` [Bit 3] flag to any command. This indicates that the | 
 | back-end MUST respond with a Payload ``VhostUserMsg`` indicating success | 
 | or failure. The payload should be set to zero on success or non-zero | 
 | on failure, unless the message already has an explicit reply body. | 
 |  | 
 | The reply payload gives QEMU a deterministic indication of the result | 
 | of the command. Today, QEMU is expected to terminate the main vhost-user | 
 | loop upon receiving such errors. In future, qemu could be taught to be more | 
 | resilient for selective requests. | 
 |  | 
 | For the message types that already solicit a reply from the back-end, | 
 | the presence of ``VHOST_USER_PROTOCOL_F_REPLY_ACK`` or need_reply bit | 
 | being set brings no behavioural change. (See the Communication_ | 
 | section for details.) | 
 |  | 
 | .. _backend_conventions: | 
 |  | 
 | Backend program conventions | 
 | =========================== | 
 |  | 
 | vhost-user back-ends can provide various devices & services and may | 
 | need to be configured manually depending on the use case. However, it | 
 | is a good idea to follow the conventions listed here when | 
 | possible. Users, QEMU or libvirt, can then rely on some common | 
 | behaviour to avoid heterogeneous configuration and management of the | 
 | back-end programs and facilitate interoperability. | 
 |  | 
 | Each back-end installed on a host system should come with at least one | 
 | JSON file that conforms to the vhost-user.json schema. Each file | 
 | informs the management applications about the back-end type, and binary | 
 | location. In addition, it defines rules for management apps for | 
 | picking the highest priority back-end when multiple match the search | 
 | criteria (see ``@VhostUserBackend`` documentation in the schema file). | 
 |  | 
 | If the back-end is not capable of enabling a requested feature on the | 
 | host (such as 3D acceleration with virgl), or the initialization | 
 | failed, the back-end should fail to start early and exit with a status | 
 | != 0. It may also print a message to stderr for further details. | 
 |  | 
 | The back-end program must not daemonize itself, but it may be | 
 | daemonized by the management layer. It may also have a restricted | 
 | access to the system. | 
 |  | 
 | File descriptors 0, 1 and 2 will exist, and have regular | 
 | stdin/stdout/stderr usage (they may have been redirected to /dev/null | 
 | by the management layer, or to a log handler). | 
 |  | 
 | The back-end program must end (as quickly and cleanly as possible) when | 
 | the SIGTERM signal is received. Eventually, it may receive SIGKILL by | 
 | the management layer after a few seconds. | 
 |  | 
 | The following command line options have an expected behaviour. They | 
 | are mandatory, unless explicitly said differently: | 
 |  | 
 | --socket-path=PATH | 
 |  | 
 |   This option specify the location of the vhost-user Unix domain socket. | 
 |   It is incompatible with --fd. | 
 |  | 
 | --fd=FDNUM | 
 |  | 
 |   When this argument is given, the back-end program is started with the | 
 |   vhost-user socket as file descriptor FDNUM. It is incompatible with | 
 |   --socket-path. | 
 |  | 
 | --print-capabilities | 
 |  | 
 |   Output to stdout the back-end capabilities in JSON format, and then | 
 |   exit successfully. Other options and arguments should be ignored, and | 
 |   the back-end program should not perform its normal function.  The | 
 |   capabilities can be reported dynamically depending on the host | 
 |   capabilities. | 
 |  | 
 | The JSON output is described in the ``vhost-user.json`` schema, by | 
 | ```@VHostUserBackendCapabilities``.  Example: | 
 |  | 
 | .. code:: json | 
 |  | 
 |   { | 
 |     "type": "foo", | 
 |     "features": [ | 
 |       "feature-a", | 
 |       "feature-b" | 
 |     ] | 
 |   } | 
 |  | 
 | vhost-user-input | 
 | ---------------- | 
 |  | 
 | Command line options: | 
 |  | 
 | --evdev-path=PATH | 
 |  | 
 |   Specify the linux input device. | 
 |  | 
 |   (optional) | 
 |  | 
 | --no-grab | 
 |  | 
 |   Do no request exclusive access to the input device. | 
 |  | 
 |   (optional) | 
 |  | 
 | vhost-user-gpu | 
 | -------------- | 
 |  | 
 | Command line options: | 
 |  | 
 | --render-node=PATH | 
 |  | 
 |   Specify the GPU DRM render node. | 
 |  | 
 |   (optional) | 
 |  | 
 | --virgl | 
 |  | 
 |   Enable virgl rendering support. | 
 |  | 
 |   (optional) | 
 |  | 
 | vhost-user-blk | 
 | -------------- | 
 |  | 
 | Command line options: | 
 |  | 
 | --blk-file=PATH | 
 |  | 
 |   Specify block device or file path. | 
 |  | 
 |   (optional) | 
 |  | 
 | --read-only | 
 |  | 
 |   Enable read-only. | 
 |  | 
 |   (optional) |