Vhost-user Protocol
===================

Copyright (c) 2014 Virtual Open Systems Sarl.

This work is licensed under the terms of the GNU GPL, version 2 or later.
See the COPYING file in the top-level directory.
===================

This protocol aims to complement the ioctl interface used to control the
vhost implementation in the Linux kernel. It implements the control plane needed
to establish virtqueue sharing with a user space process on the same host. It
communicates over a Unix domain socket, which allows file descriptors to be
shared in the ancillary data of a message.

The protocol defines two sides of the communication, master and slave. Master is
the application that shares its virtqueues, in our case QEMU. Slave is the
consumer of the virtqueues.

In the current implementation QEMU is the Master, and the Slave is intended to
be a software Ethernet switch running in user space, such as Snabbswitch.

Master and slave can be either a client (i.e. connecting) or server (listening)
in the socket communication.

Message Specification
---------------------

Note that all numbers are in the machine native byte order. A vhost-user message
consists of 3 header fields and a payload:

------------------------------------
| request | flags | size | payload |
------------------------------------

 * Request: 32-bit type of the request
 * Flags: 32-bit bit field:
   - Lower 2 bits are the version (currently 0x01)
   - Bit 2 is the reply flag; it must be set on each reply from the slave
 * Size: 32-bit size of the payload
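
The header layout above can be sketched in C as follows. This is a minimal
illustration, not the QEMU implementation: all names (vu_header, VU_VERSION,
VU_REPLY_FLAG and the helpers) are hypothetical, and the sketch assumes that a
reply repeats the request type of the message it answers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the field layout comes from the text above. */
#define VU_VERSION       0x1u         /* current protocol version */
#define VU_VERSION_MASK  0x3u         /* lower 2 bits of the flags field */
#define VU_REPLY_FLAG    (0x1u << 2)  /* bit 2: set on each slave reply */

typedef struct {
    uint32_t request;  /* type of the request */
    uint32_t flags;    /* version and reply flag */
    uint32_t size;     /* size of the payload in bytes */
} vu_header;

/* Build the header of a request sent by the master. */
static vu_header vu_make_request(uint32_t request, uint32_t payload_size)
{
    vu_header h = { request, VU_VERSION, payload_size };
    return h;
}

/* Build the header of a slave reply (assumption: same request type). */
static vu_header vu_make_reply(const vu_header *req, uint32_t payload_size)
{
    vu_header h = { req->request, VU_VERSION | VU_REPLY_FLAG, payload_size };
    return h;
}

/* Extract the protocol version from the flags field. */
static uint32_t vu_version(const vu_header *h)
{
    return h->flags & VU_VERSION_MASK;
}

/* Tell whether the reply flag is set. */
static bool vu_is_reply(const vu_header *h)
{
    return (h->flags & VU_REPLY_FLAG) != 0;
}
```

Since all numbers are in native byte order, such a header can be written to
the socket as-is, without any byte swapping.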


Depending on the request type, payload can be:

 * A single 64-bit integer
   -------
   | u64 |
   -------

   u64: a 64-bit unsigned integer

 * A vring state description
   ---------------
   | index | num |
   ---------------

   Index: a 32-bit index
   Num: a 32-bit number

 * A vring address description
   --------------------------------------------------------------
   | index | flags | size | descriptor | used | available | log |
   --------------------------------------------------------------

   Index: a 32-bit vring index
   Flags: 32-bit vring flags
   Descriptor: a 64-bit user address of the vring descriptor table
   Used: a 64-bit user address of the vring used ring
   Available: a 64-bit user address of the vring available ring
   Log: a 64-bit guest address for logging

 * Memory regions description
   ---------------------------------------------------
   | num regions | padding | region0 | ... | region7 |
   ---------------------------------------------------

   Num regions: a 32-bit number of regions
   Padding: 32-bit

   A region is:
   ---------------------------------------
   | guest address | size | user address |
   ---------------------------------------

   Guest address: a 64-bit guest address of the region
   Size: a 64-bit size
   User address: a 64-bit user address
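
To illustrate how a slave might use the memory regions description, the sketch
below mirrors the layout in C and looks up the region that covers a guest
address range. The type and function names are hypothetical, and the lookup
policy (a range must fit entirely inside one region) is an assumption rather
than something the protocol mandates.

```c
#include <stddef.h>
#include <stdint.h>

#define VU_MAX_REGIONS 8  /* region0 .. region7 in the layout above */

typedef struct {
    uint64_t guest_addr;  /* guest address of the region */
    uint64_t size;        /* size of the region in bytes */
    uint64_t user_addr;   /* user address of the region */
} vu_region;

typedef struct {
    uint32_t  num_regions;
    uint32_t  padding;
    vu_region regions[VU_MAX_REGIONS];
} vu_mem_table;

/*
 * Return the index of the region that fully contains
 * [guest_addr, guest_addr + len), or -1 if there is none. On success,
 * *offset receives the distance of guest_addr from the region start.
 */
static int vu_find_region(const vu_mem_table *t, uint64_t guest_addr,
                          uint64_t len, uint64_t *offset)
{
    if (t->num_regions > VU_MAX_REGIONS) {
        return -1;  /* malformed table */
    }
    for (uint32_t i = 0; i < t->num_regions; i++) {
        const vu_region *r = &t->regions[i];
        if (guest_addr < r->guest_addr) {
            continue;
        }
        uint64_t off = guest_addr - r->guest_addr;
        /* Compare this way round so that off + len cannot overflow. */
        if (off <= r->size && len <= r->size - off) {
            *offset = off;
            return (int)i;
        }
    }
    return -1;
}
```

A slave that has mapped the file descriptor of region i at some local base
address could then add the returned offset to that base to obtain a usable
pointer.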


In QEMU the vhost-user message is implemented with the following struct:

typedef struct VhostUserMsg {
    VhostUserRequest request;
    uint32_t flags;
    uint32_t size;
    union {
        uint64_t u64;
        struct vhost_vring_state state;
        struct vhost_vring_addr addr;
        VhostUserMemory memory;
    };
} QEMU_PACKED VhostUserMsg;

Communication
-------------

The protocol for vhost-user is based on the existing implementation of vhost
for the Linux Kernel. Most messages that can be sent via the Unix domain socket
implementing vhost-user have an equivalent ioctl to the kernel implementation.

The communication consists of master sending message requests and slave sending
message replies. Most of the requests don't require replies. Here is a list of
the ones that do:

 * VHOST_USER_GET_FEATURES
 * VHOST_USER_GET_VRING_BASE

There are several messages that the master sends with file descriptors passed
in the ancillary data:

 * VHOST_USER_SET_MEM_TABLE
 * VHOST_USER_SET_LOG_FD
 * VHOST_USER_SET_VRING_KICK
 * VHOST_USER_SET_VRING_CALL
 * VHOST_USER_SET_VRING_ERR
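
On Linux, file descriptors travel over a Unix domain socket as SCM_RIGHTS
ancillary data attached with sendmsg(). The sketch below shows one way a
master could send a message together with its descriptors. The function name
vu_send_with_fds and the VU_MAX_FDS limit are assumptions made for this
example.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define VU_MAX_FDS 8  /* assumed limit: one descriptor per memory region */

/*
 * Send len bytes from buf on a Unix domain socket and attach nfds file
 * descriptors as SCM_RIGHTS ancillary data. Returns the sendmsg() result,
 * or -1 if too many descriptors are given.
 */
static ssize_t vu_send_with_fds(int sock, const void *buf, size_t len,
                                const int *fds, size_t nfds)
{
    /* The union forces correct alignment of the control buffer. */
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(VU_MAX_FDS * sizeof(int))];
    } control;
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr mh;

    if (nfds > VU_MAX_FDS) {
        return -1;
    }
    memset(&mh, 0, sizeof mh);
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;

    if (nfds > 0) {
        memset(&control, 0, sizeof control);
        mh.msg_control = control.buf;
        mh.msg_controllen = CMSG_SPACE(nfds * sizeof(int));

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&mh);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(nfds * sizeof(int));
        memcpy(CMSG_DATA(cmsg), fds, nfds * sizeof(int));
    }
    return sendmsg(sock, &mh, 0);
}
```

The receiving side would call recvmsg() with a control buffer of its own and
read the descriptors back from the SCM_RIGHTS control message.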

If Master is unable to send the full message or receives a wrong reply it will
close the connection. An optional reconnection mechanism can be implemented.

Message types
-------------

 * VHOST_USER_GET_FEATURES

      Id: 2
      Equivalent ioctl: VHOST_GET_FEATURES
      Master payload: N/A
      Slave payload: u64

      Get the features bitmask from the underlying vhost implementation.

 * VHOST_USER_SET_FEATURES

      Id: 3
      Equivalent ioctl: VHOST_SET_FEATURES
      Master payload: u64

      Enable features in the underlying vhost implementation using a bitmask.

 * VHOST_USER_SET_OWNER

      Id: 4
      Equivalent ioctl: VHOST_SET_OWNER
      Master payload: N/A

      Issued when a new connection is established. It sets the current Master
      as an owner of the session. This can be used on the Slave as a
      "session start" flag.

 * VHOST_USER_RESET_OWNER

      Id: 5
      Equivalent ioctl: VHOST_RESET_OWNER
      Master payload: N/A

      Issued when the connection is about to be closed. The Master will no
      longer own this connection (and will usually close it).

 * VHOST_USER_SET_MEM_TABLE

      Id: 6
      Equivalent ioctl: VHOST_SET_MEM_TABLE
      Master payload: memory regions description

      Sets the memory map regions on the slave so it can translate the vring
      addresses. In the ancillary data there is an array of file descriptors
      for each memory mapped region. The size and ordering of the fds matches
      the number and ordering of memory regions.

 * VHOST_USER_SET_LOG_BASE

      Id: 7
      Equivalent ioctl: VHOST_SET_LOG_BASE
      Master payload: u64

      Sets the logging base address.

 * VHOST_USER_SET_LOG_FD

      Id: 8
      Equivalent ioctl: VHOST_SET_LOG_FD
      Master payload: N/A

      Sets the logging file descriptor, which is passed as ancillary data.

 * VHOST_USER_SET_VRING_NUM

      Id: 9
      Equivalent ioctl: VHOST_SET_VRING_NUM
      Master payload: vring state description

      Sets the number of descriptors in the vring (the ring size).

 * VHOST_USER_SET_VRING_ADDR

      Id: 10
      Equivalent ioctl: VHOST_SET_VRING_ADDR
      Master payload: vring address description
      Slave payload: N/A

      Sets the addresses of the different aspects of the vring.

 * VHOST_USER_SET_VRING_BASE

      Id: 11
      Equivalent ioctl: VHOST_SET_VRING_BASE
      Master payload: vring state description

      Sets the base offset in the available vring.

 * VHOST_USER_GET_VRING_BASE

      Id: 12
      Equivalent ioctl: VHOST_GET_VRING_BASE
      Master payload: vring state description
      Slave payload: vring state description

      Get the available vring base offset.

 * VHOST_USER_SET_VRING_KICK

      Id: 13
      Equivalent ioctl: VHOST_SET_VRING_KICK
      Master payload: u64

      Set the event file descriptor for adding buffers to the vring. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data. This signals that polling should be used
      instead of waiting for a kick.

 * VHOST_USER_SET_VRING_CALL

      Id: 14
      Equivalent ioctl: VHOST_SET_VRING_CALL
      Master payload: u64

      Set the event file descriptor to signal when buffers are used. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data. This signals that polling will be used
      instead of waiting for the call.

 * VHOST_USER_SET_VRING_ERR

      Id: 15
      Equivalent ioctl: VHOST_SET_VRING_ERR
      Master payload: u64

      Set the event file descriptor to signal when an error occurs. It
      is passed in the ancillary data.
      Bits (0-7) of the payload contain the vring index. Bit 8 is the
      invalid FD flag. This flag is set when there is no file descriptor
      in the ancillary data.
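
The u64 payload shared by VHOST_USER_SET_VRING_KICK, VHOST_USER_SET_VRING_CALL
and VHOST_USER_SET_VRING_ERR can be packed and unpacked as sketched below. The
macro and function names are hypothetical; only the bit positions come from
the descriptions above.

```c
#include <stdbool.h>
#include <stdint.h>

#define VU_VRING_IDX_MASK   0xffull        /* bits 0-7: vring index */
#define VU_VRING_NOFD_FLAG  (0x1ull << 8)  /* bit 8: invalid FD flag */

/* Extract the vring index from the payload. */
static unsigned vu_vring_index(uint64_t payload)
{
    return (unsigned)(payload & VU_VRING_IDX_MASK);
}

/* True if a file descriptor accompanies the message in the ancillary data. */
static bool vu_vring_has_fd(uint64_t payload)
{
    return (payload & VU_VRING_NOFD_FLAG) == 0;
}

/* Build the payload for a given vring index. */
static uint64_t vu_vring_payload(unsigned index, bool has_fd)
{
    uint64_t payload = (uint64_t)index & VU_VRING_IDX_MASK;
    if (!has_fd) {
        payload |= VU_VRING_NOFD_FLAG;
    }
    return payload;
}
```

For example, a payload of 0x103 denotes vring 3 with no file descriptor
attached.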
