Accessing memory with libvfio-user

A vfio-user client informs the server of the memory regions it makes available for access. Each DMA region might correspond, for example, to a guest VM's memory region.

A server that wishes to access such client-shared memory must call:

vfu_setup_device_dma(..., register_cb, unregister_cb);

during initialization. The two callbacks are invoked when client regions are added and removed, respectively.

Memory region callbacks

For either callback, the following information is given:

/*
 * Info for a guest DMA region.  @iova is always valid; the other parameters
 * will only be set if the guest DMA region is mappable.
 *
 * @iova: guest DMA range. This is the guest physical range (as we don't
 *   support vIOMMU) that the guest registers for DMA, via a VFIO_USER_DMA_MAP
 *   message, and is the address space used as input to vfu_addr_to_sgl().
 * @vaddr: if the range is mapped into this process, this is the virtual address
 *   of the start of the region.
 * @mapping: if @vaddr is non-NULL, this range represents the actual range
 *   mmap()ed into the process. This might be (large) page aligned, and
 *   therefore be different from @vaddr + @iova.iov_len.
 * @page_size: if @vaddr is non-NULL, page size of the mapping (e.g. 2MB)
 * @prot: if @vaddr is non-NULL, protection settings of the mapping as per
 *   mmap(2)
 *
 * For a real example, using the gpio sample server, and a qemu configured to
 * use huge pages and share its memory:
 *
 * gpio: mapped DMA region iova=[0xf0000-0x10000000) vaddr=0x2aaaab0f0000
 * page_size=0x200000 mapping=[0x2aaaab000000-0x2aaabb000000)
 *
 *     0xf0000                    0x10000000
 *     |                                   |
 *     v                                   v
 *     +-----------------------------------+
 *     | Guest IOVA (DMA) space            |
 *  +--+-----------------------------------+--+
 *  |  |                                   |  |
 *  |  +-----------------------------------+  |
 *  |  ^ libvfio-user server address space    |
 *  +--|--------------------------------------+
 *  ^ vaddr=0x2aaaab0f0000                    ^
 *  |                                         |
 *  0x2aaaab000000               0x2aaabb000000
 *
 * This region can be directly accessed at 0x2aaaab0f0000, but the underlying
 * large page mapping is in the range [0x2aaaab000000-0x2aaabb000000).
 */
typedef struct vfu_dma_info {
    struct iovec iova;
    void *vaddr;
    struct iovec mapping;
    size_t page_size;
    uint32_t prot;
} vfu_dma_info_t;

The unregister callback is expected to stop all use of the memory region, including making every necessary vfu_sgl_put() call for SGLs that lie within the region. Alternatively, it can return EBUSY to trigger quiescence instead.

Accessing mapped regions

As described above, libvfio-user may map remote client memory into the process's address space, allowing direct access. To access these mappings, the caller must first construct an SGL (scatter-gather list) corresponding to the IOVA start and length:

dma_sg_t *sgl = calloc(2, dma_sg_size());

/* cnt: number of SG entries filled in, or negative on failure */
int cnt = vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2, PROT_READ | PROT_WRITE);

For example, the device may have received an IOVA from a write to PCI config space. Depending on the guest memory topology, a single access may not fit in one scatter-gather entry, so this API lets the caller provide an array of SG entries.

If PROT_WRITE is given, the library presumes that the user may write to the SGL mappings at any time; this is used for dirty page tracking.

iovec construction

Next, a user wishing to directly access shared memory should convert the SGL into an array of iovecs:

vfu_sgl_get(vfu_ctx, sgl, iovec, cnt, 0);

The caller should provide an array of struct iovec with one element per SGL entry (cnt in total). After this call, each element's iov_base is the virtual address at which the corresponding range can be read (or written) directly.

Releasing SGL access

Once the user has finished accessing the memory through the iovecs, it should call:

vfu_sgl_put(vfu_ctx, sgl, iovec, cnt);

After this call, the memory must no longer be accessed through the iovec virtual addresses. As mentioned above, if the SGL was writable, this call automatically marks all pages within the SGL as dirty for live migration purposes.

Dirty page handling

In some cases, such as when entering stop-and-copy state in live migration, it can be useful to mark an SGL as dirty without releasing it. This can be done via the call:

vfu_sgl_mark_dirty(vfu_ctx, sgl, cnt);

Non-mapped region access

Clients are not required to share their memory mappings. When a client does not, the server can only read or write the region the slower way, via messages exchanged with the client:

...
vfu_addr_to_sgl(ctx, iova, len, sg, 1, PROT_WRITE);

vfu_sgl_write(ctx, sg, 1, &buf);

Note that in this case the server is not expected to report any dirty writes via vfu_sgl_mark_dirty(): because the client performs the actual write to memory, it is the client's responsibility to track any dirtying.