# Accessing memory with libvfio-user

A vfio-user client informs the server of the memory regions it makes available
for access. Each DMA region might correspond, for example, to a region of guest
VM memory.

A server that wishes to access such client-shared memory must call:

```
vfu_setup_device_dma(..., register_cb, unregister_cb);
```

during initialization. The two callbacks are invoked when client regions are
added and removed.
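
For instance, a server might wire this up during initialization as follows.
This is a minimal sketch: the callback bodies are placeholders, the callback
shapes are assumed to follow `vfu_dma_register_cb_t` and
`vfu_dma_unregister_cb_t` from `libvfio-user.h`, and the error handling is
illustrative only.

```
#include <err.h>
#include <stdlib.h>

#include "libvfio-user.h"

/* Placeholder callbacks; the following sections describe what they should do. */
static void
dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    /* A client DMA region has been added. */
}

static void
dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    /* A client DMA region is about to be removed. */
}

...

if (vfu_setup_device_dma(vfu_ctx, dma_register, dma_unregister) < 0) {
    err(EXIT_FAILURE, "failed to set up device DMA");
}
```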

## Memory region callbacks

For either callback, the following information is given:

```
/*
 * Info for a guest DMA region. @iova is always valid; the other parameters
 * will only be set if the guest DMA region is mappable.
 *
 * @iova: guest DMA range. This is the guest physical range (as we don't
 * support vIOMMU) that the guest registers for DMA, via a VFIO_USER_DMA_MAP
 * message, and is the address space used as input to vfu_addr_to_sgl().
 * @vaddr: if the range is mapped into this process, this is the virtual address
 * of the start of the region.
 * @mapping: if @vaddr is non-NULL, this range represents the actual range
 * mmap()ed into the process. This might be (large) page aligned, and
 * therefore be different from @vaddr + @iova.iov_len.
 * @page_size: if @vaddr is non-NULL, page size of the mapping (e.g. 2MB)
 * @prot: if @vaddr is non-NULL, protection settings of the mapping as per
 * mmap(2)
 *
 * For a real example, using the gpio sample server, and a qemu configured to
 * use huge pages and share its memory:
 *
 * gpio: mapped DMA region iova=[0xf0000-0x10000000) vaddr=0x2aaaab0f0000
 * page_size=0x200000 mapping=[0x2aaaab000000-0x2aaabb000000)
 *
 *    0xf0000                    0x10000000
 *    |                                   |
 *    v                                   v
 *    +-----------------------------------+
 *    | Guest IOVA (DMA) space            |
 * +--+-----------------------------------+--+
 * |  |                                   |  |
 * |  +-----------------------------------+  |
 * |  ^ libvfio-user server address space    |
 * +--|--------------------------------------+
 *    ^ vaddr=0x2aaaab0f0000                 ^
 * |                                         |
 * 0x2aaaab000000               0x2aaabb000000
 *
 * This region can be directly accessed at 0x2aaaab0f0000, but the underlying
 * large page mapping is in the range [0x2aaaab000000-0x2aaabb000000).
 */
typedef struct vfu_dma_info {
    struct iovec iova;
    void *vaddr;
    struct iovec mapping;
    size_t page_size;
    uint32_t prot;
} vfu_dma_info_t;
```
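
As an illustration, a register callback might use these fields to distinguish
mappable from non-mappable regions. This is only a sketch: logging via
`vfu_log()` is just one possible reaction, and `LOG_DEBUG` comes from
`<syslog.h>`.

```
static void
dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    void *iova_start = info->iova.iov_base;
    void *iova_end = (char *)info->iova.iov_base + info->iova.iov_len;

    if (info->vaddr != NULL) {
        /* Mappable: the region can be accessed directly through info->vaddr. */
        vfu_log(vfu_ctx, LOG_DEBUG,
                "mapped DMA region iova=[%p, %p) vaddr=%p page_size=%#zx",
                iova_start, iova_end, info->vaddr, info->page_size);
    } else {
        /* Not mappable: the region is only reachable via vfio-user messages. */
        vfu_log(vfu_ctx, LOG_DEBUG, "unmappable DMA region iova=[%p, %p)",
                iova_start, iova_end);
    }
}
```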

The unregister callback is expected to arrange for all usage of the memory
region to stop (or to return `EBUSY`, to trigger quiescence instead), including
issuing any outstanding `vfu_sgl_put()` calls for SGLs that lie within the
region.
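
For example, a device that keeps a long-lived SGL for something like a ring
buffer might handle removal as follows. This is a sketch under assumptions: the
`ring` structure is hypothetical device state, and the overlap check assumes
the whole ring lies within a single client region.

```
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state tracking one long-lived mapping. */
static struct {
    bool mapped;
    uintptr_t iova;    /* IOVA the device mapped for its ring */
    dma_sg_t *sgl;     /* allocated with calloc(1, dma_sg_size()) */
    struct iovec iov;  /* filled in by vfu_sgl_get() */
} ring;

static void
dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    uintptr_t start = (uintptr_t)info->iova.iov_base;
    uintptr_t end = start + info->iova.iov_len;

    if (ring.mapped && ring.iova >= start && ring.iova < end) {
        /* Stop the device from touching the ring, then release the SGL. */
        ring.mapped = false;
        vfu_sgl_put(vfu_ctx, ring.sgl, &ring.iov, 1);
    }
}
```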

## Accessing mapped regions

As described above, `libvfio-user` may map remote client memory into the
process's address space, allowing direct access. To access these mappings, the
caller must first construct an SGL corresponding to the IOVA start and length:

```
dma_sg_t *sgl = calloc(2, dma_sg_size());

vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2, PROT_READ | PROT_WRITE);
```

For example, the device may have received an IOVA from a write to PCI config
space. Due to guest memory topology, a single access may not fit in one
scatter-gather entry; the API therefore accepts an array of SG entries and
fills in as many as are needed.

If `PROT_WRITE` is given, the library presumes that the user may write to the
SGL mappings at any time; this is used for dirty page tracking.
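
The return value tells the caller how many entries were actually used. A sketch
of handling it, assuming that `iova` and `len` came from the client (for
example, via a register write) and that a negative return indicates failure:

```
dma_sg_t *sgl = calloc(2, dma_sg_size());
if (sgl == NULL) {
    return -1;
}

int nr_sgs = vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2,
                             PROT_READ | PROT_WRITE);
if (nr_sgs < 0) {
    /* Invalid IOVA range, insufficient permission, or too few SG entries. */
    free(sgl);
    return -1;
}
```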

### `iovec` construction

Next, a user wishing to directly access shared memory should convert the SGL
into an array of iovecs:

```
vfu_sgl_get(vfu_ctx, sgl, iovec, cnt, 0);
```

The caller should provide an array of `struct iovec` with at least as many
entries as the SGL. After this call, each `iovec.iov_base` is the virtual
address at which the corresponding range can be directly read (or written).
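
Continuing the sketch above, the device can then copy data straight into client
memory through the returned iovecs. Here `src` is a hypothetical `uint8_t`
pointer to device data, and the error handling assumes a non-zero return means
the SGL could not be mapped:

```
struct iovec iov[2];

if (vfu_sgl_get(vfu_ctx, sgl, iov, nr_sgs, 0) != 0) {
    /* The region may not be mappable, or may have been removed. */
    return -1;
}

/* Copy device data directly into client memory, one entry at a time. */
for (int i = 0; i < nr_sgs; i++) {
    memcpy(iov[i].iov_base, src, iov[i].iov_len);
    src += iov[i].iov_len;
}
```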

### Releasing SGL access

When access via a particular set of mappings is finished, the user should call:

```
vfu_sgl_put(vfu_ctx, sgl, iovec, cnt);
```

After this call, the SGL must not be accessed via the iovec VAs. As mentioned
above, if the SGL was writeable, this will automatically mark all pages within
the SGL as dirty for live migration purposes.
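
Continuing the sketch, once the copy above is complete the mappings can be
released:

```
/* Since PROT_WRITE was requested, this also marks the pages dirty. */
vfu_sgl_put(vfu_ctx, sgl, iov, nr_sgs);
```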

### Dirty page handling

In some cases, such as when entering stop-and-copy state in live migration, it
can be useful to mark an SGL as dirty without releasing it. This can be done via
the call:

```
vfu_sgl_mark_dirty(vfu_ctx, sgl, cnt);
```
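
For instance, a device holding a long-lived writable SGL that it does not want
to release (such as the hypothetical ring from earlier) might do the following
when entering stop-and-copy:

```
/* The ring stays mapped for further use; only its pages are marked dirty. */
if (ring.mapped) {
    vfu_sgl_mark_dirty(vfu_ctx, ring.sgl, 1);
}
```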

## Non-mapped region access

Clients are not required to share the memory backing their regions. If a region
is not mapped into the server, it can only be read or written the slower way,
via explicit `VFIO_USER_DMA_READ` and `VFIO_USER_DMA_WRITE` messages:

```
...
vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 1, PROT_READ);

vfu_sgl_read(vfu_ctx, sgl, 1, &buf);
```
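
Writes work the same way in this message-based path; a sketch, assuming
`vfu_sgl_write()` as the write-direction counterpart and a hypothetical `buf`
holding the data the device wants to place in client memory:

```
char buf[64];

/* ... fill buf with the data to transfer ... */

vfu_addr_to_sgl(vfu_ctx, iova, sizeof(buf), sgl, 1, PROT_WRITE);

vfu_sgl_write(vfu_ctx, sgl, 1, buf);
```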