| The memory API |
| ============== |
| |
| The memory API models the memory and I/O buses and controllers of a QEMU |
| machine. It attempts to allow modelling of: |
| |
| - ordinary RAM |
| - memory-mapped I/O (MMIO) |
| - memory controllers that can dynamically reroute physical memory regions |
| to different destinations |
| |
| The memory model provides support for: |
| |
| - tracking RAM changes by the guest |
| - setting up coalesced memory for KVM |
| - setting up ioeventfd regions for KVM |
| |
| Memory is modelled as a tree (really an acyclic graph) of MemoryRegion objects. |
| The root of the tree is memory as seen from the CPU's viewpoint (the system |
| bus). Nodes in the tree represent other buses, memory controllers, and |
| memory regions that have been rerouted. Leaves are RAM and MMIO regions. |
| |
| Types of regions |
| ---------------- |
| |
| There are four types of memory regions (all represented by a single C type |
| MemoryRegion): |
| |
| - RAM: a RAM region is simply a range of host memory that can be made available |
| to the guest. |
| |
| - MMIO: a range of guest memory that is implemented by host callbacks; |
| each read or write causes a callback to be called on the host. |
| |
| - container: a container simply includes other memory regions, each at |
| a different offset. Containers are useful for grouping several regions |
| into one unit. For example, a PCI BAR may be composed of a RAM region |
| and an MMIO region. |
| |
| A container's subregions are usually non-overlapping. In some cases it is |
| useful to have overlapping regions; for example a memory controller that |
| can overlay a subregion of RAM with MMIO or ROM, or a PCI controller |
| that does not prevent cards from claiming overlapping BARs. |
| |
| - alias: a subsection of another region. Aliases allow a region to be |
| split apart into discontiguous regions. Examples of uses are memory banks |
| used when the guest address space is smaller than the amount of RAM |
| addressed, or a memory controller that splits main memory to expose a "PCI |
| hole". Aliases may point to any type of region, including other aliases, |
| but an alias may not point back to itself, directly or indirectly. |
| |
| |
| Region names |
| ------------ |
| |
| Regions are assigned names by the constructor. For most regions these are |
| only used for debugging purposes, but RAM regions also use the name to identify |
| live migration sections. This means that RAM region names need to have ABI |
| stability. |
| |
| Region lifecycle |
| ---------------- |
| |
| A region is created by one of the constructor functions (memory_region_init*()) |
| and destroyed by the destructor (memory_region_destroy()). In between, |
| a region can be added to an address space by using memory_region_add_subregion() |
| and removed using memory_region_del_subregion(). Region attributes may be |
| changed at any point; they take effect once the region becomes exposed to the |
| guest. |
| |
| Overlapping regions and priority |
| -------------------------------- |
| |
| Usually, regions may not overlap each other; a memory address decodes into |
| exactly one target. In some cases it is useful to allow regions to overlap, |
| and sometimes to control which of the overlapping regions is visible to the |
| guest. This is done with memory_region_add_subregion_overlap(), which |
| allows the region to overlap any other region in the same container, and |
| specifies a priority that allows the core to decide which of two regions at |
| the same address is visible (highest wins). |
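| |
| The "highest wins" rule can be sketched with a small standalone model. The |
| Mapping type and visible_at() below are hypothetical names used only for |
| illustration; they are not part of the QEMU API. The sketch assumes a single |
| flat container and, as an arbitrary choice, lets the first-listed region win |
| when two overlapping regions have equal priority. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat model of one container; not QEMU's MemoryRegion. */
typedef struct {
    const char *name;
    uint64_t start;     /* offset of the subregion inside the container */
    uint64_t size;      /* length in bytes */
    int priority;       /* higher value wins where subregions overlap */
} Mapping;

/* Return the mapping visible at addr, or NULL if nothing is mapped there. */
const Mapping *visible_at(const Mapping *maps, size_t n, uint64_t addr)
{
    const Mapping *best = NULL;

    assert(maps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const Mapping *m = &maps[i];

        if (addr < m->start || addr - m->start >= m->size) {
            continue;   /* addr lies outside this subregion */
        }
        if (best == NULL || m->priority > best->priority) {
            best = m;   /* strictly higher priority replaces the candidate */
        }
    }
    return best;
}
```
| |
| With a priority 0 RAM mapping covering the first megabyte and a priority 1 |
| window at 0xa0000-0xbffff, a lookup at 0xa0000 returns the window, while |
| lookups just below or above the window return the RAM mapping. |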
| |
| Visibility |
| ---------- |
| |
| The memory core uses the following rules to select a memory region when the |
| guest accesses an address: |
| |
| - all direct subregions of the root region are matched against the address, in |
| descending priority order |
| - if the address lies outside the region offset/size, the subregion is |
| discarded |
| - if the subregion is a leaf (RAM or MMIO), the search terminates |
| - if the subregion is a container, the same algorithm is used within the |
| subregion (after the address is adjusted by the subregion offset) |
| - if the subregion is an alias, the search continues at the alias target |
| (after the address is adjusted by the subregion offset and alias offset) |
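| |
| The rules above can be sketched as a recursive lookup over a toy region |
| tree. Region, add_sub() and resolve() are hypothetical names for this |
| illustration and do not mirror QEMU's data structures. One behaviour is an |
| assumption not stated in the rules: if the search inside a container or |
| alias finds nothing, the sketch falls through to the next subregion in |
| priority order. |
| |
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum kind { K_RAM, K_MMIO, K_CONTAINER, K_ALIAS };

#define MAX_SUBS 8

/* Hypothetical toy region; not QEMU's MemoryRegion. */
typedef struct Region Region;
struct Region {
    const char *name;
    enum kind kind;
    uint64_t offset;         /* position inside the parent container */
    uint64_t size;
    int priority;
    Region *subs[MAX_SUBS];  /* container only, descending priority */
    int nsubs;
    Region *target;          /* alias only */
    uint64_t alias_offset;   /* alias only: start offset inside target */
};

/* Insert sub into parent, keeping subs[] sorted by descending priority. */
void add_sub(Region *parent, Region *sub, uint64_t offset, int priority)
{
    int i;

    assert(parent->kind == K_CONTAINER && parent->nsubs < MAX_SUBS);
    sub->offset = offset;
    sub->priority = priority;
    i = parent->nsubs++;
    while (i > 0 && parent->subs[i - 1]->priority < priority) {
        parent->subs[i] = parent->subs[i - 1];
        i--;
    }
    parent->subs[i] = sub;
}

/*
 * Resolve addr (relative to the start of r) to a leaf region.
 * On success, *leaf_off receives the offset inside that leaf.
 */
const Region *resolve(const Region *r, uint64_t addr, uint64_t *leaf_off)
{
    if (addr >= r->size) {
        return NULL;                 /* outside offset/size: discarded */
    }
    switch (r->kind) {
    case K_RAM:
    case K_MMIO:
        *leaf_off = addr;            /* leaf: the search terminates */
        return r;
    case K_ALIAS:                    /* continue at the alias target */
        return resolve(r->target, addr + r->alias_offset, leaf_off);
    case K_CONTAINER:
        for (int i = 0; i < r->nsubs; i++) {
            const Region *s = r->subs[i];
            const Region *hit;

            if (addr < s->offset) {
                continue;
            }
            hit = resolve(s, addr - s->offset, leaf_off);
            if (hit != NULL) {
                return hit;
            }
            /* assumption: no match inside s, try the next subregion */
        }
        return NULL;
    }
    return NULL;
}
```
| |
| Calling resolve() on the root with a guest address yields the leaf that |
| would service the access, together with the offset inside that leaf. |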
| |
| Example memory map |
| ------------------ |
| |
| system_memory: container@0-2^48-1 |
| | |
| +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff) |
| | |
| +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff) |
| | |
| +---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff) |
| | (prio 1) |
| | |
| +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff) |
| |
| pci (0-2^32-1) |
| | |
| +--- vga-area: container@0xa0000-0xbffff |
| | | |
| | +--- alias@0x00000-0x7fff ---> #vram (0x010000-0x017fff) |
| | | |
| | +--- alias@0x08000-0xffff ---> #vram (0x020000-0x027fff) |
| | |
| +---- vram: ram@0xe1000000-0xe1ffffff |
| | |
| +---- vga-mmio: mmio@0xe2000000-0xe200ffff |
| |
| ram: ram@0x00000000-0xffffffff |
| |
| This is a (simplified) PC memory map. The 4GB RAM block is mapped into the |
| system address space via two aliases: "lomem" is a 1:1 mapping of the first |
| 3.5GB; "himem" maps the last 0.5GB at address 4GB. This leaves 0.5GB for the |
| so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with |
| 4GB of memory. |
| |
| The memory controller diverts addresses in the range 640K-768K to the PCI |
| address space. This is modelled using the "vga-window" alias, mapped at a |
| higher priority so it obscures the RAM at the same addresses. The vga window |
| can be removed by programming the memory controller; this is modelled by |
| removing the alias and exposing the RAM underneath. |
| |
| The pci address space is not a direct child of the system address space, since |
| we only want parts of it to be visible (we accomplish this using aliases). |
| It has two subregions: vga-area models the legacy vga window and is occupied |
| by two 32K memory banks pointing at two sections of the framebuffer. |
| In addition the vram is mapped as a BAR at address 0xe1000000, and an additional |
| BAR containing MMIO registers is mapped after it. |
| |
| Note that if the guest maps a BAR outside the PCI hole, the BAR will not be |
| visible, because the pci-hole alias clips the view to a 0.5GB range. |
| |
| Attributes |
| ---------- |
| |
| Various region attributes (read-only, dirty logging, coalesced mmio, ioeventfd) |
| can be changed during the region lifecycle. They take effect once the region |
| is made visible (which can be immediately, later, or never). |
| |
| MMIO Operations |
| --------------- |
| |
| MMIO regions are provided with ->read() and ->write() callbacks; in addition |
| various constraints can be supplied to control how these callbacks are called: |
| |
| - .valid.min_access_size, .valid.max_access_size define the access sizes |
| (in bytes) which the device accepts; accesses outside this range will |
| have device and bus specific behaviour (ignored, or machine check) |
| - .valid.unaligned specifies that the device accepts unaligned accesses; |
| if false, unaligned accesses invoke device and bus specific behaviour. |
| - .impl.min_access_size, .impl.max_access_size define the access sizes |
| (in bytes) supported by the *implementation*; other access sizes will be |
| emulated using the ones available. For example a 4-byte write will be |
| emulated using four 1-byte writes, if .impl.max_access_size = 1. |
| - .impl.unaligned specifies that the *implementation* supports unaligned |
| accesses; if false, unaligned accesses will be emulated by two aligned |
| accesses. |
| - .old_portio and .old_mmio can be used to ease porting from code using |
| cpu_register_io_memory() and register_ioport(). They should not be used |
| in new code. |
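| |
| The effect of .impl.max_access_size on a wide write can be illustrated with |
| a standalone sketch. The names split_write(), write_fn, ByteDev and |
| byte_dev_write() are hypothetical and are not the memory core's actual |
| code. The sketch assumes little-endian ordering, so the least significant |
| bytes go to the lowest address; the real ordering depends on the region's |
| endianness. |
| |
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callback shape, loosely modelled on a ->write() handler. */
typedef void (*write_fn)(void *opaque, uint64_t addr, uint64_t data,
                         unsigned size);

/*
 * Emulate a write of `size` bytes using accesses no wider than
 * impl_max bytes (assumption: little-endian byte order).
 */
void split_write(write_fn fn, void *opaque, uint64_t addr,
                 uint64_t data, unsigned size, unsigned impl_max)
{
    uint64_t mask;

    assert(size >= 1 && size <= 8);
    assert(impl_max >= 1 && impl_max <= 8);
    if (size <= impl_max) {
        fn(opaque, addr, data, size);   /* fits in one access */
        return;
    }
    assert(size % impl_max == 0);
    /* size > impl_max implies impl_max <= 4, so the shift is < 64 */
    mask = (1ULL << (impl_max * 8)) - 1;
    for (unsigned done = 0; done < size; done += impl_max) {
        fn(opaque, addr + done, (data >> (done * 8)) & mask, impl_max);
    }
}

/* Toy device that only implements 1-byte writes. */
typedef struct {
    uint8_t regs[8];
    unsigned calls;     /* number of callback invocations */
} ByteDev;

void byte_dev_write(void *opaque, uint64_t addr, uint64_t data,
                    unsigned size)
{
    ByteDev *d = opaque;

    assert(size == 1 && addr < sizeof(d->regs));
    d->regs[addr] = (uint8_t)data;
    d->calls++;
}
```
| |
| Writing the 4-byte value 0x11223344 through split_write() with impl_max set |
| to 1 invokes byte_dev_write() four times, once per byte. |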