.. Cédric Le Goater | 24563a5 | 2019-05-21 10:24:11 +0200

================================
POWER9 XIVE interrupt controller
================================

The POWER9 processor comes with a new interrupt controller
architecture, called XIVE for "eXternal Interrupt Virtualization
Engine".

Compared to the previous architecture, the main characteristics of
XIVE are support for a larger number of interrupt sources and the
ability to deliver interrupts directly to virtual processors without
hypervisor assistance. This removes the context switches required for
the delivery process.


XIVE architecture
=================

The XIVE IC is composed of three sub-engines, each taking care of a
processing layer of external interrupts:

- Interrupt Virtualization Source Engine (IVSE), or Source Controller
  (SC). These are found in PCI PHBs, in the Processor Service
  Interface (PSI) host bridge Controller, but also inside the main
  controller for the core IPIs and other sub-chips (NX, CAP, NPU) of
  the chip/processor. They are configured to feed the IVRE with
  events.
- Interrupt Virtualization Routing Engine (IVRE) or Virtualization
  Controller (VC). It handles event coalescing and performs interrupt
  routing by matching an event source number with an Event
  Notification Descriptor (END).
- Interrupt Virtualization Presentation Engine (IVPE) or Presentation
  Controller (PC). It maintains the interrupt context state of each
  thread and handles the delivery of the external interrupt to the
  thread.

::

              XIVE Interrupt Controller
              +------------------------------------+      IPIs
              | +---------+ +---------+ +--------+ |    +-------+
              | |IVRE     | |Common Q | |IVPE    |----> | CORES |
              | |     esb | |         | |        |----> |       |
              | |     eas | |  Bridge | |   tctx |----> |       |
              | |SC   end | |         | |    nvt | |    |       |
  +------+    | +---------+ +----+----+ +--------+ |    +-+-+-+-+
  | RAM  |    +------------------|-----------------+      | | |
  |      |                       |                        | | |
  |      |                       |                        | | |
  |      |  +--------------------v------------------------v-v-v--+    other
  |      <--+                     Power Bus                      +--> chips
  |  esb |  +---------+-----------------------+------------------+
  |  eas |            |                       |
  |  end |         +--|------+                |
  |  nvt |       +----+----+ |           +----+----+
  +------+       |IVSE     | |           |IVSE     |
                 |         | |           |         |
                 | PQ-bits | |           | PQ-bits |
                 | local   |-+           |  in VC  |
                 +---------+             +---------+
                    PCIe                 NX,NPU,CAPI


  PQ-bits: 2 bits source state machine (P:pending Q:queued)
  esb: Event State Buffer (Array of PQ bits in an IVSE)
  eas: Event Assignment Structure
  end: Event Notification Descriptor
  nvt: Notification Virtual Target
  tctx: Thread interrupt Context registers


XIVE internal tables
--------------------

Each of the sub-engines uses a set of tables to redirect interrupts
from event sources to CPU threads.

::

                                          +-------+
  User or O/S                             |  EQ   |
      or                          +------>|entries|
  Hypervisor                      |       |  ..   |
    Memory                        |       +-------+
                                  |           ^
                                  |           |
             +-------------------------------------------------+
                                  |           |
  Hypervisor      +------+    +---+--+    +---+--+   +------+
    Memory        | ESB  |    | EAT  |    | ENDT |   | NVTT |
   (skiboot)      +----+-+    +----+-+    +----+-+   +------+
                    ^  |        ^  |        ^  |       ^
                    |  |        |  |        |  |       |
             +-------------------------------------------------+
                    |  |        |  |        |  |       |
                    |  |        |  |        |  |       |
               +----|--|--------|--|--------|--|-+   +-|-----+    +------+
               |    |  |        |  |        |  | |   | | tctx|    |Thread|
   IPI or   ---+    +  v        +  v        +  v |---| +  .. |----->     |
  HW events    |                                 |   |       |    |      |
               |             IVRE                |   | IVPE  |    +------+
               +---------------------------------+   +-------+


Each IVSE has a 2-bit state machine per source, P for pending and Q
for queued, which determines whether a triggered event is let through.
The PQ bits are stored in an Event State Buffer (ESB) array and can be
controlled by MMIOs.
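
The PQ state machine can be sketched in a few lines of Python. This is
an illustrative model only: the state names, the 2-bit encoding and the
exact transitions below are assumptions chosen for the example and are
not taken from this document.

```python
# Hypothetical 2-bit encoding of a source state (P is the high bit, Q the low bit).
PQ_RESET = 0b00    # idle: the next trigger is let through
PQ_OFF = 0b01      # source masked: triggers are dropped
PQ_PENDING = 0b10  # an event was forwarded and is waiting for an EOI
PQ_QUEUED = 0b11   # another event arrived while one was already pending


def trigger(pq):
    """Return (new_pq, forward) for an event arriving on a source."""
    if pq == PQ_RESET:
        return PQ_PENDING, True
    if pq in (PQ_PENDING, PQ_QUEUED):
        return PQ_QUEUED, False  # coalesced with the pending event
    return PQ_OFF, False  # masked source stays masked


def eoi(pq):
    """Return (new_pq, forward) when the O/S performs an EOI."""
    if pq == PQ_QUEUED:
        return PQ_PENDING, True  # replay the coalesced event
    if pq == PQ_OFF:
        return PQ_OFF, False
    return PQ_RESET, False
```

In this sketch ``forward`` tells whether the event is let through to the
IVRE: a first trigger is forwarded, later ones are coalesced until the
EOI replays them.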

If the event is let through, the IVRE looks up the Event Assignment
Structure (EAS) table for an Event Notification Descriptor (END)
configured for the source. Each Event Notification Descriptor defines
a notification path to a CPU and an in-memory Event Queue, in which
an EQ data entry is enqueued for the O/S to pull.
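
The lookup chain described above (source number, then EAS, then END,
then Event Queue) can be sketched as follows. The field names
(``valid``, ``masked``, ``end_index``, ``eq_data``, ``nvt_index``) and
the use of dictionaries for the tables are hypothetical and only serve
the illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Eas:
    """Hypothetical Event Assignment Structure entry."""
    valid: bool
    masked: bool
    end_index: int
    eq_data: int


@dataclass
class End:
    """Hypothetical Event Notification Descriptor with its Event Queue."""
    priority: int
    nvt_index: int
    queue: deque = field(default_factory=deque)


def route(eat, endt, source_number):
    """Enqueue the EQ data for a source and return (nvt_index, priority)."""
    eas = eat.get(source_number)
    if eas is None or not eas.valid or eas.masked:
        return None  # nothing is configured for this source
    end = endt[eas.end_index]
    end.queue.append(eas.eq_data)  # data the O/S will pull later
    return end.nvt_index, end.priority
```

The returned pair is what a presenter would need next: which
notification target to look for and at which priority.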

The IVPE determines if a Notification Virtual Target (NVT) can handle
the event by scanning the thread contexts of the VCPUs dispatched on
the processor HW threads. It maintains the interrupt context state of
each thread in an NVT table.
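
A minimal sketch of such a scan, assuming each HW thread context is a
dictionary with hypothetical ``valid`` and ``nvt_index`` keys (the real
matching logic is more involved and is not described here):

```python
def find_thread(thread_contexts, nvt_index):
    """Return the index of the first HW thread running the given NVT, or None."""
    for hw_thread, ctx in enumerate(thread_contexts):
        # Only contexts of currently dispatched VCPUs are candidates.
        if ctx.get("valid") and ctx.get("nvt_index") == nvt_index:
            return hw_thread
    return None
```

When no dispatched thread matches, the function returns ``None``; what
happens to the notification in that case is outside the scope of this
sketch.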

XIVE thread interrupt context
-----------------------------

The XIVE presenter can generate four different exceptions to its
HW threads:

- hypervisor exception
- O/S exception
- Event-Based Branch (user level)
- msgsnd (doorbell)

Each exception has a state independent from the others called a Thread
Interrupt Management context. This context is a set of registers which
lets the thread handle priority management and interrupt
acknowledgment among other things. The most important ones are:

- Pending Interrupt Priority Register (PIPR)
- Interrupt Pending Buffer (IPB)
- Current Processor Priority Register (CPPR)
- Notification Source Register (NSR)

TIMA
~~~~

The Thread Interrupt Management registers are accessible through a
specific MMIO region, called the Thread Interrupt Management Area
(TIMA), which consists of four aligned pages, each exposing a
different view of the registers. The first page (page address ending
in ``0b00``) gives access to the entire context and is reserved for
the ring 0 view of the physical thread context. The second (page
address ending in ``0b01``) is for the hypervisor, ring 1 view. The
third (page address ending in ``0b10``) is for the operating system,
ring 2 view. The fourth (page address ending in ``0b11``) is for user
level, ring 3 view.
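
As an illustration, the view selected by an access can be derived from
the low two bits of the page number. The 64 KiB page size
(``TIMA_PAGE_SHIFT``) used below is an assumption made for the example,
not a value stated in this document.

```python
TIMA_PAGE_SHIFT = 16  # assumption: 64 KiB pages

# Views indexed by the low two bits of the page number.
TIMA_VIEWS = (
    "physical thread (ring 0)",
    "hypervisor (ring 1)",
    "operating system (ring 2)",
    "user (ring 3)",
)


def tima_view(offset):
    """Return the register view exposed at a byte offset inside the TIMA."""
    page = (offset >> TIMA_PAGE_SHIFT) & 0b11
    return TIMA_VIEWS[page]
```

With these assumptions, an access at offset ``0x20010`` falls in the
third page and therefore uses the operating system view.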

Interrupt flow from an O/S perspective
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After event data has been enqueued in the O/S Event Queue, the IVPE
raises the bit corresponding to the priority of the pending interrupt
in the Interrupt Pending Buffer (IPB) register to indicate that an
event is pending in one of the 8 priority queues. The Pending
Interrupt Priority Register (PIPR) is also updated using the IPB. This
register represents the priority of the most favored pending
notification.

The PIPR is then compared to the Current Processor Priority
Register (CPPR). If it is more favored (numerically less than), the
CPU interrupt line is raised and the EO bit of the Notification Source
Register (NSR) is updated to notify the presence of an exception for
the O/S. The O/S acknowledges the interrupt with a special load in the
Thread Interrupt Management Area.
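
The IPB, PIPR and CPPR relationship can be sketched as below. The bit
layout is an assumption made for the example: priority 0 is taken to be
the most favored and to map to the most significant bit of an 8-bit
IPB, and an empty IPB is represented by a PIPR of ``0xFF``.

```python
PRIORITY_MAX = 7  # 8 priority queues, numbered 0 (most favored) to 7


def priority_to_ipb(priority):
    """Return the IPB bit for a priority (assumed layout: priority 0 is the MSB)."""
    if 0 <= priority <= PRIORITY_MAX:
        return 0x80 >> priority
    return 0


def ipb_to_pipr(ipb):
    """Return the most favored pending priority, or 0xFF if nothing is pending."""
    if ipb == 0:
        return 0xFF
    return 8 - ipb.bit_length()  # number of leading zeros in the 8-bit IPB


def should_interrupt(ipb, cppr):
    """An exception is raised only if the PIPR is more favored than the CPPR."""
    return ipb_to_pipr(ipb) < cppr
```

For example, with priorities 3 and 5 pending, the PIPR is 3. A thread
running with a CPPR of 4 is interrupted, while a thread with a CPPR of
3 is not.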

The O/S handles the interrupt and when done, performs an EOI using an
MMIO operation on the ESB management page of the associated source.

Overview of the QEMU models for XIVE
====================================

The XiveSource models the IVSE in general, internal and external. It
handles the source ESBs and the MMIO interface to control them.

The XiveNotifier is a small helper interface interconnecting the
XiveSource to the XiveRouter.

The XiveRouter is an abstract model acting as a combined IVRE and
IVPE. It routes event notifications using the EAS and END tables to
the IVPE sub-engine which does a CAM scan to find a CPU to deliver the
exception. Storage should be provided by the inheriting classes.

XiveENDSource is a special source object. It exposes the END ESB MMIOs
of the Event Queues which are used for coalescing event notifications
and for escalation. It is not used in the field, only to sync the EQ
cache in OPAL.

Finally, the XiveTCTX contains the interrupt state context of a thread,
four sets of registers, one for each exception that can be delivered
to a CPU. These contexts are scanned by the IVPE to find a matching VP
when a notification is triggered. It also models the Thread Interrupt
Management Area (TIMA), which exposes the thread context registers to
the CPU for interrupt management.