PCI EXPRESS GUIDELINES
======================

1. Introduction
================
This document proposes best practices for using PCI Express (PCIe) and PCI
devices in PCI Express based machines and explains the reasoning behind
them.

Note that the PCIe features are available only when using the 'q35'
machine type on the x86 architecture and the 'virt' machine type on AArch64.
Other machine types do not use PCIe at this time.

The following presentations accompany this document:
 (1) Q35 overview.
     https://wiki.qemu.org/images/4/4e/Q35.pdf
 (2) A comparison between PCI and PCI Express technologies.
     https://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf

Note: The usage examples are not intended to replace the full
documentation; please use QEMU help to retrieve all options.

2. Device placement strategy
============================
QEMU does not have a clear socket-device matching mechanism
and allows any PCI/PCI Express device to be plugged into any
PCI/PCI Express slot.
Plugging a PCI device into a PCI Express slot might not always work and
is unusual anyway, since it cannot be done on "bare metal".
Plugging a PCI Express device into a PCI slot will hide the Extended
Configuration Space and is therefore also not recommended.

The recommendation is to separate the PCI Express and PCI hierarchies.
PCI Express devices should be plugged only into PCI Express Root Ports and
PCI Express Downstream Ports.

2.1 Root Bus (pcie.0)
=====================
Place only the following kinds of devices directly on the Root Complex:
    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
        not controllers. Place only legacy PCI devices on
        the Root Complex. These will be considered Integrated Endpoints.
        Note: Integrated Endpoints are not hot-pluggable.

        Although the PCI Express spec does not forbid PCI Express devices as
        Integrated Endpoints, existing hardware mostly integrates legacy PCI
        devices with the Root Complex. Guest OSes are suspected to behave
        strangely when PCI Express devices are integrated
        with the Root Complex.

    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
        hierarchies.

    (3) PCI Express to PCI Bridge (pcie-pci-bridge), for starting legacy PCI
        hierarchies.

    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
        are needed.

   pcie.0 bus
   ----------------------------------------------------------------------------
        |                |                     |                  |
   -----------   ------------------   -------------------   --------------
   | PCI Dev |   | PCIe Root Port |   | PCIe-PCI Bridge |   |  pxb-pcie  |
   -----------   ------------------   -------------------   --------------

2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
          -device <dev>[,bus=pcie.0]
2.1.2 To expose a new PCI Express Root Bus use:
          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
      PCI Express Root Ports and PCI Express to PCI bridges can be
      connected to the pcie.1 bus:
          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
          -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1


2.2 PCI Express only hierarchy
==============================
Always use PCI Express Root Ports to start PCI Express hierarchies.

A PCI Express Root Bus supports up to 32 devices. Since each
PCI Express Root Port is a function and a multi-function
device may support up to 8 functions, the maximum possible
number of PCI Express Root Ports per PCI Express Root Bus is 256.
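
The limit above is plain multiplication. The following sketch only spells
out that arithmetic; the constant and function names are made up for
illustration and are not part of QEMU.

```python
# Illustrative arithmetic only; names are hypothetical.
SLOTS_PER_ROOT_BUS = 32      # devices (slots) on one PCI Express Root Bus
FUNCTIONS_PER_DEVICE = 8     # functions in one multi-function device


def max_root_ports(slots=SLOTS_PER_ROOT_BUS, functions=FUNCTIONS_PER_DEVICE):
    """Upper bound on Root Ports if every function of every slot is a Root Port."""
    return slots * functions


print(max_root_ports())  # 256
```

This is a theoretical ceiling: slots already occupied by other devices on
the Root Bus, and the bus number limit discussed in section 4, lower the
number that can actually be used.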

Prefer grouping PCI Express Root Ports into multi-function devices
to keep a simple flat hierarchy that is enough for most scenarios.
Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
if there is no more room for PCI Express Root Ports.
Please see section 4 for further justification.

Plug only PCI Express devices into PCI Express Ports.


   pcie.0 bus
   ----------------------------------------------------------------------------------
         |                |                                    |
   -------------    -------------                        -------------
   | Root Port |    | Root Port |                        | Root Port |
   -------------    -------------                        -------------
         |                            -------------------------|------------------------
    ------------                      |                -----------------               |
    | PCIe Dev |                      |    PCI Express | Upstream Port |               |
    ------------                      |      Switch    -----------------               |
                                      |                  |           |                 |
                                      |   -------------------    -------------------   |
                                      |   | Downstream Port |    | Downstream Port |   |
                                      |   -------------------    -------------------   |
                                      -------------|-----------------------|------------
                                             ------------
                                             | PCIe Dev |
                                             ------------

2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
          -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
          -device <dev>,bus=root_port1
2.2.2 Using multi-function PCI Express Root Ports:
      -device ioh3420,id=root_port1,multifunction=on,chassis=x,addr=z.0[,slot=y][,bus=pcie.0] \
      -device ioh3420,id=root_port2,chassis=x1,addr=z.1[,slot=y1][,bus=pcie.0] \
      -device ioh3420,id=root_port3,chassis=x2,addr=z.2[,slot=y2][,bus=pcie.0]
2.2.3 Plugging a PCI Express device into a Switch:
      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z] \
      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x] \
      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1] \
      -device <dev>,bus=downstream_port1

Notes:
    - The (slot, chassis) pair is mandatory and must be unique for each
      PCI Express Root Port. slot defaults to 0 when not specified.
    - The 'addr' parameter can be 0 for all the examples above.
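
The multi-function layout from 2.2.2 can be generated mechanically. The
sketch below is one possible way to do it, not an official tool: the helper
name, the id scheme and the default starting slot (2) are assumptions made
for the example. It packs eight Root Ports per pcie.0 slot and gives every
port its own chassis number, so each (chassis, slot) pair stays unique.

```python
def root_port_args(count, first_slot_addr=2, bus="pcie.0"):
    """Return a QEMU argv fragment creating `count` ioh3420 Root Ports.

    Ports are packed 8 per Root Bus slot (functions 0-7). first_slot_addr
    is an assumed free slot on the bus; adjust it for the real machine.
    """
    args = []
    for i in range(count):
        dev, fn = divmod(i, 8)
        opts = [
            "ioh3420",
            f"id=root_port{i + 1}",
            f"bus={bus}",
            f"chassis={i + 1}",   # unique chassis keeps (chassis, slot) unique
            "slot=0",
            f"addr={first_slot_addr + dev:#x}.{fn}",
        ]
        if fn == 0:
            # Function 0 declares that the slot carries several functions.
            opts.insert(2, "multifunction=on")
        args += ["-device", ",".join(opts)]
    return args


for arg in root_port_args(3):
    print(arg)
```

Appending the returned list to an existing QEMU argument list yields empty
Root Ports that devices can later be plugged into with bus=root_portN.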


2.3 PCI only hierarchy
======================
Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
but, as mentioned in section 5, doing so means the legacy PCI
device in question will be incapable of hot-unplugging.
Besides that, use PCI Express to PCI Bridges (pcie-pci-bridge) in
combination with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.

Prefer flat hierarchies. For most scenarios a single PCI Express to PCI Bridge
(having 32 slots) and several PCI-PCI Bridges attached to it
(each also supporting 32 slots) will support hundreds of legacy devices.
The recommendation is to populate one PCI-PCI Bridge under the
PCI Express to PCI Bridge until it is full and then plug in a new PCI-PCI Bridge.

   pcie.0 bus
   ----------------------------------------------
        |                            |
   -----------               -------------------
   | PCI Dev |               | PCIe-PCI Bridge |
   -----------               -------------------
                               |            |
                  ------------------    ------------------
                  | PCI-PCI Bridge |    | PCI-PCI Bridge |
                  ------------------    ------------------
                                         |           |
                                  -----------     -----------
                                  | PCI Dev |     | PCI Dev |
                                  -----------     -----------

2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
      -device <dev>[,bus=pcie.0]
2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
      -device pcie-pci-bridge,id=pcie_pci_bridge1[,bus=pcie.0] \
      -device pci-bridge,id=pci_bridge1,bus=pcie_pci_bridge1[,chassis_nr=x][,addr=y] \
      -device <dev>,bus=pci_bridge1[,addr=x]
      Note that 'addr' cannot be 0 unless the shpc=off parameter is passed to
      the PCI Bridge/PCI Express to PCI Bridge.
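
The "fill one PCI-PCI Bridge, then add the next" recommendation can be
sketched as follows. This is a hypothetical helper, not part of QEMU; it
assumes SHPC is left enabled, so it never uses addr 0 and therefore places
at most 31 devices per bridge. It does not check that the number of bridges
still fits under the single PCI Express to PCI Bridge.

```python
def pci_hierarchy_args(devices, per_bridge=31):
    """Place legacy PCI `devices` (QEMU device strings) behind pci-bridges.

    One pcie-pci-bridge is created on pcie.0. A new pci-bridge is added
    under it only when the previous one is full. Addresses start at 1
    because addr 0 is unavailable unless shpc=off is set.
    """
    args = ["-device", "pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.0"]
    for i, dev in enumerate(devices):
        bridge, slot = divmod(i, per_bridge)
        if slot == 0:
            # Previous bridge is full (or this is the first device).
            args += ["-device",
                     f"pci-bridge,id=pci_bridge{bridge + 1},"
                     f"bus=pcie_pci_bridge1,chassis_nr={bridge + 1},"
                     f"addr={bridge + 1:#x}"]
        args += ["-device",
                 f"{dev},bus=pci_bridge{bridge + 1},addr={slot + 1:#x}"]
    return args


for arg in pci_hierarchy_args(["e1000", "e1000"]):
    print(arg)
```

With 32 devices the helper emits a second pci-bridge for the last device,
which mirrors the flat layout shown in the diagram above.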

3. IO space issues
===================
The PCI Express Root Ports and PCI Express Downstream Ports are seen by
Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, a 4K IO
range should be reserved for each such Port, even though only one
(multifunction) device can be plugged into each Port. This results in
poor IO space utilization.

The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
by not allocating IO space for each PCI Express Root / PCI Express
Downstream port if:
    (1) the port is empty, or
    (2) the device behind the port has no IO BARs.

The IO space is very limited, to 65536 byte-wide IO ports, and may even be
fragmented by fixed IO ports owned by platform devices, resulting in at most
10 PCI Express Root Ports or PCI Express Downstream Ports per system
if devices with IO BARs are used in the PCI Express hierarchy. Using the
proposed device placing strategy solves this issue by using only
PCI Express devices within the PCI Express hierarchy.

The PCI Express spec requires that PCI Express devices work properly
without using IO ports. The PCI hierarchy has no such limitations.
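
The numbers in this section follow from dividing the IO space into 4K
windows. A rough sketch of that arithmetic is given below; the amount lost
to platform devices (24K) is an assumed figure chosen only to reproduce the
"about 10" quoted above, not a measured value.

```python
IO_SPACE_BYTES = 65536   # byte-wide IO ports available
IO_WINDOW_BYTES = 4096   # IO range reserved per bridge-like port


def max_io_windows(reserved_bytes=0):
    """Number of 4K IO windows that fit after `reserved_bytes` are set aside."""
    return (IO_SPACE_BYTES - reserved_bytes) // IO_WINDOW_BYTES


print(max_io_windows())              # 16, the theoretical ceiling
print(max_io_windows(6 * 4096))      # 10, with an assumed 24K lost to platform devices
```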


4. Bus number issues
=====================
Each PCI domain can have at most 256 buses, and the QEMU PCI Express
machines do not support multiple PCI domains even if extra Root
Complexes (pxb-pcie) are used.

Each element of the PCI Express hierarchy (Root Complexes,
PCI Express Root Ports, PCI Express Downstream/Upstream ports)
uses one bus number. Since only one (multifunction) device
can be attached to a PCI Express Root Port or PCI Express Downstream
Port, it is advised to plan in advance for the expected number of
devices to prevent bus number starvation.

Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
Express hierarchy) enables the hierarchy to not spend bus numbers on
Upstream Ports.
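
To make the cost of Switches concrete, the sketch below counts bus numbers
under the simple rule stated above (one bus number per Root Complex, Root
Port, Upstream Port and Downstream Port). The function names are
hypothetical, and the switched case assumes every device sits behind a
Switch whose Upstream Port is plugged into its own Root Port.

```python
def bus_numbers_flat(devices):
    """Root bus plus one bus behind each Root Port (one device per port)."""
    return 1 + devices


def bus_numbers_switched(devices, downstream_per_switch):
    """Root bus plus, per Switch, a Root Port bus and an Upstream Port bus,
    plus one bus behind each Downstream Port."""
    switches = -(-devices // downstream_per_switch)  # ceiling division
    return 1 + 2 * switches + devices


print(bus_numbers_flat(32))          # 33
print(bus_numbers_switched(32, 8))   # 41
```

Under these assumptions, hosting 32 devices costs 8 more bus numbers with
Switches than with a flat layout of Root Ports.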

The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space. All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device, and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.
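
The partitioning rule can be expressed as a small calculation. The helper
below is an illustrative sketch (its name is made up): given the bus_nr
values set on the command line, it returns the inclusive bus number range
available behind each pxb-pcie device.

```python
def pxb_bus_ranges(bus_nrs, last_bus=255):
    """Map each pxb-pcie bus_nr to the inclusive (first, last) range it may use."""
    ordered = sorted(bus_nrs)
    ranges = {}
    # Each range ends just below the next higher bus_nr, or at last_bus.
    for nr, nxt in zip(ordered, ordered[1:] + [last_bus + 1]):
        ranges[nr] = (nr, nxt - 1)
    return ranges


print(pxb_bus_ranges([128, 64]))  # {64: (64, 127), 128: (128, 255)}
```

In this example the hierarchy behind pcie.0 is left with bus numbers 0..63;
passing 0 as an extra entry in the list makes the helper report that range
as well.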


5. Hot-plug
============
The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
do not support hot-plug, so any devices plugged into Root Complexes
cannot be hot-plugged/hot-unplugged:
    (1) PCI Express Integrated Endpoints
    (2) PCI Express Root Ports
    (3) PCI Express to PCI Bridges
    (4) pxb-pcie

Be aware that PCI Express Downstream Ports can't be hot-plugged into
an existing PCI Express Upstream Port.

PCI devices can be hot-plugged into PCI Express to PCI and PCI-PCI Bridges.
Hot-plug into PCI-PCI Bridges is ACPI-based, whereas hot-plug into
PCI Express to PCI Bridges is SHPC-based. Both can work side by side with
PCI Express native hot-plug.

PCI Express devices can be natively hot-plugged/hot-unplugged into/from
PCI Express Root Ports (and PCI Express Downstream Ports).

5.1 Planning for hot-plug:
    (1) PCI hierarchy
        Leave enough PCI-PCI Bridge slots empty or add one
        or more empty PCI-PCI Bridges to the PCI Express to PCI Bridge.

        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
        4K IO space and 2M MMIO range to be used for all devices behind it.
        An appropriate PCI capability has been designed for this; see
        pcie_pci_bridge.txt.

        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
        per system, don't use more than 9 PCI-PCI Bridges, leaving 4K for the
        Integrated Endpoints. (The PCI Express Hierarchy needs no IO space).

    (2) PCI Express hierarchy:
        Leave enough PCI Express Root Ports empty. Use multifunction
        PCI Express Root Ports (up to 8 ports per pcie.0 slot)
        on the Root Complex(es), for keeping the
        hierarchy as flat as possible, thereby saving PCI bus numbers.
        Don't use PCI Express Switches if you don't have
        to; each one of those uses an extra PCI bus (for its Upstream Port)
        that could be put to better use with another Root Port or Downstream
        Port, which may come in handy for hot-plugging another device.
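
The two planning rules above can be checked with a few lines of arithmetic.
The sketch below is only an illustration with made-up names; it takes the
approximate 40K IO figure from the text at face value.

```python
import math

IO_WINDOW = 4 * 1024     # IO space reserved per PCI-PCI Bridge
IO_BUDGET = 40 * 1024    # approximate usable IO space quoted above


def hotplug_plan(spare_pcie_ports, pci_bridges):
    """Return (pcie.0 slots needed for the spare Root Ports,
    IO bytes left over for Integrated Endpoints)."""
    slots = math.ceil(spare_pcie_ports / 8)   # 8 Root Port functions per slot
    io_left = IO_BUDGET - pci_bridges * IO_WINDOW
    if io_left < IO_WINDOW:
        raise ValueError("too many PCI-PCI Bridges: "
                         "less than 4K IO left for Integrated Endpoints")
    return slots, io_left


print(hotplug_plan(20, 9))  # (3, 4096)
```

Twenty spare Root Ports fit into three pcie.0 slots, and nine PCI-PCI
Bridges leave exactly one 4K window for the Integrated Endpoints; a tenth
bridge makes the helper raise an error.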


5.3 Hot-plug example:
    Using HMP: (add -monitor stdio to QEMU command line)
      device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id>
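
When the monitor is driven by a program rather than typed by hand, the
machine-readable QMP protocol is commonly used instead of HMP. As a hedged
sketch (the helper name and the example ids are assumptions, and the
document itself only shows the HMP form), the same request can be built as
a QMP message like this:

```python
import json


def qmp_device_add(driver, dev_id, bus):
    """Build a QMP device_add message mirroring the HMP line above."""
    return json.dumps({
        "execute": "device_add",
        "arguments": {"driver": driver, "id": dev_id, "bus": bus},
    })


# Hypothetical ids: plug a virtio NIC into an empty Root Port named root_port1.
print(qmp_device_add("virtio-net-pci", "net1", "root_port1"))
```

The sketch only builds the message; sending it requires a QMP socket on
which capabilities negotiation has already been completed.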


6. Device assignment
====================
Host devices are mostly PCI Express and should be plugged only into
PCI Express Root Ports or PCI Express Downstream Ports.
PCI-PCI Bridge slots can be used for legacy PCI host devices.

6.1 How to detect if a device is PCI Express:
  > lspci -s 03:00.0 -v (as root)

    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
    Flags: bus master, fast devsel, latency 0, IRQ 50
    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Capabilities: [40] Express Endpoint, MSI 00
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Capabilities: [100] Advanced Error Reporting
    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
    Capabilities: [14c] Latency Tolerance Reporting
    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

If you can see the "Express Endpoint" capability in the
output, then the device is indeed PCI Express.
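
The same check can be scripted. The sketch below is a minimal example that
scans saved `lspci -v` text for an Express capability line; the function
name is made up, and the pattern is based only on the output format shown
above, so other lspci versions may need a different expression.

```python
import re

# Matches capability lines such as "Capabilities: [40] Express Endpoint, MSI 00".
_EXPRESS_CAP = re.compile(r"^\s*Capabilities: \[[0-9a-f]+\] Express\b",
                          re.MULTILINE)


def is_pci_express(lspci_output):
    """Return True if `lspci -v` text for one device lists an Express capability."""
    return bool(_EXPRESS_CAP.search(lspci_output))


sample = """\
03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
    Capabilities: [c8] Power Management version 3
    Capabilities: [40] Express Endpoint, MSI 00
"""
print(is_pci_express(sample))  # True
```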


7. Virtio devices
=================
Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
will remain PCI and have transitional behaviour by default.
Transitional virtio devices work in both IO and MMIO modes depending on
the guest support. The Guest firmware will assign both IO and MMIO resources
to transitional virtio devices.

Virtio devices plugged into PCI Express ports are PCI Express devices and
have "1.0" behavior by default, without IO support.
In both cases the disable-legacy and disable-modern properties can be used
to override the behaviour.

Note that setting disable-legacy=off will enable legacy mode
for PCI Express virtio devices, causing them to
require IO space, which, given the limited available IO space, may quickly
lead to resource exhaustion, and is therefore strongly discouraged.
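
The defaults described in this section can be summarised as a small
decision function. This is a simplified model for illustration only: the
function name is hypothetical, it considers just the disable-legacy
property, and it ignores disable-modern.

```python
def virtio_needs_io(on_pcie_port, disable_legacy=None):
    """Return True if the virtio device is expected to claim IO space.

    disable_legacy=None keeps the default for the placement;
    True/False model disable-legacy=on/off on the command line.
    """
    if disable_legacy is None:
        # Default: no legacy mode behind a PCI Express port,
        # transitional (legacy enabled) everywhere else.
        disable_legacy = on_pcie_port
    return not disable_legacy


print(virtio_needs_io(on_pcie_port=True))                        # False
print(virtio_needs_io(on_pcie_port=True, disable_legacy=False))  # True
```

The second call corresponds to the discouraged disable-legacy=off setting
on a device behind a PCI Express port.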


8. Conclusion
==============
The proposal offers a usage model that is easy to understand and follow
and at the same time overcomes the PCI Express architecture limitations.