| PowerNV family boards (``powernv8``, ``powernv9``) |
| ================================================================== |
| |
| PowerNV (as in Non-Virtualized) is the "baremetal" platform using the |
| OPAL firmware. It runs Linux on IBM and OpenPOWER systems and can be |
| used as a hypervisor OS, running KVM guests, or simply as a host |
| OS. |
| |
| The PowerNV QEMU machine tries to emulate a PowerNV system at the |
| level of the skiboot firmware, which loads the OS and provides some |
| runtime services. Power Systems have a lower-level firmware (HostBoot) |
| that does low-level system initialization, such as DRAM training. This |
| is beyond the scope of what QEMU addresses today. |
| |
| Supported devices |
| ----------------- |
| |
| * Multi processor support for POWER8, POWER8NVL and POWER9. |
| * XSCOM, serial communication sideband bus to configure chiplets |
| * Simple LPC Controller |
| * Processor Service Interface (PSI) Controller |
| * Interrupt Controller, XICS (POWER8) and XIVE (POWER9) |
| * POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge |
| * Simple OCC, an on-chip microcontroller used for power management |
| tasks |
| * iBT device to handle BMC communication, with the internal BMC |
| simulator provided by QEMU or an external BMC such as an Aspeed |
| QEMU machine. |
| * PNOR containing the different firmware partitions. |
| |
| Missing devices |
| --------------- |
| |
| Many devices are still missing, including: |
| |
| * POWER10 processor |
| * XIVE2 (POWER10) interrupt controller |
| * I2C controllers (yet to be merged) |
| * NPU/NPU2/NPU3 controllers |
| * EEH support for PCIe Host bridge controllers |
| * NX controller |
| * VAS controller |
| * chipTOD (Time Of Day) |
| * Self Boot Engine (SBE). |
| * FSI bus |
| |
| Firmware |
| -------- |
| |
| The OPAL firmware (OpenPower Abstraction Layer) for OpenPower systems |
| includes the runtime services `skiboot` and the bootloader kernel and |
| initramfs `skiroot`. Source code can be found on GitHub: |
| |
| https://github.com/open-power. |
| |
| Prebuilt images of `skiboot` and `skiroot` are made available on the |
| `OpenPOWER <https://openpower.xyz/job/openpower/job/openpower-op-build/>`__ |
| site. To boot a POWER9 machine, use the |
| `witherspoon <https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=witherspoon/lastSuccessfulBuild/>`__ |
| images. For POWER8, use the |
| `palmetto <https://openpower.xyz/job/openpower/job/openpower-op-build/label=slave,target=palmetto/lastSuccessfulBuild/>`__ |
| images. |
| |
| QEMU includes a prebuilt image of `skiboot` which is updated when a |
| more recent version is required by the models. |
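| |
| To try a downloaded `skiboot` build instead of the bundled one, the |
| image can be passed with ``-bios``, as the full example later in this |
| document does with ``./skiboot.lid``. The following is a minimal |
| sketch: the helper name ``pnv_bios_args`` is hypothetical (not part of |
| QEMU) and the default path is only a placeholder. |
| |
```bash
# Hypothetical helper, not part of QEMU: emit "-bios <file>" only when a
# local skiboot image exists. When nothing is printed, QEMU falls back
# to the skiboot image it ships with.
pnv_bios_args() {
    image=${1:-./skiboot.lid}   # placeholder default path
    if [ -f "$image" ]; then
        printf '%s %s\n' -bios "$image"
    fi
}

# Possible usage (left unquoted on purpose, so the path must not contain spaces):
#   qemu-system-ppc64 -machine powernv9 $(pnv_bios_args ./skiboot.lid) -nographic
```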
| |
| Boot options |
| ------------ |
| |
| Here is a simple setup with one e1000e NIC: |
| |
| .. code-block:: bash |
| |
| $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 \ |
| -accel tcg,thread=single \ |
| -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 \ |
| -netdev user,id=net0,hostfwd=::20022-:22,hostname=pnv \ |
| -kernel ./zImage.epapr \ |
| -initrd ./rootfs.cpio.xz \ |
| -nographic |
| |
| and a SATA disk: |
| |
| .. code-block:: bash |
| |
| -device ich9-ahci,id=sata0,bus=pcie.1,addr=0x0 \ |
| -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \ |
| -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \ |
| |
| Complex PCIe configuration |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| Six PHBs are defined per chip (POWER9) but no default PCI layout is |
| provided (to be compatible with libvirt). One PCI device can be added |
| on any of the available PCIe slots using command line options such as: |
| |
| .. code-block:: bash |
| |
| -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=pcie.0,addr=0x0 |
| -netdev bridge,id=net0,helper=/usr/libexec/qemu-bridge-helper,br=virbr0 |
| |
| -device megasas,id=scsi0,bus=pcie.0,addr=0x0 |
| -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none |
| -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 |
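| |
| Both fragments above attach their device to ``pcie.0`` at |
| ``addr=0x0``, so they are alternatives rather than parts of one |
| command line. As a rough sketch, assuming each ``pcie.N`` root bus |
| accepts a single directly attached device, the hypothetical helper |
| ``pnv_dup_buses`` (not part of QEMU) reports root buses that a set of |
| options uses more than once: |
| |
```bash
# Hypothetical check, not part of QEMU: print every "bus=pcie.N" that
# appears more than once in the given options. No output means that no
# root bus is shared.
pnv_dup_buses() {
    printf '%s\n' "$@" | grep -o 'bus=pcie\.[0-9]*' | sort | uniq -d
}

# Both devices sit on pcie.0, so this prints "bus=pcie.0".
pnv_dup_buses \
    -device e1000e,netdev=net0,bus=pcie.0,addr=0x0 \
    -device megasas,id=scsi0,bus=pcie.0,addr=0x0
```
| |
| Devices placed behind a bridge, such as ``bus=bridge1``, are ignored |
| by this check. |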
| |
| Here is a full example with two different storage controllers on |
| different PHBs, each with a disk. The second PHB holds a PCIe-to-PCI |
| bridge that also carries a NIC and a USB controller: |
| |
| .. code-block:: bash |
| |
| $ qemu-system-ppc64 -m 2G -machine powernv9 -smp 2,cores=2,threads=1 -accel tcg,thread=single \ |
| -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz -bios ./skiboot.lid \ |
| \ |
| -device megasas,id=scsi0,bus=pcie.0,addr=0x0 \ |
| -drive file=./rhel7-ppc64le.qcow2,if=none,id=drive-scsi0-0-0-0,format=qcow2,cache=none \ |
| -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \ |
| \ |
| -device pcie-pci-bridge,id=bridge1,bus=pcie.1,addr=0x0 \ |
| \ |
| -device ich9-ahci,id=sata0,bus=bridge1,addr=0x1 \ |
| -drive file=./ubuntu-ppc64le.qcow2,if=none,id=drive0,format=qcow2,cache=none \ |
| -device ide-hd,bus=sata0.0,unit=0,drive=drive0,id=ide,bootindex=1 \ |
| -device e1000e,netdev=net0,mac=C0:FF:EE:00:00:02,bus=bridge1,addr=0x2 \ |
| -netdev bridge,helper=/usr/libexec/qemu-bridge-helper,br=virbr0,id=net0 \ |
| -device nec-usb-xhci,bus=bridge1,addr=0x7 \ |
| \ |
| -serial mon:stdio -nographic |
| |
| You can also use VIRTIO devices: |
| |
| .. code-block:: bash |
| |
| -drive file=./fedora-ppc64le.qcow2,if=none,snapshot=on,id=drive0 \ |
| -device virtio-blk-pci,drive=drive0,id=blk0,bus=pcie.0 \ |
| \ |
| -netdev tap,helper=/usr/lib/qemu/qemu-bridge-helper,br=virbr0,id=netdev0 \ |
| -device virtio-net-pci,netdev=netdev0,id=net0,bus=pcie.1 \ |
| \ |
| -fsdev local,id=fsdev0,path=$HOME,security_model=passthrough \ |
| -device virtio-9p-pci,fsdev=fsdev0,mount_tag=host,bus=pcie.2 |
| |
| Multi sockets |
| ~~~~~~~~~~~~~ |
| |
| The number of sockets is deduced from the number of CPUs and the |
| number of cores. ``-smp 2,cores=1`` will define a machine with 2 |
| sockets of 1 core, whereas ``-smp 2,cores=2`` will define a machine |
| with 1 socket of 2 cores, and ``-smp 8,cores=2`` a machine with 4 |
| sockets of 2 cores. |
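| |
| The socket rule above is an integer division. As a minimal sketch, |
| assuming one thread per core, the hypothetical shell helper |
| ``pnv_sockets`` (not part of QEMU) computes the socket count for a |
| given ``-smp`` setting: |
| |
```bash
# Hypothetical helper, not part of QEMU: derive the socket count that
# "-smp <cpus>,cores=<cores>" describes, assuming threads=1.
pnv_sockets() {
    cpus=$1
    cores=$2
    # This helper refuses layouts where the CPUs do not fill whole sockets.
    if [ "$cores" -le 0 ] || [ $((cpus % cores)) -ne 0 ]; then
        echo "invalid: $cpus CPUs cannot be split into sockets of $cores cores" >&2
        return 1
    fi
    echo $((cpus / cores))
}

pnv_sockets 2 1   # 2 sockets of 1 core
pnv_sockets 2 2   # 1 socket of 2 cores
pnv_sockets 8 2   # 4 sockets of 2 cores
```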
| |
| BMC configuration |
| ~~~~~~~~~~~~~~~~~ |
| |
| OpenPOWER systems negotiate the shutdown and reboot with their |
| BMC. The QEMU PowerNV machine embeds an IPMI BMC simulator using the |
| iBT interface and should offer the same power features. |
| |
| If you want to define your own BMC, use ``-nodefaults`` and specify |
| one on the command line: |
| |
| .. code-block:: bash |
| |
| -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10 |
| |
| The files `palmetto-SDR.bin <http://www.kaod.org/qemu/powernv/palmetto-SDR.bin>`__ |
| and `palmetto-FRU.bin <http://www.kaod.org/qemu/powernv/palmetto-FRU.bin>`__ |
| define a Sensor Data Record repository and a Field Replaceable Unit |
| inventory for a palmetto BMC. They can be used to extend the QEMU BMC |
| simulator. |
| |
| .. code-block:: bash |
| |
| -device ipmi-bmc-sim,sdrfile=./palmetto-SDR.bin,fruareasize=256,frudatafile=./palmetto-FRU.bin,id=bmc0 \ |
| -device isa-ipmi-bt,bmc=bmc0,irq=10 |
| |
| The PowerNV machine can also be run with an external IPMI BMC device |
| connected to a remote QEMU machine acting as BMC, using these options: |
| |
| .. code-block:: bash |
| |
| -chardev socket,id=ipmi0,host=localhost,port=9002,reconnect=10 \ |
| -device ipmi-bmc-extern,id=bmc0,chardev=ipmi0 \ |
| -device isa-ipmi-bt,bmc=bmc0,irq=10 \ |
| -nodefaults |
| |
| NVRAM |
| ~~~~~ |
| |
| Use an MTD drive to add a PNOR to the machine and get an NVRAM: |
| |
| .. code-block:: bash |
| |
| -drive file=./witherspoon.pnor,format=raw,if=mtd |
| |
| Caveats |
| ------- |
| |
| * Multiple HW threads are not supported (SMT=1), as on pseries. |
| * The CPU can hang under intensive I/O. Use ``-append powersave=off`` in that case. |
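| |
| ``-append`` sets the command line of the kernel given with |
| ``-kernel``, so the workaround belongs on a command line that boots a |
| kernel directly. The sketch below only prints such a command line |
| rather than running it. The function name ``pnv_cmdline`` is |
| hypothetical and the image paths are the placeholder names used in |
| the boot examples of this document. |
| |
```bash
# Hypothetical helper: print (not run) a powernv9 command line that
# passes the powersave=off workaround to the guest kernel.
pnv_cmdline() {
    printf '%s ' qemu-system-ppc64 -m 2G -machine powernv9 \
        -smp 2,cores=2,threads=1 -accel tcg,thread=single \
        -kernel ./zImage.epapr -initrd ./rootfs.cpio.xz \
        -append powersave=off \
        -nographic
    echo
}

pnv_cmdline
```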