| .. _skiboot-5.9-rc1: |
| |
| skiboot-5.9-rc1 |
| =============== |
| |
| skiboot v5.9-rc1 was released on Wednesday October 11th 2017. It is the first |
| release candidate of skiboot 5.9, which will become the new stable release |
| of skiboot following the 5.8 release, first released August 31st 2017. |
| |
| skiboot v5.9-rc1 contains all bug fixes as of :ref:`skiboot-5.4.7` |
| and :ref:`skiboot-5.1.21` (the currently maintained stable releases). We |
| do not currently expect to do any 5.8.x stable releases. |
| |
| For how the skiboot stable releases work, see :ref:`stable-rules` for details. |
| |
| The current plan is to cut the final 5.9 by October 17th, with skiboot 5.9 |
| being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October 18th). |
| This release will be targetted to early POWER9 systems. |
| |
| Over skiboot-5.8, we have the following changes: |
| |
| New Features |
| ------------ |
| |
| POWER8 |
| ^^^^^^ |
| - fast-reset by default (if possible) |
| |
| Currently, this is limited to POWER8 systems. |
| |
| A normal reboot will, rather than doing a full IPL, go through a |
| fast reboot procedure. This reduces the "reboot to petitboot" time |
| from minutes to a handful of seconds. |
| |
| POWER9 |
| ^^^^^^ |
| - POWER9 power management during boot |
| |
| Less power should be consumed during boot. |
| - OPAL_SIGNAL_SYSTEM_RESET for POWER9 |
| |
| This implements OPAL_SIGNAL_SYSTEM_RESET, using scom registers to |
| quiesce the target thread and raise a system reset exception on it. |
| It has been tested on DD2 with stop0 ESL=0 and ESL=1 shallow power |
| saving modes. |
| |
| DD1 is not implemented because it is sufficiently different as to |
| make support difficult. |
| - Enable deep idle states for POWER9 |
| |
| - SLW: Add support for p9_stop_api |
| |
| p9_stop_api's are used to set SPR state on a core wakeup form a deeper |
| low power state. p9_stop_api uses low level platform formware and |
| self-restore microcode to restore the sprs to requested values. |
| |
| Code is taken from : |
| https://github.com/open-power/hostboot/tree/master/src/import/chips/p9/procedures/utils/stopreg |
| - SLW: Removing timebase related flags for stop4 |
| |
| When a core enters stop4, it does not loose decrementer and time base. |
| Hence removing flags OPAL_PM_DEC_STOP and OPAL_PM_TIMEBASE_STOP. |
| - SLW: Allow deep states if homer address is known |
| |
| Use a common variable has_wakeup_engine instead of has_slw to tell if |
| the: |
| - SLW image is populated in case of power8 |
| - CME image is populated in case of power9 |
| |
| Currently we expect CME to be loaded if homer address is known ( except |
| for simulators) |
| - SLW: Configure self-restore for HRMOR |
| |
| Make a stop api call using libpore to restore HRMOR register. HRMOR needs |
| to be cleared so that when thread exits stop, they arrives at linux |
| system_reset vector (0x100). |
| - SLW: Add opal_slw_set_reg support for power9 |
| |
| This OPAL call is made from Linux to OPAL to configure values in |
| various SPRs after wakeup from a deep idle state. |
| - PHB4: CAPP recovery |
| |
| CAPP recovery is initiated when a CAPP Machine Check is detected. |
| The capp recovery procedure is initiated via a Hypervisor Maintenance |
| interrupt (HMI). |
| |
| CAPP Machine Check may arise from either an error that results in a PHB |
| freeze or from an internal CAPP error with CAPP checkstop FIR action. |
| An error that causes a PHB freeze will result in the link down signal |
| being asserted. The system continues running and the CAPP and PSL will |
| be re-initialized. |
| |
| This implements CAPP recovery for POWER9 systems |
| - Add ``wafer-location`` property for POWER9 |
| |
| Extract wafer-location from ECID and add property under xscom node. |
| - bits 64:71 are the chip x location (7:0) |
| - bits 72:79 are the chip y location (7:0) |
| |
| Sample output: :: |
| |
| [root@wsp xscom@623fc00000000]# lsprop ecid |
| ecid 019a00d4 03100718 852c0000 00fd7911 |
| [root@wsp xscom@623fc00000000]# lsprop wafer-location |
| wafer-location 00000085 0000002c |
| - Add ``wafer-id`` property for POWER9 |
| |
| Wafer id is derived from ECID data. |
| - bits 4:63 are the wafer id ( ten 6 bit fields each containing a code) |
| |
| Sample output: :: |
| |
| [root@wsp xscom@623fc00000000]# lsprop ecid |
| ecid 019a00d4 03100718 852c0000 00fd7911 |
| [root@wsp xscom@623fc00000000]# lsprop wafer-id |
| wafer-id "6Q0DG340SO" |
| - Add ``ecid`` property under ``xscom`` node for POWER9. |
| Sample output: :: |
| |
| [root@wsp xscom@623fc00000000]# lsprop ecid |
| ecid 019a00d4 03100718 852c0000 00fd7911 |
| - Add ibm,firmware-versions device tree node |
| |
| In P8, hostboot provides mini device tree. It contains ``/ibm,firmware-versions`` |
| node which has various firmware component version details. |
| |
| In P9, OPAL is building device tree. This patch adds support to parse VERSION |
| section of PNOR and create ``/ibm,firmware-versions`` device tree node. |
| |
| Sample output: :: |
| |
| /sys/firmware/devicetree/base/ibm,firmware-versions # lsprop . |
| occ "6a00709" |
| skiboot "v5.7-rc1-p344fb62" |
| buildroot "2017.02.2-7-g23118ce" |
| capp-ucode "9c73e9f" |
| petitboot "v1.4.3-p98b6d83" |
| sbe "02021c6" |
| open-power "witherspoon-v1.17-128-gf1b53c7-dirty" |
| .... |
| .... |
| |
| POWER9 |
| ------ |
| |
| - Disable Transactional Memory on Power9 DD 2.1 |
| |
| Update pa_features_p9[] to disable TM (Transactional Memory). On DD 2.1 |
| TM is not usable by Linux without other workarounds, so skiboot must |
| disable it. |
| - xscom: Do not print error message for 'chiplet offline' return values |
| |
| xscom_read/write operations returns CHIPLET_OFFLINE when chiplet is offline. |
| Some multicast xscom_read/write requests from HBRT results in xscom operation |
| on offline chiplet(s) and printing below warnings in OPAL console: :: |
| |
| [ 135.036327572,3] XSCOM: Read failed, ret = -14 |
| [ 135.092689829,3] XSCOM: Read failed, ret = -14 |
| |
| Some SCOM users can deal correctly with this error code (notably opal-prd), |
| so the error message is (in practice) erroneous. |
| - IMC: Fix the core_imc_event_mask |
| |
| CORE_IMC_EVENT_MASK is a scom that contains bits to control event sampling for |
| different machine state for core imc. The current event-mask setting sample |
| events only on host kernel (hypervisor) and host userspace. |
| |
| Patch to enable the sampling of events in other machine states (like guest |
| kernel and guest userspace). |
| - IMC: Update the nest_pmus array with occ/gpe microcode uav updates |
| |
| OOC/gpe nest microcode maintains the list of individual nest units |
| supported. Sync the recent updates to the UAV with nest_pmus array. |
| |
| For reference occ/gpr microcode link for the UAV: |
| https://github.com/open-power/occ/blob/master/src/occ_gpe1/gpe1_24x7.h |
| - Parse IOSLOT information from HDAT |
| |
| Add structure definitions that describe the physical PCIe topology of |
| a system and parse them into the device-tree based PCIe slot |
| description. |
| - idle: user context state loss flags fix for stop states |
| |
| The "lite" stop variants with PSSCR[ESL]=PSSCR[EC]=1 do not lose user |
| context, while the non-lite variants do (ESL: enable state loss). |
| |
| Some of the POWER9 idle states had these wrong. |
| |
| CAPI |
| ^^^^ |
| - POWER9 DD2 update |
| |
| The CAPI initialization sequence has been updated in DD2. |
| This patch adapts to the changes, retaining compatibility with DD1. |
| The patch includes some changes to DD1 fix-ups as well. |
| - Load CAPP microcode for POWER9 DD2.0 and DD2.1 |
| - capi: Mask Psl Credit timeout error for POWER9 |
| |
| Mask the PSL credit timeout error in CAPP FIR Mask register |
| bit(46). As per the h/w team this error is now deprecated and shouldn't |
| cause any fir-action for P9. |
| |
| NVLINK2 |
| ^^^^^^^ |
| |
| A notabale change is that we now generate the device tree description of |
| NVLINK based on the HDAT we get from hostboot. Since Hostboot will generate |
| HDAT based on VPD, you now *MUST* have correct VPD programmed or we will |
| *default* to a Sequoia layout, which will lead to random problems if you |
| are not booting a Sequoia Witherspoon planar. In the case of booting with |
| old VPD and/or Hostboot, we print a **giant scary warning** in order to scare you. |
| |
| - npu2: Read slot label from the HDAT link node |
| |
| Binding GPU to emulated NPU PCI devices is done using the slot labels |
| since the NPU devices do not have a patching slot node we need to |
| copy the label in here. |
| |
| - npu2: Copy link speed from the npu HDAT node |
| |
| This needs to be in the PCI device node so the speed of the NVLink |
| can be passed to the GPU driver. |
| - npu2: hw-procedures: Add settings to PHY_RESET |
| |
| Set a few new values in the PHY_RESET procedure, as specified by our |
| updated programming guide documentation. |
| - Parse NVLink information from HDAT |
| |
| Add the per-chip structures that descibe how the A-Bus/NVLink/OpenCAPI |
| phy is configured. This generates the npu@xyz nodes for each chip on |
| systems that support it. |
| - npu2: Add vendor cap for IRQ testing |
| |
| Provide a way to test recoverable data link interrupts via a new |
| vendor capability byte. |
| - npu2: Enable recoverable data link (no-stall) interrupts |
| |
| Allow the NPU2 to trigger "recoverable data link" interrupts. |
| |
| - npu2: Implement basic FLR (Function Level Reset) |
| - npu2: hw-procedures: Update PHY DC calibration procedure |
| - npu2: hw-procedures: Change rx_pr_phase_step value |
| |
| XIVE |
| ^^^^ |
| - xive: Fix opal_xive_dump_tm() to access W2 properly. |
| The HW only supported limited access sizes. |
| - xive: Make opal_xive_allocate_irq() properly try all chips |
| |
| When requested via OPAL_XIVE_ANY_CHIP, we need to try all |
| chips. We first try the current one (on which the caller |
| sits) and if that fails, we iterate all chips until the |
| allocation succeeds. |
| - xive: Fix initialization & cleanup of HW thread contexts |
| |
| Instead of trying to "pull" everything and clear VT (which didn't |
| work and caused some FIRs to be set), instead just clear and then |
| set the PTER thread enable bit. This has the side effect of |
| completely resetting the corresponding thread context. |
| |
| This fixes the spurrious XIVE FIRs reported by PRD and fircheck |
| - xive: Add debug option for detecting misrouted IPI in emulation |
| |
| This is high overhead so we don't enable it by default even |
| in debug builds, it's also a bit messy, but it allowed me to |
| detect and debug a locking issue earlier so it can be useful. |
| - xive: Increase the interrupt "gap" on debug builds |
| |
| We normally allocate IPIs from 0x10. Make that 0x1000 on debug |
| builds to limit the chances of overlapping with Linux interrupt |
| numbers which makes debugging code that confuses them easier. |
| |
| Also add a warning in emulation if we get an interrupt in the |
| queue whose number is below the gap. |
| - xive: Fix locking around cache scrub & watch |
| |
| Thankfully the missing locking only affects debug code and |
| init code that doesn't run concurrently. Also adds a DEBUG |
| option that checks the lock is properly held. |
| - xive: Workaround HW issue with scrub facility |
| |
| Without this, we sometimes don't observe from a CPU the |
| values written to the ENDs or NVTs via the cache watch. |
| - xive: Add exerciser for cache watch/scrub facility in DEBUG builds |
| - xive: Make assertion in xive_eq_for_target() more informative |
| - xive: Add debug code to check initial cache updates |
| - xive: Ensure pressure relief interrupts are disabled |
| |
| We don't use them and we hijack the VP field with their |
| configuration to store the EQ reference, so make sure the |
| kernel or guest can't turn them back on by doing MMIO |
| writes to ACK# |
| - xive: Don't try setting the reserved ACK# field in VPs |
| |
| That doesn't work, the HW doesn't implement it in the cache |
| watch facility anyway. |
| - xive: Remove useless memory barriers in VP/EQ inits |
| |
| We no longer update "live" memory structures, we use a temporary |
| copy on the stack and update the actual memory structure using |
| the cache watch, so those barriers are pointless. |
| |
| PHB4 |
| ^^^^ |
| - phb4: Mask RXE_ARB: DEC Stage Valid Error |
| |
| Change the inits to mask out the RXE ARB: DEC Stage Valid Error (bit |
| 370. This has been a fatal error but should be informational only. |
| |
| This update will be in the next version of the phb4 workbook. |
| - phb4: Add additional adapter to retrain whitelist |
| |
| The single port version of the ConnectX-5 has a different device ID 0x1017. |
| Updated descriptions to match pciutils database. |
| - PHB4: Default to PCIe GEN3 on POWER9 DD2.00 |
| |
| You can use the NVRAM override for DD2.00 screened parts. |
| - phb4: Retrain link if degraded |
| |
| On P9 Scale Out (Nimbus) DD2.0 and Scale in (Cumulus) DD1.0 (and |
| below) the PCIe PHY can lockup causing training issues. This can cause |
| a degradation in speed or width in ~5% of training cases (depending on |
| the card). This is fixed in later chip revisions. This issue can also |
| cause PCIe links to not train at all, but this case is already |
| handled. |
| |
| This patch checks if the PCIe link has trained optimally and if not, |
| does a full PHB reset (to fix the PHY lockup) and retrain. |
| |
| One complication is some devices are known to train degraded unless |
| device specific configuration is performed. Because of this, we only |
| retrain when the device is in a whitelist. All devices in the current |
| whitelist have been testing on a P9DSU/Boston, ZZ and Witherspoon. |
| |
| We always gather information on the link and print it in the logs even |
| if the card is not in the whitelist. |
| |
| For testing purposes, there's an nvram to retry all PCIe cards and all |
| P9 chips when a degraded link is detected. The new option is |
| 'pci-retry-all=true' which can be set using: |
| `nvram -p ibm,skiboot --update-config pci-retry-all=true`. |
| This option may increase the boot time if used on a badly behaving |
| card. |
| |
| |
| IBM FSP platforms |
| ----------------- |
| |
| - FSP/NVRAM: Handle "get vNVRAM statistics" command |
| |
| FSP sends MBOX command (cmd : 0xEB, subcmd : 0x05, mod : 0x00) to get vNVRAM |
| statistics. OPAL doesn't maintain any such statistics. Hence return |
| FSP_STATUS_INVALID_SUBCMD. |
| |
| Fixes these messages appearing in the OPAL log: :: |
| |
| [16944.384670488,3] FSP: Unhandled message eb0500 |
| [16944.474110465,3] FSP: Unhandled message eb0500 |
| [16945.111280784,3] FSP: Unhandled message eb0500 |
| [16945.293393485,3] FSP: Unhandled message eb0500 |
| - fsp: Move common prints to trace |
| |
| These two prints just end up filling the skiboot logs on any machine |
| that's been booted for more than a few hours. |
| |
| They have never been useful, so make them trace level. They were: :: |
| SURV: Received heartbeat acknowledge from FSP |
| SURV: Sending the heartbeat command to FSP |
| |
| BMC based systems |
| ----------------- |
| - hw/lpc-uart: read from RBR to clear character timeout interrupts |
| |
| When using the aspeed SUART, we see a condition where the UART sends |
| continuous character timeout interrupts. This change adds a (heavily |
| commented) dummy read from the RBR to clear the interrupt condition on |
| init. |
| |
| This was observed on p9dsu systems, but likely applies to other systems |
| using the SUART. |
| - astbmc: Add methods for handing Device Tree based slots |
| e.g. ones from HDAT on POWER9. |
| |
| General |
| ------- |
| - ipmi: Convert common debug prints to trace |
| |
| OPAL logs messages for every IPMI request from host. Sometime OPAL console |
| is filled with only these messages. This path is pretty stable now and |
| we have enough logs to cover bad path. Hence lets convert these debug |
| message to trace/info message. Examples are: :: |
| |
| [ 1356.423958816,7] opal_ipmi_recv(cmd: 0xf0 netfn: 0x3b resp_size: 0x02) |
| [ 1356.430774496,7] opal_ipmi_send(cmd: 0xf0 netfn: 0x3a len: 0x3b) |
| [ 1356.430797392,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: Message sent to host |
| [ 1356.431668496,7] BT: seq 0x20 netfn 0x3a cmd 0xf0: IPMI MSG done |
| - libflash/file: Handle short read()s and write()s correctly |
| |
| Currently we don't move the buffer along for a short read() or write() |
| and nor do we request only the remaining amount. |
| |
| - hw/p8-i2c: Rework timeout handling |
| |
| Currently we treat a timeout as a hard failure and will automatically |
| fail any transations that hit their timeout. This results in |
| unnecessarily failing I2C requests if interrupts are dropped, etc. |
| Although these are bad things that we should log we can handle them |
| better by checking the actual hardware status and completing the |
| transation if there are no real errors. This patch reworks the timeout |
| handling to check the status and continue the transaction if it can. |
| if it can while logging an error if it detects a timeout due to a |
| dropped interrupt. |
| - core/flash: Only expect ELF header for BOOTKERNEL partition flash resource |
| |
| When loading a flash resource which isn't signed (secure and trusted |
| boot) and which doesn't have a subpartition, we assume it's the |
| BOOTKERNEL since previously this was the only such resource. Thus we |
| also assumed it had an ELF header which we parsed to get the size of the |
| partition rather than trusting the actual_size field in the FFS header. |
| A previous commit (9727fe3 DT: Add ibm,firmware-versions node) added the |
| version resource which isn't signed and also doesn't have a subpartition, |
| thus we expect it to have an ELF header. It doesn't so we print the |
| error message "FLASH: Invalid ELF header part VERSION". |
| |
| It is a fluke that this works currently since we load the secure boot |
| header unconditionally and this happen to be the same size as the |
| version partition. We also don't update the return code on error so |
| happen to return OPAL_SUCCESS. |
| |
| To make this explicitly correct; only check for an ELF header if we are |
| loading the BOOTKERNEL resource, otherwise use the partition size from |
| the FFS header. Also set the return code on error so we don't |
| erroneously return OPAL_SUCCESS. Add a check that the resource will fit |
| in the supplied buffer to prevent buffer overrun. |
| - flash: Support adding the no-erase property to flash |
| |
| The mbox protocol explicitly states that an erase is not required |
| before a write. This means that issuing an erase from userspace, |
| through the mtd device, and back returns a successful operation |
| that does nothing. Unfortunately, this makes userspace tools unhappy. |
| Linux MTD devices support the MTD_NO_ERASE flag which conveys that |
| writes do not require erases on the underlying flash devices. We |
| should set this property on all of our |
| devices which do not require erases to be performed. |
| |
| NOTE: This still requires a linux kernel component to set the |
| MTD_NO_ERASE flag from the device tree property. |
| |
| Utilities |
| --------- |
| - external/gard: Clear entire guard partition instead of entry by entry |
| |
| When using the current implementation of the gard tool to ecc clear the |
| entire GUARD partition it is done one gard record at a time. While this |
| may be ok when accessing the actual flash this is very slow when done |
| from the host over the mbox protocol (on the order of 4 minutes) because |
| the bmc side is required to do many read, erase, writes under the hood. |
| |
| Fix this by rewriting the gard tool reset_partition() function. Now we |
| allocate all the erased guard entries and (if required) apply ecc to the |
| entire buffer. Then we can do one big erase and write of the entire |
| partition. This reduces the time to clear the guard partition to on the |
| order of 4 seconds. |
| - opal-prd: Fix opal-prd command line options |
| |
| HBRT OCC reset interface depends on service processor type. |
| |
| - FSP: reset_pm_complex() |
| - BMC: process_occ_reset() |
| |
| We have both `occ` and `pm-complex` command line interfaces. |
| This patch adds support to dispaly appropriate message depending |
| on system type. |
| |
| === ==================== ============================ |
| SP Command Action |
| === ==================== ============================ |
| FSP opal-prd occ display error message |
| FSP opal-prd pm-complex Call pm_complex_reset() |
| BMC opal-prd occ Call process_occ_reset() |
| BMC opal-prd pm-complex display error message |
| === ==================== ============================ |
| |
| - opal-prd: detect service processor type and |
| then make appropriate occ reset call. |
| - pflash: Fix erase command for unaligned start address |
| |
| The erase_range() function handles erasing the flash for a given start |
| address and length, and can handle an unaligned start address and |
| length. However in the unaligned start address case we are incorrectly |
| calculating the remaining size which can lead to incomplete erases. |
| |
| If we're going to update the remaining size based on what the start |
| address was then we probably want to do that before we overide the |
| origin start address. So rearrange the code so that this is indeed the |
| case. |
| - external/gard: Print an error if run on an FSP system |
| |
| Simulators |
| ---------- |
| |
| - mambo: Add mambo socket program |
| |
| This adds a program that can be run inside a mambo simulator in linux |
| userspace which enables TCP sockets to be proxied in and out of the |
| simulator to the host. |
| |
| Unlike mambo bogusnet, it's requires no linux or skiboot specific |
| drivers/infrastructure to run. |
| |
| Run inside the simulator: |
| |
| - to forward host ssh connections to sim ssh server: |
| ``./mambo-socket-proxy -h 10022 -s 22``, then connect to port 10022 |
| on your host with ``ssh -p 10022 localhost`` |
| - to allow http proxy access from inside the sim to local http proxy: |
| ``./mambo-socket-proxy -b proxy.mynetwork -h 3128 -s 3128`` |
| |
| Multiple connections are supported. |
| - idle: disable stop*_lite POWER9 idle states for Mambo platform |
| |
| Mambo prior to Mambo.7.8.21 had a bug where the stop idle instruction |
| with PSSCR[ESL]=PSSCR[EC]=0 would resume with MSR set as though it had |
| taken a system reset interrupt. |
| |
| Linux currently executes this instruction with MSR already set that |
| way, so the problem went unnoticed. A proposed patch to Linux changes |
| that, and causes the idle code to crash. Work around this by disabling |
| lite stop states for the mambo platform for now. |