| .. _skiboot-5.8-rc1: |
| |
| skiboot-5.8-rc1 |
| =============== |
| |
| skiboot v5.8-rc1 was released on Tuesday August 22nd 2017. It is the first |
| release candidate of skiboot 5.8, which will become the new stable release |
| of skiboot following the 5.7 release, first released 25th July 2017. |
| |
| skiboot v5.8-rc1 contains all bug fixes as of :ref:`skiboot-5.4.6` |
| and :ref:`skiboot-5.1.20` (the currently maintained stable releases). We |
| do not currently expect to do any 5.7.x stable releases. |
| |
| See :ref:`stable-rules` for details on how the skiboot stable releases work. |
| |
| The current plan is to cut the final 5.8 by August 25th, with skiboot 5.8 |
| being used for all POWER8 and POWER9 platforms in op-build v1.19 (due August 25th). |
| This is a short cycle as this release is mainly targeted towards POWER9 |
| bringup efforts. |
| |
| Over skiboot-5.7, we have the following changes: |
| |
| New Features |
| ------------ |
| - sensors: occ: Add support to clear sensor groups |
| |
| Adds a generic API to clear sensor groups. OCC inband sensor groups |
| such as CSM, Profiler and Job Scheduler can be cleared using this API. |
| It will clear the min/max of all sensors belonging to OCC sensor |
| groups. |
| |
| - sensors: occ: Add CSM_{min/max} sensors |
| |
| HWMON's lowest/highest attribute is used by CSM agent, so map min/max |
| device-tree properties "sensor-data-min" and "sensor-data-max" to |
| the min/max of CSM. |
| |
| - sensors: occ: Add support for OCC inband sensors |
| |
| Add support to parse and export OCC inband sensors which are copied |
| by OCC to main memory in P9. Each OCC writes three buffers, which |
| include one names buffer for sensor metadata and two buffers for |
| sensor readings. While OCC writes to one buffer the sensor values |
| can be read from the other buffer. The sensors are updated every |
| 100ms. |
| |
| This patch adds power, temperature, current and voltage sensors to |
| ``/ibm,opal/sensors`` device-tree node which can be exported by the |
| ibmpowernv-hwmon driver in Linux. |
| |
| - psr: occ: Add support to change power-shifting-ratio |
| |
| Add support to set the CPU-GPU power shifting ratio which is used by |
| the OCC power capping algorithm. A PSR value of 100 takes all power away |
| from the CPU first and a PSR value of 0 caps the GPU first. |
| |
| - powercap: occ: Add a generic powercap framework |
| |
| This patch adds a generic powercap framework and exports OCC powercap |
| sensors, through which the system powercap can be set inband via the |
| OPAL-OCC command-response interface. |
| |
| - phb4: Enable PCI peer-to-peer |
| |
| P9 supports PCI peer-to-peer: a PCI device can write directly to the |
| mmio space of another PCI device. It completely by-passes the CPU. |
| |
| It requires some configuration on the PHBs involved: |
| |
| 1. on the initiating side, the address for the read/write operation is |
| in the mmio space of the target, i.e. well outside the range normally |
| allowed. So we disable range-checking on the TVT entry in bypass mode. |
| |
| 2. on the target side, we need to explicitly enable p2p by setting a |
| bit in a configuration register. It has the side-effect of reserving |
| an outbound (as seen from the CPU) store queue for p2p. Therefore we |
| only enable p2p on the PHBs using it, as we don't want to waste the |
| resource if we don't have to. |
| |
| P9 supports p2p mmio writes. Reads are currently only supported if the |
| two devices are under the same PHB but that is expected to change in |
| the future, and it raises questions about intermediate switches |
| configuration, so we report an error for the time being. |
| |
| The patch adds a new OPAL call to allow the OS to declare a p2p |
| (initiator, target) pair. |
| |
| - NX 842 and GZIP support on POWER9 |
| |
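| A minimal sketch of the double-buffered readings described under "sensors: occ: |
| Add support for OCC inband sensors" above. It is illustrative Python, not skiboot |
| code: the ``PingPongSensors`` class and its ``valid`` index are hypothetical, and |
| how the OCC actually signals which buffer is readable is not covered here. |
| |
```python
class PingPongSensors:
    """Toy model: a writer fills one readings buffer while readers use the other."""

    def __init__(self, n_sensors):
        self.buffers = [[0] * n_sensors, [0] * n_sensors]
        self.valid = 0  # hypothetical index of the buffer that is safe to read

    def writer_update(self, readings):
        # The writer (the OCC in the text above) fills the buffer readers are not using...
        target = 1 - self.valid
        self.buffers[target][:] = readings
        # ...and only then publishes it as the readable buffer.
        self.valid = target

    def read(self, index):
        # Readers only ever look at the published buffer.
        return self.buffers[self.valid][index]


sensors = PingPongSensors(2)
sensors.writer_update([10, 20])
print(sensors.read(1))
```
| In this toy model a reader only ever sees a buffer that was completely written. |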
| |
| POWER9 DD2 |
| ---------- |
| |
| Further support for POWER9 DD2 revision chips. Notable changes include: |
| |
| - xscom: Grab P9 DD2 revision level |
| - vas: Set mmio enable bits in DD2 |
| |
| POWER9 DD2 added some new "enable" bits that must be set for VAS to |
| work. These bits were unused in DD1. |
| - hdat: Add POWER9 DD2.0 specific pa_features |
| |
| Same as the default but with TM off. |
| |
| POWER9 |
| ------ |
| - Base NPU2 support on POWER9 DD2 |
| - hdata/i2c: Work around broken I2C array version |
| |
| Work around a bug in the I2C devices array that shows the |
| array version as being v2 when only the v1 data is populated. |
| - Recognize the 2s2u zz platform |
| |
| OPAL currently doesn't know about the 2s2u zz. It recognizes such a |
| box as a generic BMC machine and fails to boot. Add the 2s2u as a |
| supported platform. |
| |
| There will subsequently be a 2s2u-L system which may have a different |
| compatible property, which will need to be handled later. |
| - hdata/spira: POWER9 NX isn't software compatible with P7/P8 NX, don't claim so |
| - NX: Add P9 NX support for gzip compression engine |
| |
| POWER9 introduces the NX gzip compression engine. This patch adds gzip |
| compression support in NX. Virtual Accelerator Switch (VAS) is used to |
| access the NX gzip engine and the channel configuration will be done with |
| the receive FIFO. So RxFIFO address, logical partition ID (lpid), |
| process ID (pid) and thread ID (tid) are used to configure RxFIFO. |
| P9 NX supports high and normal priority FIFOs. Skiboot configures the User |
| Mode Access Control (UMAC) notify match register with these values and |
| also configures other registers to enable / disable the engine. |
| |
| Creates the following device-tree entries to provide RxFIFO address, |
| RxFIFO size, Fifo priority, lpid, pid and tid values so that kernel |
| can drive P9 NX gzip engine. |
| |
| The following nodes are located under an xscom node: :: |
| |
| /xscom@<xscom_addr>/nx@<nx_addr> |
| |
| /ibm,gzip-high-fifo : High priority gzip RxFIFO |
| /ibm,gzip-normal-fifo : Normal priority gzip RxFIFO |
| |
| Each RxFIFO node contains: |
| |
| ``compatible`` |
| ``ibm,p9-nx-gzip`` |
| ``priority`` |
| High or Normal |
| ``rx-fifo-address`` |
| RxFIFO address |
| ``rx-fifo-size`` |
| RxFIFO size |
| ``lpid`` |
| 0xfff (1's for 12 bits in UMAC notify match register) |
| ``pid`` |
| gzip coprocessor type |
| ``tid`` |
| counter for gzip |
| |
| - NX: Add P9 NX support for 842 compression engine |
| |
| This patch adds changes needed for the 842 compression engine on POWER9. |
| Virtual Accelerator Switch (VAS) is used to access the NX 842 engine on P9 |
| and the channel setup will be done with the receive FIFO. So RxFIFO |
| address, logical partition ID (lpid), process ID (pid) and thread ID |
| (tid) are used for this setup. P9 NX supports high and normal priority |
| FIFOs. skiboot is not involved in processing data with the 842 engine, but |
| configures the User Mode Access Control (UMAC) notify match register with |
| these values and exports them to the kernel with device-tree entries. |
| |
| Also configure the appropriate registers to set up and enable / disable |
| the engine. Creates the following device-tree entries to |
| provide RxFIFO address, RxFIFO size, FIFO priority, lpid, pid and tid |
| values so that the kernel can drive the P9 NX 842 engine. |
| |
| The following nodes are located under an xscom node: |
| ``/xscom@<xscom_addr>/nx@<nx_addr>`` |
| |
| ``/ibm,842-high-fifo`` |
| High priority 842 RxFIFO |
| ``/ibm,842-normal-fifo`` |
| Normal priority 842 RxFIFO |
| |
| Each RxFIFO node contains: |
| |
| ``compatible`` |
| ``ibm,p9-nx-842`` |
| ``priority`` |
| High or Normal |
| ``rx-fifo-address`` |
| RxFIFO address |
| ``rx-fifo-size`` |
| RxFIFO size |
| ``lpid`` |
| 0xfff (1's for 12 bits set in UMAC notify match register) |
| ``pid`` |
| 842 coprocessor type |
| ``tid`` |
| Counter for 842 |
| - vas: Create MMIO device tree node |
| |
| Create a device tree node for VAS and add properties that Linux |
| will need to configure/use VAS. |
| - opal: Extract sw checkstop fir address from HDAT. |
| |
| Extract sw checkstop fir address info from HDAT and populate device tree |
| node ibm,sw-checkstop-fir. |
| |
| This patch is required for OPAL_CEC_REBOOT2 OPAL call to work as expected |
| on p9. |
| |
| With this patch a device property 'ibm,sw-checkstop-fir' is now properly |
| populated: :: |
| |
| # lsprop ibm,sw-checkstop-fir |
| ibm,sw-checkstop-fir |
| 05012000 0000001f |
| |
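| The ``lsprop`` output above shows the property as two big-endian 32-bit cells. |
| A small sketch of splitting such a property into cells follows. The helper name |
| ``parse_cells`` is hypothetical, and reading the first cell as an XSCOM address |
| and the second as a bit position is an assumption that these notes do not state. |
| |
```python
import struct


def parse_cells(raw: bytes):
    """Split a device-tree property into big-endian 32-bit cells."""
    if len(raw) % 4:
        raise ValueError("property length is not a multiple of 4 bytes")
    return list(struct.unpack(">%dI" % (len(raw) // 4), raw))


# Bytes taken from the lsprop output above: 05012000 0000001f
raw = bytes.fromhex("050120000000001f")
cells = parse_cells(raw)

# Assumed meaning (not stated in these notes): FIR address, then bit position.
fir_address, bit_position = cells
print(hex(fir_address), bit_position)
```
| |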
| PHB4 |
| ---- |
| - hdat: Fix PCIe GEN4 lane-eq setting for DD2 |
| |
| For PCIe GEN4, DD2 uses only 1 byte per PCIe lane for the lane-eq |
| settings (DD1 uses 2 bytes). |
| - pci: Wait for CRS and switch link when restoring bus numbers |
| |
| When a complete reset occurs, after the PHB recovers it propagates a |
| reset down the wire to every device. At the same time, skiboot talks to |
| every device in order to restore the state of devices to what they were |
| before the reset. |
| |
| In some situations, such as devices that recovered slowly and/or were |
| behind a switch, skiboot attempted to access config space of the device |
| before the link was up and the device could respond. |
| |
| Fix this by retrying CRS until the device responds correctly, and for |
| devices behind a switch, making sure the switch has its link up first. |
| - pci: Track whether a PCI device is a virtual function |
| |
| This can be checked from config space, but we will need to know this when |
| restoring the PCI topology, and it is not always safe to access config |
| space during this period. |
| - phb4: Enhanced PCIe training tracing |
| |
| This adds more detail to the PCI training tracing (aka Rick Mata |
| mode). It enables the PCIe Link Training and Status State |
| Machine (LTSSM) tracing and details on speed and link width. |
| |
| Output now looks like this when enabled (via nvram): :: |
| |
| [ 1.096995141,3] PHB#0000[0:0]: TRACE:0x0000001101000000 0ms GEN1:x16:detect |
| [ 1.102849137,3] PHB#0000[0:0]: TRACE:0x0000102101000000 11ms presence GEN1:x16:polling |
| [ 1.104341838,3] PHB#0000[0:0]: TRACE:0x0000182101000000 14ms training GEN1:x16:polling |
| [ 1.104357444,3] PHB#0000[0:0]: TRACE:0x00001c5101000000 14ms training GEN1:x16:recovery |
| [ 1.104580394,3] PHB#0000[0:0]: TRACE:0x00001c5103000000 14ms training GEN3:x16:recovery |
| [ 1.123259359,3] PHB#0000[0:0]: TRACE:0x00001c5104000000 51ms training GEN4:x16:recovery |
| [ 1.141737656,3] PHB#0000[0:0]: TRACE:0x0000144104000000 87ms presence GEN4:x16:L0 |
| [ 1.141752318,3] PHB#0000[0:0]: TRACE:0x0000154904000000 87ms trained GEN4:x16:L0 |
| [ 1.141757964,3] PHB#0000[0:0]: TRACE: Link trained. |
| [ 1.096834019,3] PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect |
| [ 1.105578525,3] PHB#0001[0:1]: TRACE:0x0000102101000000 17ms presence GEN1:x16:polling |
| [ 1.112763075,3] PHB#0001[0:1]: TRACE:0x0000183101000000 31ms training GEN1:x16:config |
| [ 1.112778956,3] PHB#0001[0:1]: TRACE:0x00001c5081000000 31ms training GEN1:x08:recovery |
| [ 1.113002083,3] PHB#0001[0:1]: TRACE:0x00001c5083000000 31ms training GEN3:x08:recovery |
| [ 1.114833873,3] PHB#0001[0:1]: TRACE:0x0000144083000000 35ms presence GEN3:x08:L0 |
| [ 1.114848832,3] PHB#0001[0:1]: TRACE:0x0000154883000000 35ms trained GEN3:x08:L0 |
| [ 1.114854650,3] PHB#0001[0:1]: TRACE: Link trained. |
| |
| - phb4: Fix reading wrong size registers in EEH dump |
| |
| These registers are supposed to be 16-bit, and reading them at the wrong |
| size makes part of the register dump misleading. |
| - phb4: Ignore slot state if performing complete reset |
| |
| If a PHB is being completely reset, its state is about to be blown away |
| anyway, so if it's not in an appropriate state, creset it regardless. |
| - phb4: Prepare for link down when creset called from kernel |
| |
| phb4_creset() is typically called by functions that prepare the link |
| to go down. In cases where creset() is called directly by the kernel, |
| this isn't the case and it can cause issues. Prepare for link down in |
| creset, just like we do in freset and hreset. |
| - phb4: Skip attempting to fix PHBs broken on boot |
| |
| If a PHB is marked broken, it didn't work on boot, and if it didn't work |
| on boot then there's no point trying to recover it later. |
| - phb4: Fix duplicate in EEH register dump |
| - phb4: Be more conservative on link presence timeout |
| |
| In the following patch we tuned our link timing to be more aggressive: |
| ``cf960e2884 phb4: Improve reset and link training timing`` |
| |
| Cards should take only 32ms but unfortunately we've seen some take |
| up to 440ms. Hence bump our timer up to 1000ms. |
| |
| This can hurt boot times on systems where slots indicate a hotplug |
| status but no electrical link is present (which we've seen). Since we |
| have to wait 1 second between PERST and touching config space anyway, |
| it shouldn't hurt too much. |
| - phb4: Assert PERST before PHB reset |
| |
| Currently we don't assert PERST before issuing a PHB reset. This means |
| any link issues while resetting the PHB will be logged as errors. |
| |
| This asserts PERST before we start resetting the PHB to avoid this. |
| - Revert "phb4: Read PERST signal rather than assuming it's asserted" |
| |
| This reverts commit b42ff2b904165addf32e77679cebb94a08086966 |
| |
| The original patch assumes that PERST has been asserted well before (> |
| 250ms) we hit here (ie. during hostboot). |
| |
| In a subsequent patch this will no longer be the case as we need to |
| assert PERST during PHB reset, which may only be a few milliseconds |
| before we hit this code. |
| |
| Hence revert this patch. Go back to the software mechanism using |
| skip_perst to determine if PERST should be asserted or not. This |
| allows us to keep the speed optimisation on boot. |
| - phb4: Set REGB error enables based on link state |
| |
| Currently we always set these enables when initing the PHB. If the |
| link is already down, we shouldn't set them as it may cause spurious |
| errors. |
| |
| This changes the code to only set them if the link is up. |
| - phb4: Mark PHB as fenced on creset |
| |
| If we have to inject an error to trigger recovery, we end up not |
| marking the PHB as fenced in the PHB struct. This fixes that. |
| - phb4: Clear errors before deasserting reset |
| |
| During reset we may have logged some errors (eg. due to the link going |
| down). |
| |
| Hence before we deassert PERST or Hot Reset, we need to clear these |
| errors. This ensures that once link training starts, only new errors |
| are logged. |
| - phb4: Disable device config space access when fenced |
| |
| On DD2 you can't access device config space when fenced, so just |
| disable access whenever we are fenced. |
| - phb4: Dump devctl and devstat registers |
| |
| Dump devctl and devstat registers. These would have been useful when |
| debugging the MPS issue. |
| - phb4: Only clear some PHB config space registers on errors |
| |
| Currently on error we clear the entire PHB config space. This is a |
| problem as the PCIe Maximum Payload Size (MPS) negotiation may have |
| already occurred. Clearing MPS in the PHB back to a default of 128 |
| bytes will result in an error for a device which already has a larger MPS |
| configured. |
| |
| This will manifest itself as an error due to a malformed TLP packet, i.e. |
| ``phbPblErrorStatus bit 41 = "Malformed TLP error"`` |
| |
| This has been seen after kexec with some adapters. |
| |
| This fixes the problem by only clearing a subset of registers on a phb |
| error. |
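| |
| The retry described under "pci: Wait for CRS and switch link when restoring bus |
| numbers" can be sketched roughly as below. This is illustrative Python rather |
| than skiboot code: ``wait_for_crs``, its retry count and its delay are hypothetical. |
| The only outside fact relied on is that, with CRS Software Visibility enabled, a |
| Vendor ID read returns ``0x0001`` while the device is not yet ready. |
| |
```python
import time

CRS_VENDOR_ID = 0x0001  # Vendor ID value returned while a device reports CRS


def wait_for_crs(read_vendor_id, retries=40, delay_s=0.1, sleep=time.sleep):
    """Poll a Vendor ID read until the device stops answering with CRS.

    read_vendor_id is a caller-supplied function (hypothetical here) that
    performs one config space read. Returns True once the device responds
    with something other than the CRS value, False if retries run out.
    """
    for _ in range(retries):
        if read_vendor_id() != CRS_VENDOR_ID:
            return True
        sleep(delay_s)
    return False


# Fake device that reports CRS twice and then a real vendor ID.
responses = iter([0x0001, 0x0001, 0x1014])
print(wait_for_crs(lambda: next(responses), sleep=lambda s: None))
```
| For a device behind a switch, the same kind of wait would first be applied to the |
| switch link before touching the device itself. |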
| |
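| The trace lines shown under "phb4: Enhanced PCIe training tracing" have a regular |
| layout, so they can be picked apart with a small script when comparing boots. The |
| sketch below is based only on the sample output above; ``parse_trace`` and the |
| field names are hypothetical, and other skiboot versions may print other layouts. |
| |
```python
import re

# Pattern derived from the sample output in these notes, not from skiboot source.
TRACE_RE = re.compile(
    r"PHB#(?P<phb>[0-9A-Fa-f]{4})\[\d+:\d+\]:\s+"
    r"TRACE:(?P<reg>0x[0-9A-Fa-f]{16})\s+"
    r"(?P<ms>\d+)ms\s+"
    r"(?P<flags>(?:[a-z]+\s+)*)"
    r"GEN(?P<gen>\d+):x(?P<width>\d+):(?P<state>\w+)"
)


def parse_trace(line):
    """Return a dict of fields for a TRACE line, or None if it does not match."""
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return {
        "phb": m["phb"],
        "reg": int(m["reg"], 16),
        "ms": int(m["ms"]),
        "flags": m["flags"].split(),
        "gen": int(m["gen"]),
        "width": int(m["width"]),
        "state": m["state"],
    }


line = ("[ 1.114848832,3] PHB#0001[0:1]: "
        "TRACE:0x0000154883000000 35ms trained GEN3:x08:L0")
print(parse_trace(line))
```
| |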
| Utilities |
| --------- |
| - external/xscom-utils: Add ``--list-bits`` |
| |
| When using getscom/putscom it's helpful to know what bits are set in the |
| register. This patch adds an option to print out which bits are set |
| along with the value that was read/written to the register. Note that |
| this output indicates which bits are set using the IBM bit ordering |
| since that's what the XSCOM documentation uses. |
| |
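| As a reminder of what IBM bit ordering means for the ``--list-bits`` output: bit 0 |
| is the most significant bit of the 64-bit register. The sketch below is illustrative |
| Python with a hypothetical helper name; it does not reproduce the tool's actual |
| output format. |
| |
```python
def ibm_set_bits(value, width=64):
    """Return the set bit positions using IBM numbering (bit 0 is the MSB)."""
    if value < 0 or value >> width:
        raise ValueError("value does not fit in %d bits" % width)
    return [i for i in range(width) if value & (1 << (width - 1 - i))]


# The top and bottom bits of a 64-bit register are IBM bits 0 and 63.
print(ibm_set_bits(0x8000000000000001))
```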
| |
| opal-prd |
| -------- |
| |
| - opal-prd: Do not pass pnor file while starting daemon. |
| |
| This change to the included systemd init file means opal-prd can |
| start and run on IBM FSP based systems. |
| |
| We do not have PNOR support on all systems, and we have logic to |
| autodetect PNOR. Hence do not pass ``--pnor`` by default. |
| |
| - opal-prd: Disable pnor access interface on FSP system |
| |
| On FSP systems the host does not have access to PNOR. Hence disable the PNOR |
| access interfaces. |
| |
| OPAL Sensors |
| ------------ |
| - sensor-groups : occ: Add 'ops' DT property |
| |
| Add new device-tree property 'ops' to define different operations |
| supported on each sensor-group. |
| |
| - OCC: Map OCC sensor to a chip-id |
| |
| Parse device tree to get chip-id for OCC sensor. |
| |
| - HDAT: Add chip-id property to ipmi sensors |
| |
| Presently we do not have a way to map a sensor to a chip id. Hence we are |
| always passing chip id 0 for the occ_reset request (see occ_sensor_id_to_chip()). |
| |
| This patch adds a chip-id property to sensors (whenever it is available) so that |
| we can map an OCC sensor to a chip-id and pass a valid chip-id to the occ_reset request. |
| |
| - xive: Check for valid PIR index when decoding |
| |
| This fixes an unlikely but possible assert() fail on kdump. |
| |
| - sensors: occ: Skip the deconfigured core sensors |
| |
| This patch skips the deconfigured cores from the core sensors while |
| parsing the sensor names in the main memory as these sensor values are |
| not updated by OCC. |
| |
| Tests |
| ----- |
| - hdata_to_dt: use a realistic PVR and chip revision |
| |
| - nx: PR_INFO that NX RNG and Crypto not yet supported on POWER9 |
| |
| - external/pflash: Add tests |
| - external/pflash: Reinstate the progress bars |
| |
| Recent work did some optimising which unfortunately removed some of the |
| progress bars in pflash. |
| |
| It turns out that there's only one thing people prefer to correctly |
| programmed flash chips: the ability to watch little equals |
| characters go across their screens for potentially minutes. |
| - external/pflash: Correct erase alignment checks |
| |
| pflash should check the alignment of addresses and sizes when asked to |
| erase. There are two possibilities: |
| |
| 1. The user has specified sizes manually, in which case pflash should |
| be as flexible as possible; blocklevel_smart_erase() permits this. To |
| prevent possible mistakes pflash will require --force to perform a |
| manual erase of unaligned sizes. |
| 2. The user used -P to specify a partition. Partitions aren't |
| necessarily erase granule aligned anymore, which blocklevel_smart_erase() can |
| handle. In this case it doesn't make sense to warn/error about misalignment |
| since the misalignment is inherent to the FFS partition and not really |
| user input. |
| |
| - external/pflash: Check the result of strtoul |
| |
| Also add 0x in front of --info output to avoid a copy and paste mistake. |
| |
| - libflash/file: Break up MTD erase ioctl() calls |
| |
| Unfortunately not all drivers are created equal and several drivers on |
| which pflash relies block in the kernel for quite some time and ignore |
| signals. |
| |
| This is really only a problem if pflash is to perform large erases. So |
| don't; perform these operations in small chunks. |
| |
| An in-kernel fix is possible in most cases but it takes time and systems |
| will be running older drivers for quite some time. Since sector erases |
| aren't significantly slower than whole chip erases there isn't much of a |
| performance penalty to breaking up the erase ioctl()s. |
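| |
| The two cases listed under "external/pflash: Correct erase alignment checks" |
| amount to a small decision rule. A sketch in illustrative Python (not pflash code; |
| ``erase_allowed`` and its parameters are hypothetical names): |
| |
```python
def erase_allowed(offset, size, erase_granule, from_partition=False, force=False):
    """Decide whether an erase request should proceed.

    Aligned requests always proceed. Requests derived from a partition (-P)
    proceed even when unaligned, since the misalignment is not user input.
    Manually specified unaligned requests need force (--force).
    """
    aligned = offset % erase_granule == 0 and size % erase_granule == 0
    if aligned or from_partition:
        return True
    return force


# A manual, unaligned request is refused unless forced.
print(erase_allowed(0x100, 0x1000, 0x1000))
print(erase_allowed(0x100, 0x1000, 0x1000, force=True))
```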
| |
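| Breaking one large erase into smaller requests, as described under "libflash/file: |
| Break up MTD erase ioctl() calls", can be sketched as a generator of (offset, |
| length) pairs, each of which would go to one erase call. This is illustrative |
| Python; ``erase_chunks`` and the chunk size are hypothetical, and the size libflash |
| actually uses is not stated in these notes. |
| |
```python
def erase_chunks(offset, length, chunk_size):
    """Yield (offset, length) pairs covering the range in pieces of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    end = offset + length
    while offset < end:
        step = min(chunk_size, end - offset)
        yield offset, step
        offset += step


# A 10-unit erase split into chunks of at most 4 units.
print(list(erase_chunks(0, 10, 4)))
```
| Because each request is short, the process gets a chance to handle signals between |
| requests instead of blocking in the kernel for the whole erase. |
| |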
| General |
| ------- |
| - opal-msg: Increase the max-async completion count by max chips possible |
| |
| - occ: Add support for OPAL-OCC command/response interface |
| |
| This patch adds support for a shared memory based command/response |
| interface between OCC and OPAL. In HOMER, there is an OPAL command |
| buffer and an OCC response buffer which are used to send inband |
| commands to OCC. |
| |
| - HDAT/device-tree: only add lid-type on pre-POWER9 systems |
| |
| Largely a relic of back when we had multiple entry points into OPAL depending |
| on which mechanism on an FSP we were using to get loaded, this isn't needed |
| on modern P9 as we only have one entry point (we don't do the PHYP LID hack). |