skiboot 5.8-rc1 release notes

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
diff --git a/doc/release-notes/skiboot-5.1.20.rst b/doc/release-notes/skiboot-5.1.20.rst
index 63105c3..b2d1fa6 100644
--- a/doc/release-notes/skiboot-5.1.20.rst
+++ b/doc/release-notes/skiboot-5.1.20.rst
@@ -1,3 +1,5 @@
+.. _skiboot-5.1.20:
+
 skiboot-5.1.20
 --------------
 
diff --git a/doc/release-notes/skiboot-5.8-rc1.rst b/doc/release-notes/skiboot-5.8-rc1.rst
new file mode 100644
index 0000000..3cb4596
--- /dev/null
+++ b/doc/release-notes/skiboot-5.8-rc1.rst
@@ -0,0 +1,480 @@
+.. _skiboot-5.8-rc1:
+
+skiboot-5.8-rc1
+===============
+
+skiboot v5.8-rc1 was released on Monday August 21st 2017. It is the first
+release candidate of skiboot 5.8, which will become the new stable release
+of skiboot following the 5.7 release, first released 25th July 2017.
+
+skiboot v5.8-rc1 contains all bug fixes as of :ref:`skiboot-5.4.6`
+and :ref:`skiboot-5.1.20` (the currently maintained stable releases). We
+do not currently expect to do any 5.7.x stable releases.
+
+For details on how the skiboot stable releases work, see :ref:`stable-rules`.
+
+The current plan is to cut the final 5.8 by August 25th, with skiboot 5.8
+being for all POWER8 and POWER9 platforms in op-build v1.19 (due August 25th).
+This is a short cycle as this release is mainly targeted towards POWER9
+bringup efforts.
+
+Over skiboot-5.7, we have the following changes:
+
+New Features
+------------
+- sensors: occ: Add support to clear sensor groups
+
+  Adds a generic API to clear sensor groups. OCC inband sensor groups
+  such as CSM, Profiler and Job Scheduler can be cleared using this API.
+  It will clear the min/max of all sensors belonging to OCC sensor
+  groups.
+
+- sensors: occ: Add CSM_{min/max} sensors
+
+  HWMON's lowest/highest attribute is used by CSM agent, so map min/max
+  device-tree properties "sensor-data-min" and "sensor-data-max" to
+  the min/max of CSM.
+
+- sensors: occ: Add support for OCC inband sensors
+
+  Add support to parse and export OCC inband sensors which are copied
+  by OCC to main memory in P9. Each OCC writes three buffers which
+  include one names buffer for sensor meta data and two buffers for
+  sensor readings. While OCC writes to one buffer the sensor values
+  can be read from the other buffer. The sensors are updated every
+  100ms.
+
+  This patch adds power, temperature, current and voltage sensors to
+  ``/ibm,opal/sensors`` device-tree node which can be exported by the
+  ibmpowernv-hwmon driver in Linux.
+
+- psr: occ: Add support to change power-shifting-ratio
+
+  Add support to set the CPU-GPU power shifting ratio which is used by
+  the OCC power capping algorithm. PSR value of 100 takes all power away
+  from CPU first and a PSR value of 0 caps GPU first.
+
+- powercap: occ: Add a generic powercap framework
+
+  This patch adds a generic powercap framework and exports OCC powercap
+  sensors, through which the system powercap can be set inband via the
+  OPAL-OCC command-response interface.
+- phb4: Enable PCI peer-to-peer
+
+  P9 supports PCI peer-to-peer: a PCI device can write directly to the
+  mmio space of another PCI device. It completely bypasses the CPU.
+
+  It requires some configuration on the PHBs involved:
+
+  1. on the initiating side, the address for the read/write operation is
+     in the mmio space of the target, i.e. well outside the range normally
+     allowed. So we disable range-checking on the TVT entry in bypass mode.
+
+  2. on the target side, we need to explicitly enable p2p by setting a
+     bit in a configuration register. It has the side-effect of reserving
+     an outbound (as seen from the CPU) store queue for p2p. Therefore we
+     only enable p2p on the PHBs using it, as we don't want to waste the
+     resource if we don't have to.
+
+  P9 supports p2p mmio writes. Reads are currently only supported if the
+  two devices are under the same PHB but that is expected to change in
+  the future, and it raises questions about intermediate switches
+  configuration, so we report an error for the time being.
+
+  The patch adds a new OPAL call to allow the OS to declare a p2p
+  (initiator, target) pair.
+
+- NX 842 and GZIP support on POWER9
+
+
+POWER9 DD2
+----------
+
+Further support for POWER9 DD2 revision chips. Notable changes include:
+
+- xscom: Grab P9 DD2 revision level
+- vas: Set mmio enable bits in DD2
+
+  POWER9 DD2 added some new "enable" bits that must be set for VAS to
+  work. These bits were unused in DD1.
+- hdat: Add POWER9 DD2.0 specific pa_features
+
+  Same as the default but with TM off.
+
+POWER9
+------
+- Base NPU2 support on POWER9 DD2
+- hdata/i2c: Work around broken I2C array version
+
+  Work around a bug in the I2C devices array that shows the
+  array version as being v2 when only the v1 data is populated.
+- Recognize the 2s2u zz platform
+
+  OPAL currently doesn't know about the 2s2u zz. It recognizes such a
+  box as a generic BMC machine and fails to boot. Add the 2s2u as a
+  supported platform.
+
+  There will subsequently be a 2s2u-L system which may have a different
+  compatible property, which will need to be handled later.
+- hdata/spira: POWER9 NX isn't software compatible with P7/P8 NX, don't claim so
+- NX: Add P9 NX support for gzip compression engine
+
+  POWER9 introduces the NX gzip compression engine. This patch adds gzip
+  compression support in NX. Virtual Accelerator Switch (VAS) is used to
+  access NX gzip engine and the channel configuration will be done with
+  the receive FIFO. So RxFIFO address, logical partition ID (lpid),
+  process ID (pid) and thread ID (tid) are used to configure RxFIFO.
+  P9 NX supports high and normal priority FIFOs. Skiboot configures User
+  Mode Access Control (UMAC) notify match register with these values and
+  also configures other registers to enable / disable the engine.
+
+  Creates the following device-tree entries to provide RxFIFO address,
+  RxFIFO size, Fifo priority, lpid, pid and tid values so that kernel
+  can drive P9 NX gzip engine.
+
+  The following nodes are located under an xscom node: ::
+
+       /xscom@<xscom_addr>/nx@<nx_addr>
+       /ibm,gzip-high-fifo          : High priority gzip RxFIFO
+       /ibm,gzip-normal-fifo        : Normal priority gzip RxFIFO
+
+    Each RxFIFO node contains:
+
+    ``compatible``
+      ``ibm,p9-nx-gzip``
+    ``priority``
+      High or Normal
+    ``rx-fifo-address``
+      RxFIFO address
+    ``rx-fifo-size``
+      RxFIFO size
+    ``lpid``
+      0xfff (1's for 12 bits in UMAC notify match register)
+    ``pid``
+      gzip coprocessor type
+    ``tid``
+      counter for gzip
+
+- NX: Add P9 NX support for 842 compression engine
+
+  This patch adds changes needed for 842 compression engine on POWER9.
+  Virtual Accelerator Switch (VAS) is used to access NX 842 engine on P9
+  and the channel setup will be done with receive FIFO. So RxFIFO
+  address, logical partition ID (lpid), process ID (pid) and thread ID
+  (tid) are used for this setup. P9 NX supports high and normal priority
+  FIFOs. skiboot is not involved in processing data with the 842 engine,
+  but configures User Mode Access Control (UMAC) notify match register
+  with these values and exports them to kernel with device-tree entries.
+
+  Also configures the appropriate registers to set up and enable / disable
+  the engine. Creates the following device-tree entries to
+  provide RxFIFO address, RxFIFO size, Fifo priority, lpid, pid and tid
+  values so that kernel can drive P9 NX 842 engine.
+
+    The following nodes are located under an xscom node:
+    ``/xscom@<xscom_addr>/nx@<nx_addr>``
+
+    ``/ibm,842-high-fifo``
+      High priority 842 RxFIFO
+    ``/ibm,842-normal-fifo``
+      Normal priority 842 RxFIFO
+
+    Each RxFIFO node contains:
+
+    ``compatible``
+      ``ibm,p9-nx-842``
+    ``priority``
+      High or Normal
+    ``rx-fifo-address``
+      RxFIFO address
+    ``rx-fifo-size``
+      RxFIFO size
+    ``lpid``
+      0xfff (1's for 12 bits set in UMAC notify match register)
+    ``pid``
+      842 coprocessor type
+    ``tid``
+      Counter for 842
+- vas: Create MMIO device tree node
+
+  Create a device tree node for VAS and add properties that Linux
+  will need to configure/use VAS.
+- opal: Extract sw checkstop fir address from HDAT.
+
+  Extract sw checkstop fir address info from HDAT and populate device tree
+  node ibm,sw-checkstop-fir.
+
+  This patch is required for OPAL_CEC_REBOOT2 OPAL call to work as expected
+  on p9.
+
+  With this patch a device property 'ibm,sw-checkstop-fir' is now properly
+  populated: ::
+
+    # lsprop ibm,sw-checkstop-fir
+    ibm,sw-checkstop-fir
+                     05012000 0000001f
+
+PHB4
+----
+- hdat: Fix PCIe GEN4 lane-eq setting for DD2
+
+  For PCIe GEN4, DD2 uses only 1 byte per PCIe lane for the lane-eq
+  settings (DD1 uses 2 bytes)
+- pci: Wait for CRS and switch link when restoring bus numbers
+
+  When a complete reset occurs, after the PHB recovers it propagates a
+  reset down the wire to every device.  At the same time, skiboot talks to
+  every device in order to restore the state of devices to what they were
+  before the reset.
+
+  In some situations, such as devices that recovered slowly and/or were
+  behind a switch, skiboot attempted to access config space of the device
+  before the link was up and the device could respond.
+
+  Fix this by retrying CRS until the device responds correctly, and for
+  devices behind a switch, making sure the switch has its link up first.
+- pci: Track whether a PCI device is a virtual function
+
+  This can be checked from config space, but we will need to know this when
+  restoring the PCI topology, and it is not always safe to access config
+  space during this period.
+- phb4: Enhanced PCIe training tracing
+
+  This adds more details to the PCI training tracing (aka Rick Mata
+  mode). It enables the PCIe Link Training and Status State
+  Machine (LTSSM) tracing and details on speed and link width.
+
+  Output now looks like this when enabled (via nvram): ::
+
+    [    1.096995141,3] PHB#0000[0:0]: TRACE:0x0000001101000000  0ms          GEN1:x16:detect
+    [    1.102849137,3] PHB#0000[0:0]: TRACE:0x0000102101000000 11ms presence GEN1:x16:polling
+    [    1.104341838,3] PHB#0000[0:0]: TRACE:0x0000182101000000 14ms training GEN1:x16:polling
+    [    1.104357444,3] PHB#0000[0:0]: TRACE:0x00001c5101000000 14ms training GEN1:x16:recovery
+    [    1.104580394,3] PHB#0000[0:0]: TRACE:0x00001c5103000000 14ms training GEN3:x16:recovery
+    [    1.123259359,3] PHB#0000[0:0]: TRACE:0x00001c5104000000 51ms training GEN4:x16:recovery
+    [    1.141737656,3] PHB#0000[0:0]: TRACE:0x0000144104000000 87ms presence GEN4:x16:L0
+    [    1.141752318,3] PHB#0000[0:0]: TRACE:0x0000154904000000 87ms trained  GEN4:x16:L0
+    [    1.141757964,3] PHB#0000[0:0]: TRACE: Link trained.
+    [    1.096834019,3] PHB#0001[0:1]: TRACE:0x0000001101000000  0ms          GEN1:x16:detect
+    [    1.105578525,3] PHB#0001[0:1]: TRACE:0x0000102101000000 17ms presence GEN1:x16:polling
+    [    1.112763075,3] PHB#0001[0:1]: TRACE:0x0000183101000000 31ms training GEN1:x16:config
+    [    1.112778956,3] PHB#0001[0:1]: TRACE:0x00001c5081000000 31ms training GEN1:x08:recovery
+    [    1.113002083,3] PHB#0001[0:1]: TRACE:0x00001c5083000000 31ms training GEN3:x08:recovery
+    [    1.114833873,3] PHB#0001[0:1]: TRACE:0x0000144083000000 35ms presence GEN3:x08:L0
+    [    1.114848832,3] PHB#0001[0:1]: TRACE:0x0000154883000000 35ms trained  GEN3:x08:L0
+    [    1.114854650,3] PHB#0001[0:1]: TRACE: Link trained.
+
+- phb4: Fix reading wrong size registers in EEH dump
+
+  These registers are supposed to be 16-bit; reading them at the wrong
+  size made part of the register dump misleading.
+- phb4: Ignore slot state if performing complete reset
+
+  If a PHB is being completely reset, its state is about to be blown away
+  anyway, so if it's not in an appropriate state, creset it regardless.
+- phb4: Prepare for link down when creset called from kernel
+
+  phb4_creset() is typically called by functions that prepare the link
+  to go down.  In cases where creset() is called directly by the kernel,
+  this isn't the case and it can cause issues.  Prepare for link down in
+  creset, just like we do in freset and hreset.
+- phb4: Skip attempting to fix PHBs broken on boot
+
+  If a PHB is marked broken it didn't work on boot, and if it didn't work
+  on boot then there's no point trying to recover it later.
+- phb4: Fix duplicate in EEH register dump
+- phb4: Be more conservative on link presence timeout
+
+  In this patch we tuned our link timing to be more aggressive:
+  ``cf960e2884 phb4: Improve reset and link training timing``
+
+  Cards should take only 32ms but unfortunately we've seen some take
+  up to 440ms. Hence bump our timer up to 1000ms.
+
+  This can hurt boot times on systems where slots indicate a hotplug
+  status but no electrical link is present (which we've seen). Since we
+  have to wait 1 second between PERST and touching config space anyway,
+  it shouldn't hurt too much.
+- phb4: Assert PERST before PHB reset
+
+  Currently we don't assert PERST before issuing a PHB reset. This means
+  any link issues while resetting the PHB will be logged as errors.
+
+  This asserts PERST before we start resetting the PHB to avoid this.
+- Revert "phb4: Read PERST signal rather than assuming it's asserted"
+
+  This reverts commit b42ff2b904165addf32e77679cebb94a08086966
+
+  The original patch assumes that PERST has been asserted well before (>
+  250ms) we hit here (ie. during hostboot).
+
+  In a subsequent patch this will no longer be the case as we need to
+  assert PERST during PHB reset, which may only be a few milliseconds
+  before we hit this code.
+
+  Hence revert this patch. Go back to the software mechanism using
+  skip_perst to determine if PERST should be asserted or not. This
+  allows us to keep the speed optimisation on boot.
+- phb4: Set REGB error enables based on link state
+
+  Currently we always set these enables when initing the PHB. If the
+  link is already down, we shouldn't set them as it may cause spurious
+  errors.
+
+  This changes the code to only set them if the link is up.
+- phb4: Mark PHB as fenced on creset
+
+  If we have to inject an error to trigger recovery, we end up not
+  marking the PHB as fenced in the PHB struct. This fixes that.
+- phb4: Clear errors before deasserting reset
+
+  During reset we may have logged some errors (eg. due to the link going
+  down).
+
+  Hence before we deassert PERST or Hot Reset, we need to clear these
+  errors. This ensures that once link training starts, only new errors
+  are logged.
+- phb4: Disable device config space access when fenced
+
+  On DD2 you can't access device config space when fenced, so just
+  disable access whenever we are fenced.
+- phb4: Dump devctl and devstat registers
+
+  Dump devctl and devstat registers.  These would have been useful when
+  debugging the MPS issue.
+- phb4: Only clear some PHB config space registers on errors
+
+  Currently on error we clear the entire PHB config space.  This is a
+  problem as the PCIe Maximum Payload Size (MPS) negotiation may have
+  already occurred. Clearing MPS in the PHB back to a default of 128
+  bytes will result in an error for a device which already has a larger MPS
+  configured.
+
+  This will manifest itself as an error due to a malformed TLP, i.e.
+  ``phbPblErrorStatus bit 41  = "Malformed TLP error"``
+
+  This has been seen after kexec with some adapters.
+
+  This fixes the problem by only clearing a subset of registers on a phb
+  error.
+
+Utilities
+---------
+- external/xscom-utils: Add ``--list-bits``
+
+  When using getscom/putscom it's helpful to know what bits are set in the
+  register. This patch adds an option to print out which bits are set
+  along with the value that was read/written to the register. Note that
+  this output indicates which bits are set using the IBM bit ordering
+  since that's what the XSCOM documentation uses.
+
+
+opal-prd
+--------
+
+- opal-prd: Do not pass pnor file while starting daemon.
+
+  This change to the included systemd init file means opal-prd can
+  start and run on IBM FSP based systems.
+
+  We do not have PNOR support on all systems, and we have logic to
+  autodetect PNOR. Hence do not pass ``--pnor`` by default.
+
+- opal-prd: Disable pnor access interface on FSP system
+
+  On FSP systems the host does not have access to PNOR. Hence disable
+  the PNOR access interfaces.
+
+OPAL Sensors
+------------
+- sensor-groups : occ: Add 'ops' DT property
+
+  Add new device-tree property 'ops' to define different operations
+  supported on each sensor-group.
+
+- OCC: Map OCC sensor to a chip-id
+
+  Parse device tree to get chip-id for OCC sensor.
+
+- HDAT: Add chip-id property to ipmi sensors
+
+  Presently we do not have a way to map sensor to chip id. Hence we are
+  always passing chip id 0 for occ_reset request (see occ_sensor_id_to_chip()).
+
+  This patch adds chip-id property to sensors (whenever it's available) so that
+  we can map occ sensor to chip-id and pass valid chip-id to occ_reset request.
+
+- xive: Check for valid PIR index when decoding
+
+  This fixes an unlikely but possible assert() fail on kdump.
+
+- sensors: occ: Skip the deconfigured core sensors
+
+  This patch skips the deconfigured cores from the core sensors while
+  parsing the sensor names in the main memory as these sensor values are
+  not updated by OCC.
+
+Tests
+-----
+- hdata_to_dt: use a realistic PVR and chip revision
+
+- nx: PR_INFO that NX RNG and Crypto not yet supported on POWER9
+
+- external/pflash: Add tests
+- external/pflash: Reinstate the progress bars
+
+  Recent work did some optimising which unfortunately removed some of the
+  progress bars in pflash.
+
+  It turns out that there's only one thing people prefer to correctly
+  programmed flash chips: the ability to watch little equals
+  characters go across their screens for potentially minutes.
+- external/pflash: Correct erase alignment checks
+
+  pflash should check the alignment of addresses and sizes when asked to
+  erase. There are two possibilities:
+
+  1. The user has specified sizes manually, in which case pflash should
+     be as flexible as possible; blocklevel_smart_erase() permits this. To
+     prevent possible mistakes pflash will require ``--force`` to perform a
+     manual erase of unaligned sizes.
+  2. The user used ``-P`` to specify a partition. Partitions aren't
+     necessarily erase granule aligned anymore, which
+     blocklevel_smart_erase() can handle. In this case it doesn't make
+     sense to warn/error about misalignment since the misalignment is
+     inherent to the FFS partition and not really user input.
+
+- external/pflash: Check the result of strtoul
+
+  Also add 0x in front of ``--info`` output to avoid a copy and paste mistake.
+
+- libflash/file: Break up MTD erase ioctl() calls
+
+  Unfortunately not all drivers are created equal and several drivers on
+  which pflash relies block in the kernel for quite some time and ignore
+  signals.
+
+  This is really only a problem if pflash is to perform large erases. So
+  don't; perform these ops in small chunks instead.
+
+  An in-kernel fix is possible in most cases but it takes time and systems
+  will be running older drivers for quite some time. Since sector erases
+  aren't significantly slower than whole chip erases there isn't much of a
+  performance penalty to breaking up the erase ioctl()s.
+
+General
+-------
+- opal-msg: Increase the max-async completion count by max chips possible
+
+- occ: Add support for OPAL-OCC command/response interface
+
+  This patch adds support for a shared memory based command/response
+  interface between OCC and OPAL. In HOMER, there is an OPAL command
+  buffer and an OCC response buffer which are used to send inband
+  commands to OCC.
+
+- HDAT/device-tree: only add lid-type on pre-POWER9 systems
+
+  Largely a relic of back when we had multiple entry points into OPAL depending
+  on which mechanism on an FSP we were using to get loaded, this isn't needed
+  on modern P9 as we only have one entry point (we don't do the PHYP LID hack).