doc/release-notes/skiboot-5.9-rc2.rst - skiboot - Git at Google

 .. _skiboot-5.9-rc2:

 skiboot-5.9-rc2
 ===============

 skiboot v5.9-rc2 was released on Monday October 16th 2017. It is the second
 release candidate of skiboot 5.9, which will become the new stable release
 of skiboot following the 5.8 release, first released August 31st 2017.

 skiboot v5.9-rc2 contains all bug fixes as of :ref:`skiboot-5.4.8`
 and :ref:`skiboot-5.1.21` (the currently maintained stable releases). We
 do not currently expect to do any 5.8.x stable releases.

 For how the skiboot stable releases work, see :ref:`stable-rules` for details.

 The current plan is to cut the final 5.9 by October 17th, with skiboot 5.9
 being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October 18th).
 This release will be targetted to early POWER9 systems.

 Over :ref:`skiboot-5.9-rc1`, we have the following changes:

 - opal-prd: Fix memory leak
 - hdata/i2c: update the list of known i2c devs

   This updates the list of known i2c devices - as of HDAT spec v10.5e - so
   that they can be properly identified during the hdat parsing.
 - hdata/i2c: log unknown i2c devices

   An i2c device is unknown if either the i2c device list is outdated or
   the device is marked as unknown (0xFF) in the hdat.

 - opal/cpu: Mark the core as bad while disabling threads of the core.

   If any of the core fails to sync its TB during chipTOD initialization,
   all the threads of that core are disabled. But this does not make
   linux kernel to ignore the core/cpus. It crashes while bringing them up
   with below backtrace: ::

     [   38.883898] kexec_core: Starting new kernel
     cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
         pc: c0000000001b9890: internal_create_group+0x30/0x304
         lr: c0000000001b9880: internal_create_group+0x20/0x304
         sp: c0000003f277b9b0
        msr: 900000000280b033
        dar: 40
      dsisr: 40000000
       current = 0xc0000003f9f41000
       paca    = 0xc00000000fe00000   softe: 0        irq_happened: 0x01
         pid   = 2572, comm = kexec
     Linux version 4.13.2-openpower1 (jenkins@p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
     enter ? for help
     [c0000003f277b9b0] c0000000008a8780 (unreliable)
     [c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
     [c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
     [c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
     [c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
     [c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
     [c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
     [c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
     [c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
     [c0000003f277be30] c00000000000b118 system_call+0x58/0x6c

 - hw/imc: pause microcode at boot

   IMC nest counters has both in-band (ucode access) and out of
   band access to it. Since not all nest counter configurations
   are supported by ucode, out of band tools are used to characterize
   other configuration.

   So it is prefer to pause the nest microcode at boot to aid the
   nest out of band tools. If the ucode not paused and OS does not
   have IMC driver support, then out to band tools will race with
   ucode and end up getting undesirable values. Patch to check and
   pause the ucode at boot.

   OPAL provides APIs to control IMC counters. OPAL_IMC_COUNTERS_INIT
   is used to initialize these counters at boot. OPAL_IMC_COUNTERS_START
   and OPAL_IMC_COUNTERS_STOP API calls should be used to start and pause
   these IMC engines. `doc/opal-api/opal-imc-counters.rst` details the
   OPAL APIs and their usage.
 - xive: Fix VP free block group mode false-positive parameter check

   The check to ensure the buddy allocation idx is aligned to its
   allocation order was not taking into account the allocation split.
   This would result in opal_xive_free_vp_block failures despite
   giving the same value as returned by opal_xive_alloc_vp_block.

   E.g., starting then stopping 4 KVM guests gives the following pattern
   in the host: ::

       opal_xive_alloc_vp_block(5)=0x45000020
       opal_xive_alloc_vp_block(5)=0x45000040
       opal_xive_alloc_vp_block(5)=0x45000060
       opal_xive_alloc_vp_block(5)=0x45000080
       opal_xive_free_vp_block(0x45000020)=-1
       opal_xive_free_vp_block(0x45000040)=0
       opal_xive_free_vp_block(0x45000060)=-1
       opal_xive_free_vp_block(0x45000080)=0
 - hw/p8-i2c: Fix deadlock in p9_i2c_bus_owner_change

   When debugging a system where Linux was taking soft lockup errors with
   two CPUs stuck in OPAL:

   ======================= ==============
   CPU0                    CPU1
   ======================= ==============
   lock
   p8_i2c_recover
   opal_handle_interrupt
                           sync_timer
 			  cancel_timer
 			  p9_i2c_bus_owner_change
 			  occ_p9_interrupt
 			  xive_source_interrupt
 			  opal_handle_interrupt
   ======================= ==============

   p8_i2c_recover() is a timer, and is stuck trying to take master->lock.
   p9_i2c_bus_owner_change() has taken master->lock, but then is stuck waiting
   for all timers to complete. We deadlock.

   Fix this by using cancel_timer_async().

 - FSP/CONSOLE: Limit number of error logging

   Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added
   error logging when buffer is full. In some corner cases kernel may call this
   function multiple time and we may endup logging error again and again.

   This patch fixes it by generating error log only once.

 - FSP/CONSOLE: Fix fsp_console_write_buffer_space() call

   Kernel calls fsp_console_write_buffer_space() to check console buffer space
   availability. If there is enough buffer space to write data, then kernel will
   call fsp_console_write() to write actual data.

   In some extreme corner cases (like one explained in commit c8a7535f)
   console becomes full and this function returns 0 to kernel (or space available
   in console buffer < next incoming data size). Kernel will continue retrying
   until it gets enough space. So we will start seeing RCU stalls.

   This patch keeps track of previous available space. If previous space is same
   as current means not enough space in console buffer to write incoming data.
   It may be due to very high console write operation and slow response from FSP
   -OR- FSP has stopped processing data (ex: because of ipmi daemon died). At this
   point we will start timer with timeout of SER_BUFFER_OUT_TIMEOUT (10 secs).
   If situation is not improved within 10 seconds means something went bad. Lets
   return OPAL_RESOURCE so that kernel can drop console write and continue.
 - FSP/CONSOLE: Close SOL session during R/R

   Presently we are not closing SOL and FW console sessions during R/R. Host will
   continue to write to SOL buffer during FSP R/R. If there is heavy console write
   operation happening during FSP R/R (like running `top` command inside console),
   then at some point console buffer becomes full. fsp_console_write_buffer_space()
   returns 0 (or less than required space to write data) to host. While one thread
   is busy writing to console, if some other threads tries to write data to console
   we may see RCU stalls (like below) in kernel. ::

     [ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
     [ 2082.828365] Task dump for CPU 32:
     [ 2082.828368] kworker/32:3    R  running task        0  4637      2 0x00000884
     [ 2082.828375] Workqueue: events dump_work_fn
     [ 2082.828376] Call Trace:
     [ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
     [ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
     [ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
     [ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
     [ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
     [ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
     [ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
     [ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68

   Hence lets close SOL (and FW console) during FSP R/R.
 - FSP/CONSOLE: Do not associate unavailable console

   Presently OPAL sends associate/unassociate MBOX command for all
   FSP serial console (like below OPAL message). We have to check
   console is available or not before sending this message. ::

     [ 5013.227994012,7] FSP: Reassociating HVSI console 1
     [ 5013.227997540,7] FSP: Reassociating HVSI console 2
 - FSP: Disable PSI link whenever FSP tells OPAL about impending R/R

   Commit 42d5d047 fixed scenario where DPO has been initiated, but FSP went
   into reset before the CEC power down came in. But this is generic issue
   that can happen in normal shutdown path as well.

   Hence disable PSI link as soon as we detect FSP impending R/R.

 - fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM
   Also, return OPAL_BUSY_EVENT on failure sending FSP_CMD_REBOOT / DEEP_REBOOT.

   We had a race condition between FSP Reset/Reload and powering down
   the system from the host:

   Roughly:

   == ======================== ==========================================================
   #  FSP                      Host
   == ======================== ==========================================================
   1  Power on
   2                           Power on
   3  (inject EPOW)
   4  (trigger FSP R/R)
   5                           Processes EPOW event, starts shutting down
   6                           calls OPAL_CEC_POWER_DOWN
   7  (is still in R/R)
   8                           gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
   9  (FSP comes back)
   10                          spinning in opal_poll_events
   11 (thinks host is running)
   == ======================== ==========================================================

   The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
   error path for fsp_sync_msg() is to return -1, which means we give
   the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
   docs give us the opportunity to return OPAL_BUSY when trying again
   later may be successful, and we're ambiguous as to if you should retry
   on OPAL_INTERNAL_ERROR.

   For reference, the linux code looks like this: ::

     static void __noreturn pnv_power_off(void)
     {
             long rc = OPAL_BUSY;

             pnv_prepare_going_down();

             while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
                     rc = opal_cec_power_down(0);
                     if (rc == OPAL_BUSY_EVENT)
                             opal_poll_events(NULL);
                     else
                             mdelay(10);
             }
             for (;;)
                     opal_poll_events(NULL);
     }

   Which means that *practically* our only option is to return OPAL_BUSY
   or OPAL_BUSY_EVENT.

   We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
   running pollers to communicate with the FSP and do the final bits of
   Reset/Reload handling before we power off the system.
	.. _skiboot-5.9-rc2:

	skiboot-5.9-rc2
	===============

	skiboot v5.9-rc2 was released on Monday October 16th 2017. It is the second
	release candidate of skiboot 5.9, which will become the new stable release
	of skiboot following the 5.8 release, first released August 31st 2017.

	skiboot v5.9-rc2 contains all bug fixes as of :ref:`skiboot-5.4.8`
	and :ref:`skiboot-5.1.21` (the currently maintained stable releases). We
	do not currently expect to do any 5.8.x stable releases.

	For how the skiboot stable releases work, see :ref:`stable-rules` for details.

	The current plan is to cut the final 5.9 by October 17th, with skiboot 5.9
	being for all POWER8 and POWER9 platforms in op-build v1.20 (Due October 18th).
	This release will be targetted to early POWER9 systems.

	Over :ref:`skiboot-5.9-rc1`, we have the following changes:

	- opal-prd: Fix memory leak
	- hdata/i2c: update the list of known i2c devs

	This updates the list of known i2c devices - as of HDAT spec v10.5e - so
	that they can be properly identified during the hdat parsing.
	- hdata/i2c: log unknown i2c devices

	An i2c device is unknown if either the i2c device list is outdated or
	the device is marked as unknown (0xFF) in the hdat.

	- opal/cpu: Mark the core as bad while disabling threads of the core.

	If any of the core fails to sync its TB during chipTOD initialization,
	all the threads of that core are disabled. But this does not make
	linux kernel to ignore the core/cpus. It crashes while bringing them up
	with below backtrace: ::

	[ 38.883898] kexec_core: Starting new kernel
	cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
	pc: c0000000001b9890: internal_create_group+0x30/0x304
	lr: c0000000001b9880: internal_create_group+0x20/0x304
	sp: c0000003f277b9b0
	msr: 900000000280b033
	dar: 40
	dsisr: 40000000
	current = 0xc0000003f9f41000
	paca = 0xc00000000fe00000 softe: 0 irq_happened: 0x01
	pid = 2572, comm = kexec
	Linux version 4.13.2-openpower1 (jenkins@p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
	enter ? for help
	[c0000003f277b9b0] c0000000008a8780 (unreliable)
	[c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
	[c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
	[c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
	[c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
	[c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
	[c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
	[c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
	[c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
	[c0000003f277be30] c00000000000b118 system_call+0x58/0x6c

	- hw/imc: pause microcode at boot

	IMC nest counters has both in-band (ucode access) and out of
	band access to it. Since not all nest counter configurations
	are supported by ucode, out of band tools are used to characterize
	other configuration.

	So it is prefer to pause the nest microcode at boot to aid the
	nest out of band tools. If the ucode not paused and OS does not
	have IMC driver support, then out to band tools will race with
	ucode and end up getting undesirable values. Patch to check and
	pause the ucode at boot.

	OPAL provides APIs to control IMC counters. OPAL_IMC_COUNTERS_INIT
	is used to initialize these counters at boot. OPAL_IMC_COUNTERS_START
	and OPAL_IMC_COUNTERS_STOP API calls should be used to start and pause
	these IMC engines. `doc/opal-api/opal-imc-counters.rst` details the
	OPAL APIs and their usage.
	- xive: Fix VP free block group mode false-positive parameter check

	The check to ensure the buddy allocation idx is aligned to its
	allocation order was not taking into account the allocation split.
	This would result in opal_xive_free_vp_block failures despite
	giving the same value as returned by opal_xive_alloc_vp_block.

	E.g., starting then stopping 4 KVM guests gives the following pattern
	in the host: ::

	opal_xive_alloc_vp_block(5)=0x45000020
	opal_xive_alloc_vp_block(5)=0x45000040
	opal_xive_alloc_vp_block(5)=0x45000060
	opal_xive_alloc_vp_block(5)=0x45000080
	opal_xive_free_vp_block(0x45000020)=-1
	opal_xive_free_vp_block(0x45000040)=0
	opal_xive_free_vp_block(0x45000060)=-1
	opal_xive_free_vp_block(0x45000080)=0
	- hw/p8-i2c: Fix deadlock in p9_i2c_bus_owner_change

	When debugging a system where Linux was taking soft lockup errors with
	two CPUs stuck in OPAL:

	======================= ==============
	CPU0 CPU1
	======================= ==============
	lock
	p8_i2c_recover
	opal_handle_interrupt
	sync_timer
	cancel_timer
	p9_i2c_bus_owner_change
	occ_p9_interrupt
	xive_source_interrupt
	opal_handle_interrupt
	======================= ==============

	p8_i2c_recover() is a timer, and is stuck trying to take master->lock.
	p9_i2c_bus_owner_change() has taken master->lock, but then is stuck waiting
	for all timers to complete. We deadlock.

	Fix this by using cancel_timer_async().

	- FSP/CONSOLE: Limit number of error logging

	Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon) added
	error logging when buffer is full. In some corner cases kernel may call this
	function multiple time and we may endup logging error again and again.

	This patch fixes it by generating error log only once.

	- FSP/CONSOLE: Fix fsp_console_write_buffer_space() call

	Kernel calls fsp_console_write_buffer_space() to check console buffer space
	availability. If there is enough buffer space to write data, then kernel will
	call fsp_console_write() to write actual data.

	In some extreme corner cases (like one explained in commit c8a7535f)
	console becomes full and this function returns 0 to kernel (or space available
	in console buffer < next incoming data size). Kernel will continue retrying
	until it gets enough space. So we will start seeing RCU stalls.

	This patch keeps track of previous available space. If previous space is same
	as current means not enough space in console buffer to write incoming data.
	It may be due to very high console write operation and slow response from FSP
	-OR- FSP has stopped processing data (ex: because of ipmi daemon died). At this
	point we will start timer with timeout of SER_BUFFER_OUT_TIMEOUT (10 secs).
	If situation is not improved within 10 seconds means something went bad. Lets
	return OPAL_RESOURCE so that kernel can drop console write and continue.
	- FSP/CONSOLE: Close SOL session during R/R

	Presently we are not closing SOL and FW console sessions during R/R. Host will
	continue to write to SOL buffer during FSP R/R. If there is heavy console write
	operation happening during FSP R/R (like running `top` command inside console),
	then at some point console buffer becomes full. fsp_console_write_buffer_space()
	returns 0 (or less than required space to write data) to host. While one thread
	is busy writing to console, if some other threads tries to write data to console
	we may see RCU stalls (like below) in kernel. ::

	[ 2082.828363] INFO: rcu_sched detected stalls on CPUs/tasks: { 32} (detected by 16, t=6002 jiffies, g=23154, c=23153, q=254769)
	[ 2082.828365] Task dump for CPU 32:
	[ 2082.828368] kworker/32:3 R running task 0 4637 2 0x00000884
	[ 2082.828375] Workqueue: events dump_work_fn
	[ 2082.828376] Call Trace:
	[ 2082.828382] [c000000f1633fa00] [c00000000013b6b0] console_unlock+0x570/0x600 (unreliable)
	[ 2082.828384] [c000000f1633fae0] [c00000000013ba34] vprintk_emit+0x2f4/0x5c0
	[ 2082.828389] [c000000f1633fb60] [c00000000099e644] printk+0x84/0x98
	[ 2082.828391] [c000000f1633fb90] [c0000000000851a8] dump_work_fn+0x238/0x250
	[ 2082.828394] [c000000f1633fc60] [c0000000000ecb98] process_one_work+0x198/0x4b0
	[ 2082.828396] [c000000f1633fcf0] [c0000000000ed3dc] worker_thread+0x18c/0x5a0
	[ 2082.828399] [c000000f1633fd80] [c0000000000f4650] kthread+0x110/0x130
	[ 2082.828403] [c000000f1633fe30] [c000000000009674] ret_from_kernel_thread+0x5c/0x68

	Hence lets close SOL (and FW console) during FSP R/R.
	- FSP/CONSOLE: Do not associate unavailable console

	Presently OPAL sends associate/unassociate MBOX command for all
	FSP serial console (like below OPAL message). We have to check
	console is available or not before sending this message. ::

	[ 5013.227994012,7] FSP: Reassociating HVSI console 1
	[ 5013.227997540,7] FSP: Reassociating HVSI console 2
	- FSP: Disable PSI link whenever FSP tells OPAL about impending R/R

	Commit 42d5d047 fixed scenario where DPO has been initiated, but FSP went
	into reset before the CEC power down came in. But this is generic issue
	that can happen in normal shutdown path as well.

	Hence disable PSI link as soon as we detect FSP impending R/R.

	- fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM
	Also, return OPAL_BUSY_EVENT on failure sending FSP_CMD_REBOOT / DEEP_REBOOT.

	We had a race condition between FSP Reset/Reload and powering down
	the system from the host:

	Roughly:

	== ======================== ==========================================================
	# FSP Host
	== ======================== ==========================================================
	1 Power on
	2 Power on
	3 (inject EPOW)
	4 (trigger FSP R/R)
	5 Processes EPOW event, starts shutting down
	6 calls OPAL_CEC_POWER_DOWN
	7 (is still in R/R)
	8 gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
	9 (FSP comes back)
	10 spinning in opal_poll_events
	11 (thinks host is running)
	== ======================== ==========================================================

	The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
	error path for fsp_sync_msg() is to return -1, which means we give
	the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
	docs give us the opportunity to return OPAL_BUSY when trying again
	later may be successful, and we're ambiguous as to if you should retry
	on OPAL_INTERNAL_ERROR.

	For reference, the linux code looks like this: ::

	static void __noreturn pnv_power_off(void)
	{
	long rc = OPAL_BUSY;

	pnv_prepare_going_down();

	while (rc == OPAL_BUSY \|\| rc == OPAL_BUSY_EVENT) {
	rc = opal_cec_power_down(0);
	if (rc == OPAL_BUSY_EVENT)
	opal_poll_events(NULL);
	else
	mdelay(10);
	}
	for (;;)
	opal_poll_events(NULL);
	}

	Which means that practically our only option is to return OPAL_BUSY
	or OPAL_BUSY_EVENT.

	We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
	running pollers to communicate with the FSP and do the final bits of
	Reset/Reload handling before we power off the system.