<!DOCTYPE html>
<html lang="en" data-content_root="../">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>skiboot-5.4.8 &#8212; skiboot d365a01
documentation</title>
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=fa44fd50" />
<link rel="stylesheet" type="text/css" href="../_static/classic.css?v=514cf933" />
<script src="../_static/documentation_options.js?v=e1fecbe9"></script>
<script src="../_static/doctools.js?v=888ff710"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="skiboot-5.4.9" href="skiboot-5.4.9.html" />
<link rel="prev" title="skiboot-5.4.7" href="skiboot-5.4.7.html" />
</head><body>
<div class="related" role="navigation" aria-label="related navigation">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="../genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="skiboot-5.4.9.html" title="skiboot-5.4.9"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="skiboot-5.4.7.html" title="skiboot-5.4.7"
accesskey="P">previous</a> |</li>
<li class="nav-item nav-item-0"><a href="../index.html">skiboot d365a01
documentation</a> &#187;</li>
<li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Release Notes</a> &#187;</li>
<li class="nav-item nav-item-this"><a href="">skiboot-5.4.8</a></li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<section id="skiboot-5-4-8">
<span id="id1"></span><h1>skiboot-5.4.8<a class="headerlink" href="#skiboot-5-4-8" title="Link to this heading"></a></h1>
<p>skiboot-5.4.8 was released on Wednesday October 11th, 2017. It replaces
<a class="reference internal" href="skiboot-5.4.7.html#skiboot-5-4-7"><span class="std std-ref">skiboot-5.4.7</span></a> as the current stable release in the 5.4.x series.</p>
<p>Over <a class="reference internal" href="skiboot-5.4.7.html#skiboot-5-4-7"><span class="std std-ref">skiboot-5.4.7</span></a>, we have a few bug fixes for FSP platforms:</p>
<ul>
<li><p>libflash/file: Handle short read()s and write()s correctly</p>
<p>Previously we did not advance the buffer after a short read() or write(),
nor did we request only the remaining amount.</p>
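<p>The handling described above can be sketched as follows. This is a minimal illustration using the POSIX read() and write() calls, not the libflash code itself; the helper names <code>read_fully</code> and <code>write_fully</code> are hypothetical.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the libflash function): read up to `len`
 * bytes from `fd`, retrying after short reads.
 * Returns the number of bytes read (less than `len` only at EOF),
 * or -1 on error.
 */
static ssize_t read_fully(int fd, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);

	while (done < len) {
		/* Advance the buffer and ask only for what is still missing. */
		ssize_t rc = read(fd, p + done, len - done);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		if (rc == 0)
			break;			/* EOF */
		done += (size_t)rc;
	}
	return (ssize_t)done;
}

/* Same idea for write(): advance the buffer and shrink the request. */
static ssize_t write_fully(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;

	assert(buf != NULL || len == 0);

	while (done < len) {
		ssize_t rc = write(fd, p + done, len - done);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -1;
		}
		done += (size_t)rc;
	}
	return (ssize_t)done;
}
```

<p>Both loops keep a running count of bytes transferred, so each retry starts at the right offset and asks only for the remainder.</p>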
</li>
<li><p>FSP/NVRAM: Handle “get vNVRAM statistics” command</p>
<p>The FSP sends an MBOX command (cmd: 0xEB, subcmd: 0x05, mod: 0x00) to get vNVRAM
statistics. OPAL doesn’t maintain any such statistics, so it now returns
FSP_STATUS_INVALID_SUBCMD.</p>
<blockquote>
<div><p>Sample OPAL log from before this fix:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span><span class="mf">16944.384670488</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Unhandled</span> <span class="n">message</span> <span class="n">eb0500</span>
<span class="p">[</span><span class="mf">16944.474110465</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Unhandled</span> <span class="n">message</span> <span class="n">eb0500</span>
<span class="p">[</span><span class="mf">16945.111280784</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Unhandled</span> <span class="n">message</span> <span class="n">eb0500</span>
<span class="p">[</span><span class="mf">16945.293393485</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Unhandled</span> <span class="n">message</span> <span class="n">eb0500</span>
</pre></div>
</div>
</div></blockquote>
</li>
<li><p>FSP/CONSOLE: Limit number of error logging</p>
<p>Commit c8a7535f (FSP/CONSOLE: Workaround for unresponsive ipmi daemon, added
in skiboot 5.4.6 and 5.7-rc1) added error logging when the buffer is full. In some
corner cases the kernel may call this function multiple times, and we may end up logging
the error again and again.</p>
<p>This patch fixes it by generating the error log only once.</p>
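<p>A log-once guard of this kind can be sketched as below. The names are hypothetical and the message text is invented for illustration; the skiboot patch may track this state differently.</p>

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical state: has the "buffer full" error already been logged? */
static bool buffer_full_logged;

/* Counter kept only so the behaviour can be observed in this sketch. */
static int error_logs_emitted;

/* Called every time a console write finds the buffer full. */
static void report_buffer_full(void)
{
	if (buffer_full_logged)
		return;		/* already reported, stay quiet */

	buffer_full_logged = true;
	error_logs_emitted++;
	fprintf(stderr, "console: output buffer full\n");
}
```

<p>However many times the caller hits the full-buffer path, only the first call produces a log entry.</p>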
</li>
<li><p>FSP/CONSOLE: Fix fsp_console_write_buffer_space() call</p>
<p>The kernel calls fsp_console_write_buffer_space() to check console buffer space
availability. If there is enough buffer space to write data, the kernel then
calls fsp_console_write() to write the actual data.</p>
<p>In some extreme corner cases (like the one explained in commit c8a7535f), the
console becomes full and this function returns 0 to the kernel (or the space available
in the console buffer is &lt; the next incoming data size). The kernel keeps retrying
until it gets enough space, so we start seeing RCU stalls.</p>
<p>This patch keeps track of the previously available space. If the previous space is the same
as the current space, there is not enough space in the console buffer to write the incoming data.
This may be due to a very high rate of console writes and a slow response from the FSP,
or the FSP may have stopped processing data (e.g. because the ipmi daemon died). At this
point we start a timer with a timeout of SER_BUFFER_OUT_TIMEOUT (10 secs).
If the situation has not improved within 10 seconds, something has gone wrong. We then
return OPAL_RESOURCE so that the kernel can drop the console write and continue.</p>
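<p>The stall detection described above can be sketched roughly as follows. This is an illustration under assumptions, not the skiboot implementation: the structure, the function name <code>check_write_space</code> and the use of plain seconds for time are hypothetical, and the numeric return codes are stand-ins.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in values for illustration; the real definitions live in skiboot. */
#define OPAL_SUCCESS		0
#define OPAL_RESOURCE		(-10)
#define SER_BUFFER_OUT_TIMEOUT	10	/* seconds, as stated in the text */

/* Hypothetical per-console bookkeeping. */
struct console_stall {
	size_t   prev_space;	/* free space seen on the previous call */
	bool     timing;	/* true while the stall timer is running */
	uint64_t deadline;	/* time (in seconds) at which we give up */
};

/*
 * Decide what to report for a buffer-space query made at time `now`.
 * Simplification: any unchanged free space counts as "no progress",
 * so an idle console would also look stalled in this sketch.
 */
static int64_t check_write_space(struct console_stall *cs,
				 size_t space_now, uint64_t now)
{
	assert(cs != NULL);

	if (space_now != cs->prev_space) {
		/* Space changed: record it and cancel any timer. */
		cs->prev_space = space_now;
		cs->timing = false;
		return OPAL_SUCCESS;
	}

	if (!cs->timing) {
		/* First call with no progress: start the timer. */
		cs->timing = true;
		cs->deadline = now + SER_BUFFER_OUT_TIMEOUT;
		return OPAL_SUCCESS;
	}

	/* Still no progress: give up once the timeout has elapsed. */
	return now >= cs->deadline ? OPAL_RESOURCE : OPAL_SUCCESS;
}
```

<p>Returning an error only after the timeout lets a short burst of console output ride out a slow FSP, while a stuck FSP no longer keeps the kernel retrying indefinitely.</p>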
</li>
<li><p>FSP/CONSOLE: Close SOL session during R/R</p>
<p>Previously we did not close the SOL and FW console sessions during R/R. The host
continues to write to the SOL buffer during FSP R/R. If there is heavy console write
activity during FSP R/R (like running the <cite>top</cite> command inside the console),
then at some point the console buffer becomes full. fsp_console_write_buffer_space()
returns 0 (or less than the space required to write data) to the host. While one thread
is busy writing to the console, if other threads try to write data to the console,
we may see RCU stalls (like below) in the kernel.</p>
<p>Kernel call trace:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">2082.828363</span><span class="p">]</span> <span class="n">INFO</span><span class="p">:</span> <span class="n">rcu_sched</span> <span class="n">detected</span> <span class="n">stalls</span> <span class="n">on</span> <span class="n">CPUs</span><span class="o">/</span><span class="n">tasks</span><span class="p">:</span> <span class="p">{</span> <span class="mi">32</span><span class="p">}</span> <span class="p">(</span><span class="n">detected</span> <span class="n">by</span> <span class="mi">16</span><span class="p">,</span> <span class="n">t</span><span class="o">=</span><span class="mi">6002</span> <span class="n">jiffies</span><span class="p">,</span> <span class="n">g</span><span class="o">=</span><span class="mi">23154</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="mi">23153</span><span class="p">,</span> <span class="n">q</span><span class="o">=</span><span class="mi">254769</span><span class="p">)</span>
<span class="p">[</span> <span class="mf">2082.828365</span><span class="p">]</span> <span class="n">Task</span> <span class="n">dump</span> <span class="k">for</span> <span class="n">CPU</span> <span class="mi">32</span><span class="p">:</span>
<span class="p">[</span> <span class="mf">2082.828368</span><span class="p">]</span> <span class="n">kworker</span><span class="o">/</span><span class="mi">32</span><span class="p">:</span><span class="mi">3</span> <span class="n">R</span> <span class="n">running</span> <span class="n">task</span> <span class="mi">0</span> <span class="mi">4637</span> <span class="mi">2</span> <span class="mh">0x00000884</span>
<span class="p">[</span> <span class="mf">2082.828375</span><span class="p">]</span> <span class="n">Workqueue</span><span class="p">:</span> <span class="n">events</span> <span class="n">dump_work_fn</span>
<span class="p">[</span> <span class="mf">2082.828376</span><span class="p">]</span> <span class="n">Call</span> <span class="n">Trace</span><span class="p">:</span>
<span class="p">[</span> <span class="mf">2082.828382</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fa00</span><span class="p">]</span> <span class="p">[</span><span class="n">c00000000013b6b0</span><span class="p">]</span> <span class="n">console_unlock</span><span class="o">+</span><span class="mh">0x570</span><span class="o">/</span><span class="mh">0x600</span> <span class="p">(</span><span class="n">unreliable</span><span class="p">)</span>
<span class="p">[</span> <span class="mf">2082.828384</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fae0</span><span class="p">]</span> <span class="p">[</span><span class="n">c00000000013ba34</span><span class="p">]</span> <span class="n">vprintk_emit</span><span class="o">+</span><span class="mh">0x2f4</span><span class="o">/</span><span class="mh">0x5c0</span>
<span class="p">[</span> <span class="mf">2082.828389</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fb60</span><span class="p">]</span> <span class="p">[</span><span class="n">c00000000099e644</span><span class="p">]</span> <span class="n">printk</span><span class="o">+</span><span class="mh">0x84</span><span class="o">/</span><span class="mh">0x98</span>
<span class="p">[</span> <span class="mf">2082.828391</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fb90</span><span class="p">]</span> <span class="p">[</span><span class="n">c0000000000851a8</span><span class="p">]</span> <span class="n">dump_work_fn</span><span class="o">+</span><span class="mh">0x238</span><span class="o">/</span><span class="mh">0x250</span>
<span class="p">[</span> <span class="mf">2082.828394</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fc60</span><span class="p">]</span> <span class="p">[</span><span class="n">c0000000000ecb98</span><span class="p">]</span> <span class="n">process_one_work</span><span class="o">+</span><span class="mh">0x198</span><span class="o">/</span><span class="mh">0x4b0</span>
<span class="p">[</span> <span class="mf">2082.828396</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fcf0</span><span class="p">]</span> <span class="p">[</span><span class="n">c0000000000ed3dc</span><span class="p">]</span> <span class="n">worker_thread</span><span class="o">+</span><span class="mh">0x18c</span><span class="o">/</span><span class="mh">0x5a0</span>
<span class="p">[</span> <span class="mf">2082.828399</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fd80</span><span class="p">]</span> <span class="p">[</span><span class="n">c0000000000f4650</span><span class="p">]</span> <span class="n">kthread</span><span class="o">+</span><span class="mh">0x110</span><span class="o">/</span><span class="mh">0x130</span>
<span class="p">[</span> <span class="mf">2082.828403</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000f1633fe30</span><span class="p">]</span> <span class="p">[</span><span class="n">c000000000009674</span><span class="p">]</span> <span class="n">ret_from_kernel_thread</span><span class="o">+</span><span class="mh">0x5c</span><span class="o">/</span><span class="mh">0x68</span>
</pre></div>
</div>
<p>Hence we now close the SOL (and FW console) sessions during FSP R/R.</p>
</li>
<li><p>FSP/CONSOLE: Do not associate unavailable console</p>
<p>Previously OPAL sent the associate/unassociate MBOX command for all
FSP serial consoles (as in the OPAL log below). We now check whether a
console is available before sending this message.</p>
<p>OPAL log:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">5013.227994012</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Reassociating</span> <span class="n">HVSI</span> <span class="n">console</span> <span class="mi">1</span>
<span class="p">[</span> <span class="mf">5013.227997540</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Reassociating</span> <span class="n">HVSI</span> <span class="n">console</span> <span class="mi">2</span>
</pre></div>
</div>
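<p>The availability check can be sketched as below. The structure, its field names and the helper functions are hypothetical; the stand-in <code>send_associate</code> only sets a flag where skiboot would send the MBOX command.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical console record; the field names are not skiboot's. */
struct serial_console {
	bool available;		/* console exists and can be used */
	bool associated;	/* set once the (simulated) command is sent */
};

/* Stand-in for sending the associate MBOX command to the FSP. */
static void send_associate(struct serial_console *con)
{
	con->associated = true;
}

/* Associate only the available consoles; returns how many were sent. */
static size_t associate_available(struct serial_console *cons, size_t count)
{
	size_t sent = 0;

	assert(cons != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (!cons[i].available)
			continue;	/* skip: nothing to associate */
		send_associate(&cons[i]);
		sent++;
	}
	return sent;
}
```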
</li>
<li><p>FSP: Disable PSI link whenever FSP tells OPAL about impending Reset/Reload</p>
<p>Commit 42d5d047 fixed the scenario where DPO has been initiated but the FSP went
into reset before the CEC power down came in. But this is a generic issue
that can happen in the normal shutdown path as well.</p>
<p>Hence we disable the PSI link as soon as we detect an impending FSP R/R.</p>
</li>
<li><p>fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM</p>
<p>Also, return OPAL_BUSY_EVENT on failure sending FSP_CMD_REBOOT / DEEP_REBOOT.</p>
<p>We had a race condition between FSP Reset/Reload and powering down
the system from the host:</p>
<p>Roughly:</p>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>#</p></th>
<th class="head"><p>FSP</p></th>
<th class="head"><p>Host</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>1</p></td>
<td><p>Power on</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p>2</p></td>
<td></td>
<td><p>Power on</p></td>
</tr>
<tr class="row-even"><td><p>3</p></td>
<td><p>(inject EPOW)</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p>4</p></td>
<td><p>(trigger FSP R/R)</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p>5</p></td>
<td></td>
<td><p>Processes EPOW event, starts shutting down</p></td>
</tr>
<tr class="row-odd"><td><p>6</p></td>
<td></td>
<td><p>calls OPAL_CEC_POWER_DOWN</p></td>
</tr>
<tr class="row-even"><td><p>7</p></td>
<td><p>(is still in R/R)</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p>8</p></td>
<td></td>
<td><p>gets OPAL_INTERNAL_ERROR, spins in opal_poll_events</p></td>
</tr>
<tr class="row-even"><td><p>9</p></td>
<td><p>(FSP comes back)</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p>10</p></td>
<td></td>
<td><p>spinning in opal_poll_events</p></td>
</tr>
<tr class="row-even"><td><p>11</p></td>
<td><p>(thinks host is running)</p></td>
<td></td>
</tr>
</tbody>
</table>
<p>The call to OPAL_CEC_POWER_DOWN is only made once, because the reset/reload
error path for fsp_sync_msg() returns -1, which means we give
the OS OPAL_INTERNAL_ERROR. That would be fine, except that our own API
docs give us the opportunity to return OPAL_BUSY when trying again
later may be successful, and we’re ambiguous as to whether you should retry
on OPAL_INTERNAL_ERROR.</p>
<p>For reference, the Linux code looks like this:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">static</span> <span class="n">void</span> <span class="n">__noreturn</span> <span class="n">pnv_power_off</span><span class="p">(</span><span class="n">void</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">long</span> <span class="n">rc</span> <span class="o">=</span> <span class="n">OPAL_BUSY</span><span class="p">;</span>
    <span class="n">pnv_prepare_going_down</span><span class="p">();</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="n">OPAL_BUSY</span> <span class="o">||</span> <span class="n">rc</span> <span class="o">==</span> <span class="n">OPAL_BUSY_EVENT</span><span class="p">)</span> <span class="p">{</span>
        <span class="n">rc</span> <span class="o">=</span> <span class="n">opal_cec_power_down</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">rc</span> <span class="o">==</span> <span class="n">OPAL_BUSY_EVENT</span><span class="p">)</span>
            <span class="n">opal_poll_events</span><span class="p">(</span><span class="n">NULL</span><span class="p">);</span>
        <span class="k">else</span>
            <span class="n">mdelay</span><span class="p">(</span><span class="mi">10</span><span class="p">);</span>
    <span class="p">}</span>
    <span class="k">for</span> <span class="p">(;;)</span>
        <span class="n">opal_poll_events</span><span class="p">(</span><span class="n">NULL</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
</div>
<p>Which means that <em>practically</em> our only option is to return OPAL_BUSY
or OPAL_BUSY_EVENT.</p>
<p>We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we’re
running pollers to communicate with the FSP and do the final bits of
Reset/Reload handling before we power off the system.</p>
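<p>The effect of returning OPAL_BUSY_EVENT can be illustrated with a small simulation of the retry loop quoted above. Everything here is a sketch under assumptions: the <code>fake_*</code> helpers and the counter stand in for the FSP and for opal_poll_events(), the numeric codes are stand-ins, and the host loop omits the mdelay() branch and the final endless poll.</p>

```c
#include <stdint.h>

/* Stand-in values for illustration; the real definitions live in skiboot. */
#define OPAL_SUCCESS	0
#define OPAL_BUSY	(-2)
#define OPAL_BUSY_EVENT	(-12)

/* Simulated FSP: number of polls needed before Reset/Reload completes. */
static int fsp_rr_polls_left;

/* Stand-in for sending the power-down message; fails while in R/R. */
static int fake_send_powerdown(void)
{
	return fsp_rr_polls_left > 0 ? -1 : 0;
}

/* Stand-in for opal_poll_events(): each poll advances R/R handling. */
static void fake_poll_events(void)
{
	if (fsp_rr_polls_left > 0)
		fsp_rr_polls_left--;
}

/* Firmware side: report "busy, poll and retry" instead of a hard error. */
static int64_t cec_power_down(void)
{
	if (fake_send_powerdown() != 0)
		return OPAL_BUSY_EVENT;
	return OPAL_SUCCESS;
}

/* Host side: same retry shape as the Linux loop; returns the attempt count. */
static int host_power_off(void)
{
	int64_t rc = OPAL_BUSY;
	int attempts = 0;

	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
		rc = cec_power_down();
		attempts++;
		if (rc == OPAL_BUSY_EVENT)
			fake_poll_events();
	}
	return attempts;
}
```

<p>In this simulation the host keeps polling and retrying until the FSP is back, so the power-down request is eventually delivered rather than being dropped after a single failed attempt.</p>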
</li>
</ul>
</section>
<div class="clearer"></div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper">
<div>
<h4>Previous topic</h4>
<p class="topless"><a href="skiboot-5.4.7.html"
title="previous chapter">skiboot-5.4.7</a></p>
</div>
<div>
<h4>Next topic</h4>
<p class="topless"><a href="skiboot-5.4.9.html"
title="next chapter">skiboot-5.4.9</a></p>
</div>
<div role="note" aria-label="source link">
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="../_sources/release-notes/skiboot-5.4.8.rst.txt"
rel="nofollow">Show Source</a></li>
</ul>
</div>
<div id="searchbox" style="display: none" role="search">
<h3 id="searchlabel">Quick search</h3>
<div class="searchformwrapper">
<form class="search" action="../search.html" method="get">
<input type="text" name="q" aria-labelledby="searchlabel" autocomplete="off" autocorrect="off" autocapitalize="off" spellcheck="false"/>
<input type="submit" value="Go" />
</form>
</div>
</div>
<script>document.getElementById('searchbox').style.display = "block"</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer" role="contentinfo">
&#169; Copyright 2016-2017, IBM, others.
Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.2.6.
</div>
</body>
</html>