| <!DOCTYPE html> |
| |
| <html lang="en" data-content_root="../"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0" /> |
| |
| <title>skiboot-6.1 — skiboot d365a01 |
| documentation</title> |
| <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=fa44fd50" /> |
| <link rel="stylesheet" type="text/css" href="../_static/classic.css?v=514cf933" /> |
| |
| <script src="../_static/documentation_options.js?v=e1fecbe9"></script> |
| <script src="../_static/doctools.js?v=888ff710"></script> |
| <script src="../_static/sphinx_highlight.js?v=dc90522c"></script> |
| |
| <link rel="index" title="Index" href="../genindex.html" /> |
| <link rel="search" title="Search" href="../search.html" /> |
| <link rel="next" title="skiboot-6.1-rc1" href="skiboot-6.1-rc1.html" /> |
| <link rel="prev" title="skiboot-6.0.9" href="skiboot-6.0.9.html" /> |
| </head><body> |
| <div class="related" role="navigation" aria-label="related navigation"> |
| <h3>Navigation</h3> |
| <ul> |
| <li class="right" style="margin-right: 10px"> |
| <a href="../genindex.html" title="General Index" |
| accesskey="I">index</a></li> |
| <li class="right" > |
| <a href="skiboot-6.1-rc1.html" title="skiboot-6.1-rc1" |
| accesskey="N">next</a> |</li> |
| <li class="right" > |
| <a href="skiboot-6.0.9.html" title="skiboot-6.0.9" |
| accesskey="P">previous</a> |</li> |
| <li class="nav-item nav-item-0"><a href="../index.html">skiboot d365a01 |
| documentation</a> »</li> |
| <li class="nav-item nav-item-1"><a href="index.html" accesskey="U">Release Notes</a> »</li> |
| <li class="nav-item nav-item-this"><a href="">skiboot-6.1</a></li> |
| </ul> |
| </div> |
| |
| <div class="document"> |
| <div class="documentwrapper"> |
| <div class="bodywrapper"> |
| <div class="body" role="main"> |
| |
| <section id="skiboot-6-1"> |
| <span id="id1"></span><h1>skiboot-6.1<a class="headerlink" href="#skiboot-6-1" title="Link to this heading">¶</a></h1> |
| <p>skiboot v6.1 was released on Wednesday July 11th 2018. It is the first |
| release of skiboot 6.1, which is the new stable release of skiboot |
| following the 6.0 release, first released May 11th 2018.</p> |
| <p>Skiboot 6.1 is the basis for op-build v2.1 and contains all bug fixes as |
| of <a class="reference internal" href="skiboot-6.0.5.html#skiboot-6-0-5"><span class="std std-ref">skiboot-6.0.5</span></a>, and <a class="reference internal" href="skiboot-5.4.9.html#skiboot-5-4-9"><span class="std std-ref">skiboot-5.4.9</span></a> (the currently maintained |
| stable releases). We expect further stable releases in the 6.0.x and 5.4.x |
| series, while we do not expect to do any stable releases of 6.1.x.</p> |
| <p>This final 6.1 release follows a single release candidate release, as this |
| cycle we have been rather quiet, with mainly cleanup and bug fix patches |
| going in.</p> |
| <p>For details on how the skiboot stable releases work, see <a class="reference internal" href="../process/stable-skiboot-rules.html#stable-rules"><span class="std std-ref">Skiboot stable tree rules and releases</span></a>.</p> |
| <p>Over skiboot-6.0, we have the following changes:</p> |
| <section id="general-changes-and-bug-fixes"> |
| <h2>General changes and bug fixes<a class="headerlink" href="#general-changes-and-bug-fixes" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-6.1-rc1.html#skiboot-6-1-rc1"><span class="std std-ref">skiboot-6.1-rc1</span></a>:</p> |
| <ul> |
| <li><p>slw: Fix trivial typo in debug message</p></li> |
| <li><p>vpd: Add vendor property to processor node</p> |
| <p>Processor FRU vpd doesn’t contain vendor detail. We have to parse |
| module VPD to get vendor detail.</p> |
| </li> |
| <li><p>vpd: Sanitize VPD data</p> |
| <p>On OpenPower systems, the VPD keyword size tells us the maximum size of the data, |
| but the trailing end is padded with spaces (0x20) instead of NULs. The spec also |
| doesn’t prevent users from having spaces (0x20) within the actual data.</p> |
| <p>This patch discards trailing spaces before populating device tree.</p> |
| </li> |
| <li><p>core: always flush console before stopping</p> |
| <p>This catches a few cases (e.g., fast reboot failure messages) that |
| don’t always make it to the console before the machine is rebooted.</p> |
| </li> |
| <li><p>core/cpu: parallelise global CPU register setting jobs</p> |
| <p>On a 176 thread system, before:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">122.319923233</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">OPAL</span><span class="p">:</span> <span class="n">Switch</span> <span class="n">to</span> <span class="n">big</span><span class="o">-</span><span class="n">endian</span> <span class="n">OS</span> |
| <span class="p">[</span> <span class="mf">126.317897467</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">OPAL</span><span class="p">:</span> <span class="n">Switch</span> <span class="n">to</span> <span class="n">little</span><span class="o">-</span><span class="n">endian</span> <span class="n">OS</span> |
| </pre></div> |
| </div> |
| <p>after:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">212.439299889</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">OPAL</span><span class="p">:</span> <span class="n">Switch</span> <span class="n">to</span> <span class="n">big</span><span class="o">-</span><span class="n">endian</span> <span class="n">OS</span> |
| <span class="p">[</span> <span class="mf">212.469323643</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">OPAL</span><span class="p">:</span> <span class="n">Switch</span> <span class="n">to</span> <span class="n">little</span><span class="o">-</span><span class="n">endian</span> <span class="n">OS</span> |
| </pre></div> |
| </div> |
| </li> |
| <li><p>init, occ: Initialise OCC earlier on BMC systems</p> |
| <p>We need to use the OCC to obtain presence data for the SXM2 slots on |
| Witherspoon systems. This is needed to determine device type for NVLink |
| GPUs and OpenCAPI devices which can be plugged into the same slot. Support |
| for this will be implemented in a future patch.</p> |
| <p>Currently, OCC initialisation is done just before handing over to Linux, |
| which is well after NPU probe. On FSP systems, OCC boot starts very late, |
| so we wait until the last possible moment to initialise the skiboot side in |
| order to give it the maximum time to boot. On BMC systems, OCC boot starts |
| earlier, so there aren’t any issues in moving it earlier in the skiboot |
| init sequence.</p> |
| <p>When running on a BMC machine, call occ_pstates_init() as early as |
| possible in the init sequence. On FSP machines, continue to call it from |
| its current location.</p> |
| </li> |
| </ul> |
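| <p>As a rough illustration of the VPD sanitising entry above, the following Python sketch strips only the trailing space (0x20) padding from a fixed-width keyword field while keeping spaces inside the data. The function name and the sample value are hypothetical; skiboot itself does this in C.</p> |
| <div class="highlight-python notranslate"><div class="highlight"><pre> |
```python
def sanitize_vpd_keyword(raw: bytes) -> str:
    """Return keyword data with trailing 0x20 padding removed."""
    # Only the right-hand padding is dropped; spaces inside the data are
    # legitimate and must survive.
    return raw.rstrip(b" ").decode("ascii", errors="replace")


# Hypothetical 16-byte keyword field padded with spaces.
field = b"IBM POWER9".ljust(16, b" ")
print(repr(sanitize_vpd_keyword(field)))
```
| </pre></div> |
| </div> |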
| <p>Since <a class="reference internal" href="skiboot-6.0.html#skiboot-6-0"><span class="std std-ref">skiboot-6.0</span></a>:</p> |
| <ul> |
| <li><p>GCC8 build fixes</p></li> |
| <li><p>Add prepare_hbrt_update to hbrt interfaces</p> |
| <p>Add placeholder support for prepare_hbrt_update call into |
| hostboot runtime (opal-prd) code. This interface is only |
| called as part of a concurrent code update on a FSP based |
| system.</p> |
| </li> |
| <li><p>cpu: Clear PCR SPR in opal_reinit_cpus()</p> |
| <p>Currently if Linux boots with a non-zero PCR, things can go bad where |
| some early userspace programs can take illegal instructions. This is |
| being fixed in Linux, but in the meantime, we should clean up in |
| skiboot also.</p> |
| </li> |
| <li><p>pci: Fix PCI_DEVICE_ID()</p> |
| <p>The vendor ID is 16 bits not 8. This error leaves the top of the vendor |
| ID in the bottom bits of the device ID, which resulted in e.g. a failure |
| to run the PCI quirk for the AST VGA device.</p> |
| </li> |
| <li><p>Quieten console output on boot</p> |
| <p>We print out a whole bunch of things on boot, most of which aren’t |
| interesting, so we should <em>not</em> print them instead.</p> |
| <p>Printing things like what CPUs we found and what PCI devices we found |
| <em>are</em> useful, so continue to do that. But we don’t need to splat out |
| a bunch of things that are always going to be true.</p> |
| </li> |
| <li><p>core/console: fix deadlock when printing with console lock held</p> |
| <p>Some debugging options will print while the console lock is held, |
| which is why the console lock is taken as a recursive lock. |
| However console_write calls __flush_console, which will drop and |
| re-take the lock non-recursively in some cases.</p> |
| <p>Just set con_need_flush and return from __flush_console if we are |
| holding the console lock already.</p> |
| <p>This stack usage message (taken with this patch applied) could lead |
| to a deadlock without this:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CPU</span> <span class="mi">0000</span> <span class="n">lowest</span> <span class="n">stack</span> <span class="n">mark</span> <span class="mi">11768</span> <span class="nb">bytes</span> <span class="n">left</span> <span class="n">pc</span><span class="o">=</span><span class="mi">300</span><span class="n">cb808</span> <span class="n">token</span><span class="o">=</span><span class="mi">0</span> |
| <span class="n">CPU</span> <span class="mi">0000</span> <span class="n">Backtrace</span><span class="p">:</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03370</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">cb808</span> <span class="o">.</span><span class="n">list_check_node</span><span class="o">+</span><span class="mh">0x1c</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03410</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">cb910</span> <span class="o">.</span><span class="n">list_check</span><span class="o">+</span><span class="mh">0x38</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c034b0</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300190</span><span class="n">ac</span> <span class="o">.</span><span class="n">try_lock_caller</span><span class="o">+</span><span class="mh">0xb8</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03540</span> <span class="n">R</span><span class="p">:</span> <span class="mf">00000000300192e0</span> <span class="o">.</span><span class="n">lock_caller</span><span class="o">+</span><span class="mh">0x80</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03600</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030012</span><span class="n">c70</span> <span class="o">.</span><span class="n">__flush_console</span><span class="o">+</span><span class="mh">0x134</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c036d0</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300130</span><span class="n">cc</span> <span class="o">.</span><span class="n">console_write</span><span class="o">+</span><span class="mh">0x68</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03780</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300347</span><span class="n">bc</span> <span class="o">.</span><span class="n">vprlog</span><span class="o">+</span><span class="mh">0xc8</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03970</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030034844</span> <span class="o">.</span><span class="n">_prlog</span><span class="o">+</span><span class="mh">0x50</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03a00</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300364</span><span class="n">a4</span> <span class="o">.</span><span class="n">log_simple_error</span><span class="o">+</span><span class="mh">0x74</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03b90</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003004</span><span class="n">ab48</span> <span class="o">.</span><span class="n">occ_pstates_init</span><span class="o">+</span><span class="mh">0x184</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03d50</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003001480</span><span class="n">c</span> <span class="o">.</span><span class="n">load_and_boot_kernel</span><span class="o">+</span><span class="mh">0x38c</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03e30</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003001571</span><span class="n">c</span> <span class="o">.</span><span class="n">main_cpu_entry</span><span class="o">+</span><span class="mh">0x62c</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c03f00</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030002700</span> <span class="n">boot_entry</span><span class="o">+</span><span class="mh">0x1c0</span> |
| </pre></div> |
| </div> |
| </li> |
| <li><p>opal-prd: Do not error out on first failure for soft/hard offline.</p> |
| <p>The memory errors (CEs and UEs) that are detected as part of background |
| memory scrubbing are reported by PRD asynchronously to opal-prd along with |
| affected memory ranges. hservice_memory_error() converts these ranges into |
| page granularity before hooking them up to the soft/hard offline-ing |
| infrastructure.</p> |
| <p>But the current implementation of hservice_memory_error() does not hook up |
| all the pages to soft/hard offline-ing if any of the page offline actions |
| fails. For example, hard offline can fail for:</p> |
| <ul class="simple"> |
| <li><p>Pages that are not part of buddy managed pool.</p></li> |
| <li><p>Pages that are reserved by kernel using memblock_reserved()</p></li> |
| <li><p>Pages that are in use by kernel.</p></li> |
| </ul> |
| <p>But for the pages that are in use by user space application, the hard |
| offline marks the page as hwpoison, sends SIGBUS signal to kill the |
| affected application as recovery action and returns success.</p> |
| <p>Hence, it is possible that some of the pages in that memory range are in |
| use by an application or free. By stopping on the first error we lose the |
| opportunity to hwpoison the subsequent pages which may be free or in use by |
| an application. This patch fixes this issue.</p> |
| </li> |
| <li><p>libflash/blocklevel_write: Fix missing error handling</p> |
| <p>Caught by scan-build, we seem to trap the errors in rc, but |
| not take any recovery action during blocklevel_write.</p> |
| </li> |
| </ul> |
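| <p>The PCI_DEVICE_ID() fix above can be illustrated with a small sketch. It assumes the combined 32-bit value holds the vendor ID in the low 16 bits and the device ID in the high 16 bits, and that the old macro shifted by 8 rather than 16. The Python helper names are hypothetical, and the sample value is the 1a03:2000 vendor/device pair commonly reported for the AST VGA device.</p> |
| <div class="highlight-python notranslate"><div class="highlight"><pre> |
```python
def pci_vendor_id(vdid: int) -> int:
    # Vendor ID: low 16 bits of the combined value.
    return vdid & 0xFFFF


def pci_device_id_old(vdid: int) -> int:
    # Assumed buggy form: treats the vendor ID as only 8 bits wide, so the
    # top byte of the vendor ID leaks into the result.
    return (vdid >> 8) & 0xFFFF


def pci_device_id(vdid: int) -> int:
    # Fixed form: skip the full 16-bit vendor ID.
    return (vdid >> 16) & 0xFFFF


AST_VGA = 0x20001A03  # device 0x2000 in the high half, vendor 0x1A03 in the low half
print(hex(pci_vendor_id(AST_VGA)),
      hex(pci_device_id_old(AST_VGA)),
      hex(pci_device_id(AST_VGA)))
```
| </pre></div> |
| </div> |
| <p>With the assumed old form, a quirk keyed on device ID 0x2000 would never match, which is consistent with the failure described above.</p> |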
| <section id="i2c"> |
| <h3>I2C<a class="headerlink" href="#i2c" title="Link to this heading">¶</a></h3> |
| <ul> |
| <li><p>p8-i2c: fix wrong request status when a reset is needed</p> |
| <p>If the bus is found in error state when starting a new request, the |
| engine is reset and we enter recovery. However, once complete, the |
| reset operation shows a status of complete in the status register. So |
| any badly-timed call to check_status() will think the current top |
| request is complete, even though it hasn’t run yet.</p> |
| <p>So don’t update any request status while we are in recovery, as |
| nothing useful for the request is supposed to happen in that state.</p> |
| </li> |
| <li><p>p8-i2c: Remove force reset</p> |
| <p>Force reset was added as an attempt to work around some issues with TPM |
| devices locking up their I2C bus. In that particular case the problem |
| was that the device would hold the SCL line down permanently due to a |
| device firmware bug. The force reset doesn’t actually do anything to |
| alleviate the situation here, it just happens to reset the internal |
| master state enough to make the I2C driver appear to work until |
| something tries to access the bus again.</p> |
| <p>On P9 systems with secure boot enabled there is the added problem |
| of the “diagnostic mode” not being supported on I2C masters A, B, C and |
| D. Diagnostic mode allows the SCL and SDA lines to be driven directly |
| by software. Without this, force reset is impossible to implement.</p> |
| <p>This patch removes the force reset functionality entirely since:</p> |
| <ol class="loweralpha simple"> |
| <li><p>it doesn’t do what it’s supposed to, and</p></li> |
| <li><p>it’s butt ugly code</p></li> |
| </ol> |
| <p>Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port(). |
| There’s no need to reset every port on a master in response to an |
| error that occurred on a specific port.</p> |
| </li> |
| <li><p>libstb/i2c-driver: Bump max timeout</p> |
| <p>We have observed some TPMs clock stretching the I2C bus for significant |
| amounts of time when processing commands. The same TPMs also have |
| errata that can result in permanently locking up a bus in response to |
| an I2C transaction they don’t understand. Use an excessively long |
| timeout to prevent this in the field.</p> |
| </li> |
| <li><p>hdata: Add TPM timeout workaround</p> |
| <p>Set the default timeout for any bus containing a TPM to one second. This |
| is needed to work around a bug in the firmware of certain TPMs that will |
| clock stretch the I2C port for up to a second. Additionally, when the |
| TPM is clock stretching it responds to a STOP condition on the bus by |
| bricking itself. Clearing this error requires a hard power cycle of the |
| system since the TPM is powered by standby power.</p> |
| </li> |
| <li><p>p8-i2c: Allow a per-port default timeout</p> |
| <p>Add support for setting a default timeout for the I2C port to the |
| device-tree. This is consumed by skiboot.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="ipmi-watchdog"> |
| <h3>IPMI Watchdog<a class="headerlink" href="#ipmi-watchdog" title="Link to this heading">¶</a></h3> |
| <ul> |
| <li><p>ipmi-watchdog: Support handling re-initialization</p> |
| <p>Watchdog resets can return an error code from the BMC indicating that |
| the BMC watchdog was not initialized. Currently we abort skiboot due to |
| a missing error handler. This patch implements handling |
| re-initialization for the watchdog, automatically saving the last |
| watchdog set values and re-issuing them if needed.</p> |
| </li> |
| <li><p>ipmi-watchdog: The stop action should disable reset</p> |
| <p>Otherwise it is possible for the reset timer to elapse and trigger the |
| watchdog to wake back up. This doesn’t affect the behavior of the |
| system since we are providing a NONE action to the BMC. However we would |
| like to avoid the action from taking place if possible.</p> |
| </li> |
| <li><p>ipmi-watchdog: Add a flag to determine if we are still ticking</p> |
| <p>This makes it easier for future changes to ensure that the watchdog |
| stops ticking and doesn’t requeue itself for execution in the |
| background. This way it is safe for resets to be performed after the |
| ticks are assumed to be stopped and it won’t start the timer again.</p> |
| </li> |
| <li><p>ipmi-watchdog: (prepare for) not disabling at shutdown</p> |
| <p>The op-build linux kernel has been configured to support the ipmi |
| watchdog. This driver will always handle the watchdog by either leaving |
| it enabled if configured, or by disabling it during module load if no |
| configuration is provided. This increases the coverage of the watchdog |
| during the boot process. The watchdog should no longer be disabled at |
| any point during skiboot execution.</p> |
| <p>We’re not enabling this by default yet as people can (and do, at least in |
| development) mix and match old BOOTKERNEL with new skiboot and we don’t |
| want to break that too obviously.</p> |
| </li> |
| <li><p>ipmi-watchdog: Don’t reset the watchdog twice</p> |
| <p>There is no clarification for why this change was needed, but presumably |
| this is due to a buggy BMC implementation where the Watchdog Set command |
| was processed concurrently or after the initial Watchdog Reset. This |
| inversion would cause the watchdog to stop since the DONT_STOP bit was |
| not set. Since we are now using the DONT_STOP bit during initialization, |
| the watchdog should not be stopped even if an inversion occurs.</p> |
| </li> |
| <li><p>ipmi-watchdog: Make it possible to set DONT_STOP</p> |
| <p>The IPMI standard supports setting a DONT_STOP bit during a Watchdog |
| Set operation. Most of the time we don’t want to stop the Watchdog when |
| updating the settings so we should be using this bit. This patch makes |
| it possible for callers of set_wdt to prevent the watchdog from being |
| stopped. This only changes the behavior of the watchdog during the |
| initial settings update when initializing skiboot. The watchdog is no |
| longer disabled and then immediately re-enabled.</p> |
| </li> |
| <li><p>ipmi-watchdog: WD_POWER_CYCLE_ACTION -> WD_RESET_ACTION</p> |
| <p>The IPMI specification denotes that action 0x1 is Host Reset and 0x3 is |
| Host Power Cycle. Use the correct name for Reset in our watchdog code.</p> |
| </li> |
| </ul> |
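| <p>The re-initialization handling described in the first entry above can be sketched roughly as follows. This is a simplified Python model, not skiboot’s code: the FakeBmc and Watchdog classes and their methods are hypothetical, and 0x80 is assumed to be the completion code a BMC returns when asked to reset a watchdog that was never initialized.</p> |
| <div class="highlight-python notranslate"><div class="highlight"><pre> |
```python
WDT_NOT_INITIALIZED = 0x80  # assumed "watchdog not initialized" completion code
OK = 0x00


class FakeBmc:
    """Hypothetical BMC whose watchdog settings are lost when it restarts."""

    def __init__(self):
        self.settings = None

    def set_wdt(self, action, timeout_s):
        self.settings = (action, timeout_s)
        return OK

    def reset_wdt(self):
        return OK if self.settings is not None else WDT_NOT_INITIALIZED

    def restart(self):
        self.settings = None


class Watchdog:
    """Remembers the last Set values so they can be re-issued on demand."""

    def __init__(self, bmc):
        self.bmc = bmc
        self.last = None

    def set(self, action, timeout_s):
        self.last = (action, timeout_s)
        return self.bmc.set_wdt(action, timeout_s)

    def reset(self):
        rc = self.bmc.reset_wdt()
        if rc == WDT_NOT_INITIALIZED and self.last is not None:
            # Re-issue the saved settings, then retry the reset once.
            self.bmc.set_wdt(*self.last)
            rc = self.bmc.reset_wdt()
        return rc


wd = Watchdog(FakeBmc())
wd.set(action=1, timeout_s=30)
wd.bmc.restart()          # BMC forgets the watchdog configuration
print(hex(wd.reset()))    # handled by re-initializing instead of aborting
```
| </pre></div> |
| </div> |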
| </section> |
| </section> |
| <section id="power8-platforms"> |
| <h2>POWER8 platforms<a class="headerlink" href="#power8-platforms" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>astbmc: Enable mbox depending on scratch reg</p> |
| <p>P8 boxes can opt in for mbox pnor support if they set the scratch |
| register bit to indicate it is supported.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="simulator-platforms"> |
| <h2>Simulator platforms<a class="headerlink" href="#simulator-platforms" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-6.1-rc1.html#skiboot-6-1-rc1"><span class="std std-ref">skiboot-6.1-rc1</span></a>:</p> |
| <ul> |
| <li><p>pmem: volatile bindings for the poorly enabled</p> |
| <p>PMEM_DISK bindings were added, but they rely on a rather |
| recent mmap feature. This patch steals from those bindings |
| to add volatile bindings. I’ve used these bindings with |
| PMEM_VOLATILE to launch an instance with the publicly |
| available systemsim-p9. The bindings are volatile and one |
| should not expect any data to be saved/retrieved.</p> |
| </li> |
| </ul> |
| <p>Since <a class="reference internal" href="skiboot-6.0.html#skiboot-6-0"><span class="std std-ref">skiboot-6.0</span></a>:</p> |
| <ul> |
| <li><p>plat/qemu: add PNOR support</p> |
| <p>To access the PNOR, OPAL/skiboot drives the BMC SPI controller using |
| the iLPC2AHB device of the BMC SuperIO controller and accesses the |
| flash contents using the LPC FW address space on which the PNOR is |
| remapped.</p> |
| <p>The QEMU PowerNV machine now integrates such models (SuperIO |
| controller, iLPC2AHB device) and also a pseudo Aspeed SoC AHB memory |
| space populated with the SPI controller registers (same model as for |
| ARM). The AHB window giving access to the contents of the BMC SPI |
| controller flash modules is mapped on the LPC FW address space.</p> |
| <p>The change should be compatible with machines without PNOR support.</p> |
| </li> |
| <li><p>external/mambo: Add support for readline if it exists</p> |
| <p>Add support for tclreadline package if it is present. |
| This patch loads the package and uses it when the |
| simulation stops for any reason.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="fsp-based-platforms"> |
| <h2>FSP based platforms<a class="headerlink" href="#fsp-based-platforms" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>Disable fast reboot on FSP IPL side change</p> |
| <p>If FSP changes next IPL side, then disable fast reboot.</p> |
| <p>sample output:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">620.196442259</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">FSP</span><span class="p">:</span> <span class="n">Got</span> <span class="n">sysparam</span> <span class="n">update</span><span class="p">,</span> <span class="n">param</span> <span class="n">ID</span> <span class="mh">0xf0000007</span> |
| <span class="p">[</span> <span class="mf">620.196444501</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">CUPD</span><span class="p">:</span> <span class="n">FW</span> <span class="n">IPL</span> <span class="n">side</span> <span class="n">changed</span><span class="o">.</span> <span class="n">Disable</span> <span class="n">fast</span> <span class="n">reboot</span> |
| <span class="p">[</span> <span class="mf">620.196445389</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">CUPD</span><span class="p">:</span> <span class="n">Next</span> <span class="n">IPL</span> <span class="n">side</span> <span class="p">:</span> <span class="n">perm</span> |
| </pre></div> |
| </div> |
| </li> |
| <li><p>fsp/console: Always establish OPAL console API backend</p> |
| <p>Currently we only call set_opal_console() to establish the backend |
| used by the OPAL console API if we find at least one FSP serial |
| port in HDAT.</p> |
| <p>On systems where there is none (IPMI only), we fail to set it. |
| The console code then tries to use the dummy console, which causes |
| an assertion failure during boot due to a clash on the device-tree |
| node names.</p> |
| <p>So always set it if an FSP is present.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="ast-bmc-based-platforms"> |
| <h2>AST BMC based platforms<a class="headerlink" href="#ast-bmc-based-platforms" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>AMI BMC: use 0x3a as OEM command</p> |
| <p>The 0x3a OEM command is for IBM commands, while 0x32 was for AMI ones. |
| Sometime in the P8 timeframe, AMI BMCs were changed to listen for our |
| commands on either 0x32 or 0x3a. Since 0x3a is the direction forward, |
| we’ll use that, as P9 machines with AMI BMCs probably also want these |
| to work, and let’s not bet that 0x32 will continue to be okay.</p> |
| </li> |
| <li><p>astbmc: Set romulus BMC type to OpenBMC</p></li> |
| <li><p>platform/astbmc: Do not delete compatible property</p> |
| <p>From P9 onwards, OPAL builds the device tree for BMC-based systems using |
| HDAT. We populate bmc/compatible with the BMC version, so |
| do not delete this property.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="utilities"> |
| <h2>Utilities<a class="headerlink" href="#utilities" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>external/xscom-utils: Add python library for xscom access</p> |
| <p>Patch adds a simple python library module for xscom access. |
| It directly manipulates the ‘/access’ file for scom read |
| and write from the debugfs ‘scom’ directory.</p> |
| <p>Example on how to generate a getscom using this module:</p> |
| <div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">adu_scoms</span> <span class="kn">import</span> <span class="o">*</span> |
| <span class="n">getscom</span> <span class="o">=</span> <span class="n">GetSCom</span><span class="p">()</span> |
| <span class="n">getscom</span><span class="o">.</span><span class="n">parse_args</span><span class="p">()</span> |
| <span class="n">getscom</span><span class="o">.</span><span class="n">run_command</span><span class="p">()</span> |
| </pre></div> |
| </div> |
| <p>Sample output for above getscom.py:</p> |
| <div class="highlight-console notranslate"><div class="highlight"><pre><span></span><span class="gp"># </span>./getscom.py<span class="w"> </span>-l |
| <span class="go">Chip ID | Rev | Chip type</span> |
| <span class="go">---------|-------|-----------</span> |
| <span class="go">00000008 | DD2.0 | P9 (Nimbus) processor</span> |
| <span class="go">00000000 | DD2.0 | P9 (Nimbus) processor</span> |
| </pre></div> |
| </div> |
| </li> |
| <li><p>ffspart: Don’t require user to create blank partitions manually</p> |
| <p>Add ‘--allow-empty’, which allows the filename for a given partition to |
| be blank. If set, ffspart will set that part of the PNOR file ‘blank’ and |
| set ECC bits if required. |
| Without this option, behaviour is unchanged and ffspart will return an |
| error if it cannot find the partition file.</p> |
| </li> |
| <li><p>pflash: Use correct prefix when installing</p> |
| <p>pflash uses lowercase prefix when running make install in its |
| directory, but uppercase PREFIX when running it in shared. Use |
| lowercase everywhere.</p> |
| <p>With this the OpenBMC bitbake recipe can drop an out of tree patch it’s |
| been carrying for years.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="power9"> |
| <h2>POWER9<a class="headerlink" href="#power9" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-6.1-rc1.html#skiboot-6-1-rc1"><span class="std std-ref">skiboot-6.1-rc1</span></a>:</p> |
| <ul> |
| <li><p>occ: sensors: Fix the size of the phandle array ‘sensors’ in DT</p> |
| <p>Fixes: 99505c03f493 (present in v5.10-rc4)</p> |
| </li> |
| <li><p>phb4: Delay training till after PERST is deasserted</p> |
| <p>This helps some cards train on the second PERST (ie fast-reboot). The |
| reason why is not clear, but it helps, so YOLO!</p> |
| </li> |
| </ul> |
| <p>Since <a class="reference internal" href="skiboot-6.0.html#skiboot-6-0"><span class="std std-ref">skiboot-6.0</span></a>:</p> |
| <ul> |
| <li><p>occ-sensor: Avoid using uninitialised struct cpu_thread</p> |
| <p>When adding the sensors in occ_sensors_init, if the type is not |
| OCC_SENSOR_LOC_CORE, then the loop to find ‘c’ will not be executed. |
| Then c->pir is used for both of the add_sensor_node calls below.</p> |
| <p>This provides a default value of 0 instead.</p> |
| </li> |
| <li><p>NX: Add NX coprocessor init opal call</p> |
| <p>The read offset (4:11) in the Receive FIFO control register is incremented |
| by the FIFO size whenever a CRB is read by NX. But the index in RxFIFO has to |
| match the corresponding entry in the FIFO maintained by VAS in the kernel. |
| The VAS entry is reset to 0 when opening the receive window during driver |
| initialization. So when NX842 is reloaded or on kexec boot, there is a possibility |
| of a mismatch between the RxFIFO control register and the VAS entries in the kernel. |
| This could cause CRB failures or timeouts from NX.</p> |
| <p>This patch adds nx_coproc_init opal call for kernel to initialize |
| readOffset (4:11) and Queued (15:23) in RxFIFO control register.</p> |
| </li> |
| <li><p>SLW: Remove stop1_lite and stop2_lite</p> |
| <p>stop1_lite has been removed since it adds no additional benefit |
| over stop0_lite. stop2_lite has been removed since it currently adds |
| only minimal benefit over stop2, and that benefit is eclipsed by the time |
| required to ungate the clocks.</p> |
| <p>Moreover, Lite states don’t give up the SMT resources, so they can |
| potentially have a performance impact on sibling threads.</p> |
| <p>Since current OSs (Linux) aren’t smart enough to make good decisions |
| with these stop states, we’re (temporarily) removing them from what |
| we expose to the OS, the idea being to bring them back in a new |
| DT representation so that only an OS that knows what to do will |
| do things with them.</p> |
| </li> |
| <li><p>cpu: Use STOP1 on POWER9 for idle/sleep inside OPAL</p> |
| <p>The current code requests STOP3, which means it gets STOP2 in practice.</p> |
| <p>STOP2 has proven to be occasionally unreliable, depending on FW |
| version and chip revision, and it also requires a functional CME. |
| So instead, let’s use STOP1. The difference is minimal |
| for something that is only used for a few seconds during boot.</p> |
| </li> |
| </ul> |
| <section id="npu2-nvlink2-and-opencapi"> |
| <h3>NPU2 (NVLink2 and OpenCAPI)<a class="headerlink" href="#npu2-nvlink2-and-opencapi" title="Link to this heading">¶</a></h3> |
| <p>Since <a class="reference internal" href="skiboot-6.1-rc1.html#skiboot-6-1-rc1"><span class="std std-ref">skiboot-6.1-rc1</span></a>:</p> |
| <ul> |
| <li><p>capi: Select the correct IODA table entry for the mbt cache.</p> |
| <p>With the current code, the capi mmio window is not correctly configured |
| in the IODA table entry. The first entry (generally the non-prefetchable |
| BAR) is overwritten. |
| This patch sets the capi window bar at the right place.</p> |
| </li> |
| <li><p>npu2/hw-procedures: Fence bricks via NTL instead of MISC</p> |
| <p>There are a couple of places we can set/unset fence for a brick:</p> |
| <ol class="arabic simple"> |
| <li><p>MISC register: NPU2_MISC_FENCE_STATE</p></li> |
| <li><p>NTL register for the brick: NPU2_NTL_MISC_CFG1(ndev)</p></li> |
| </ol> |
| <p>Recent testing of ATS in combination with GPU reset has exposed a side |
| effect of using (1); if fence is set for all six bricks, it triggers a |
| sticky nmmu latch which prevents the NPU from getting ATR responses. |
| This manifests as a hang in the tests.</p> |
| <p>We have npu2_dev_fence_brick() which uses (1), and only two calls to it. |
| Replace the call which sets fence with a write to (2). Remove the |
| corresponding unset call entirely. It’s unneeded because the procedures |
| already do a progression from full fence to half to idle using (2).</p> |
| </li> |
| <li><p>phb4/capp: Calculate STQ/DMA read engines based on link-width for PEC</p> |
| <p>Presently in CAPI mode the number of STQ/DMA-read engines allocated on |
| PEC2 for CAPP is fixed to 6 and 0-30 respectively, irrespective of the |
| PCI link width. These values are only suitable for x8 cards and |
| quickly run out if an x16 card is plugged into a PEC2-attached slot. This |
| usually manifests as CAPP reporting a TLBI timeout because these messages |
| get stalled by insufficient STQs.</p> |
| <p>To fix this we update enable_capi_mode() to check whether the PEC2 chiplet is |
| in x16 mode and, if so, allocate 4/0-47 STQ/DMA-read engines |
| for the CAPP traffic.</p> |
| <p>Fixes: 37ea3cfdc852 (present in v5.7-rc1)</p> |
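| <p>The allocation described above can be sketched as a small lookup. The struct and function are hypothetical, and the sketch reads the values in the text as an STQ figure (6 or 4) plus a DMA-read engine range (0-30 or 0-47). It is not the enable_capi_mode() code.</p> |

```c
/* Hypothetical container for the values quoted in the note. */
struct capp_engines_sketch {
    unsigned stq;       /* STQ figure from the note */
    unsigned dma_first; /* first DMA-read engine */
    unsigned dma_last;  /* last DMA-read engine */
};

/* Pick the engine allocation from the PCI link width of PEC2. */
struct capp_engines_sketch capp_engines_for_width(unsigned link_width)
{
    struct capp_engines_sketch e = { 6, 0, 30 }; /* previous fixed values */

    if (link_width == 16) {
        e.stq = 4;
        e.dma_last = 47;
    }
    return e;
}
```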
| </li> |
| <li><p>npu2: Use same compatible string for NVLink and OpenCAPI link nodes in device tree</p> |
| <p>Currently, we distinguish between NPU links for NVLink devices and OpenCAPI |
| devices through the use of two different compatible strings - ibm,npu-link |
| and ibm,npu-link-opencapi.</p> |
| <p>As we move towards supporting configurations with both NVLink and OpenCAPI |
| devices behind a single NPU, we need to detect the device type as part of |
| presence detection, which can’t happen until well after the point where the |
| HDAT or platform code has created the NPU device tree nodes. Changing a |
| node’s compatible string after it’s been created is a bit ugly, so instead |
| we should move the device type to a new property which we can add to the |
| node later on.</p> |
| <p>Get rid of the ibm,npu-link-opencapi compatible string, add a new |
| ibm,npu-link-type property, and a helper function to check the link type. |
| Add an “unknown” device type in preparation for later patches to detect |
| device type dynamically.</p> |
| <p>These device tree bindings are entirely internal to skiboot and are not |
| consumed directly by Linux, so this shouldn’t break anything (other than |
| internal BML lab environments).</p> |
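| <p>A sketch of what such a helper might look like when working on the property’s string value. The enum, the function name and the exact strings “nvlink” and “opencapi” are assumptions for illustration; the text above only names the ibm,npu-link-type property and the “unknown” type.</p> |

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical link types; "unknown" covers not-yet-detected devices. */
enum npu_link_type_sketch {
    LINK_TYPE_UNKNOWN,
    LINK_TYPE_NVLINK,
    LINK_TYPE_OPENCAPI
};

/* Map a property value to a link type; anything else is unknown. */
enum npu_link_type_sketch link_type_from_prop(const char *value)
{
    if (value == NULL)
        return LINK_TYPE_UNKNOWN;
    if (strcmp(value, "nvlink") == 0)
        return LINK_TYPE_NVLINK;
    if (strcmp(value, "opencapi") == 0)
        return LINK_TYPE_OPENCAPI;
    return LINK_TYPE_UNKNOWN;
}
```

| <p>Because the type lives in a property rather than in the compatible string, it can be filled in after presence detection without rewriting the node.</p> |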
| </li> |
| <li><p>occ: Add support for GPU presence detection</p> |
| <p>On the Witherspoon platform, we need to distinguish between NVLink GPUs and |
| OpenCAPI accelerators. In order to do this, we first need to find out |
| whether the SXM2 socket is populated.</p> |
| <p>On Witherspoon, the SXM2 socket’s presence detection pin is only visible |
| via I2C from the APSS, and thus can only be exposed to the host via the |
| OCC. The OCC, per OCC Firmware Interface Specification for POWER9 version |
| 0.22, now exposes this to skiboot through a field in the dynamic data |
| shared memory.</p> |
| <p>Add the necessary dynamic data changes required to read the version and |
| GPU presence fields. Add a function, occ_get_gpu_presence(), that can be |
| used to check GPU presence.</p> |
| <p>If the OCC isn’t reporting presence (old OCC firmware, or some other |
| reason), we default to assuming there is a device present and wait for |
| link training to fail.</p> |
| <p>This will be used in later patches to fix up the NPU2 probe path for |
| OpenCAPI support on Witherspoon.</p> |
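| <p>The fallback when the OCC does not report presence can be sketched as follows. The function name, the boolean flag and the one-bit-per-GPU layout are hypothetical; the real occ_get_gpu_presence() reads the dynamic data shared memory, which is not modelled here.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * occ_reports_presence: whether the OCC data carries a presence field
 * at all (hypothetical flag).
 * presence_bits: one bit per GPU slot (hypothetical layout).
 */
bool gpu_assumed_present(bool occ_reports_presence, uint8_t presence_bits,
                         unsigned gpu)
{
    if (!occ_reports_presence)
        return true; /* assume a device and let link training fail */

    return ((presence_bits >> gpu) & 1) != 0;
}
```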
| </li> |
| <li><p>hw/npu2, core/hmi: Use NPU instead of NPU2 as log message prefix</p> |
| <p>The NPU2{DBG,INF,ERR} macros use “NPU%d” as a prefix to identify messages |
| relating to a particular NPU.</p> |
| <p>It’s slightly confusing to have per-NPU messages prefixed with “NPU0” or |
| “NPU1” and NPU-generic messages prefixed with “NPU2”. On some future system |
| we could potentially have an NPU #2, in which case it’d be really confusing.</p> |
| <p>Use NPU rather than NPU2 for NPU-generic log messages. There’s no risk of |
| confusion with the original npu.c code since that’s only for P8.</p> |
| </li> |
| </ul> |
| <p>Since <a class="reference internal" href="skiboot-6.0.html#skiboot-6-0"><span class="std std-ref">skiboot-6.0</span></a>:</p> |
| <ul> |
| <li><p>npu2: Reset NVLinks on hot reset</p> |
| <p>This effectively fences GPU RAM on GPU reset so the host system |
| does not have to crash every time we stop a KVM guest with a GPU |
| passed through.</p> |
| </li> |
| <li><p>npu2-opencapi: reduce number of retries to train the link</p> |
| <p>We’ve been reliably training the opencapi link on the first attempt |
| for quite a while. Furthermore, if it doesn’t train on the first |
| attempt, retries haven’t been that useful. So let’s reduce the number |
| of attempts we do to train the link.</p> |
| <p>2 retries = 3 attempts to train.</p> |
| <p>Each (failed) training sequence costs about 3 seconds.</p> |
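| <p>The retry arithmetic can be sketched as a loop: 2 retries give 3 attempts, so with about 3 seconds per failed sequence the worst case is roughly 9 seconds. The callback type and both function names are hypothetical, and the demo callback only exists to exercise the loop.</p> |

```c
#include <stdbool.h>

#define TRAIN_RETRIES 2 /* from the note: 2 retries = 3 attempts */

/* Hypothetical callback: returns true once the link has trained. */
typedef bool (*train_fn)(void *ctx);

/* Returns the number of attempts used, or 0 if every attempt failed. */
int train_with_retries(train_fn train, void *ctx)
{
    for (int attempt = 1; attempt <= TRAIN_RETRIES + 1; attempt++) {
        if (train(ctx))
            return attempt;
    }
    return 0;
}

/* Demo callback: ctx points at the number of failures left to simulate. */
bool fail_n_times(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```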
| </li> |
| <li><p>opal/hmi: Display correct chip id while printing NPU FIRs.</p> |
| <p>HMIs for NPU xstops are broadcast to all chips, so all cores on all |
| chips receive the HMI. The HMI handler correctly identifies and extracts the |
| NPU FIR details from the affected chip, but while printing the FIR data it |
| prints the chip id and location code of this_cpu()->chip_id, which |
| may not be the affected chip. This patch fixes this issue.</p> |
| </li> |
| <li><p>npu2-opencapi: Fix link state to report link down</p> |
| <p>The PHB callback ‘get_link_state’ is always reporting the link width, |
| irrespective of the link status and even when the link is down. It is |
| causing too much work (and failures) when the PHB is probed during pci |
| init. |
| The fix is to look at the link status first and report the link as |
| down when appropriate.</p> |
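| <p>The ordering of the fix, status first and width second, can be sketched like this. The function name and the LINK_DOWN_SKETCH value are hypothetical and do not reflect the actual get_link_state callback signature.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

#define LINK_DOWN_SKETCH 0 /* hypothetical "link down" value */

/* Report the width only when the link is actually up. */
uint8_t report_link_state(bool link_up, uint8_t link_width)
{
    if (!link_up)
        return LINK_DOWN_SKETCH;

    return link_width;
}
```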
| </li> |
| <li><p>npu2-opencapi: Cleanup traces printed during link training</p> |
| <p>Now that links may train in parallel, traces shown during training can |
| be all mixed up. So add a prefix to all the traces to clearly identify |
| the chip and link the trace refers to:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">OCAPI</span><span class="p">[</span><span class="o"><</span><span class="n">chip</span> <span class="nb">id</span><span class="o">></span><span class="p">:</span><span class="o"><</span><span class="n">link</span> <span class="nb">id</span><span class="o">></span><span class="p">]:</span> <span class="n">this</span> <span class="ow">is</span> <span class="n">a</span> <span class="n">very</span> <span class="n">useful</span> <span class="n">message</span> |
| </pre></div> |
| </div> |
| <p>The lower-level hardware procedures (npu2-hw-procedures.c) also print |
| traces which would need work. But that code is being reworked to be |
| better integrated with opencapi and nvidia, so leave it alone for now.</p> |
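| <p>A sketch of how such a prefix could be produced, using plain snprintf rather than skiboot’s logging macros. The helper name and its parameters are hypothetical; only the OCAPI[chip id:link id] format comes from the text above.</p> |

```c
#include <stdio.h>

/* Format one trace line with the chip/link prefix into buf. */
int ocapi_trace(char *buf, size_t len, unsigned chip_id, unsigned link_id,
                const char *msg)
{
    return snprintf(buf, len, "OCAPI[%u:%u]: %s", chip_id, link_id, msg);
}
```

| <p>With the prefix on every line, traces from links training in parallel can still be told apart after they interleave.</p> |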
| </li> |
| <li><p>npu2-opencapi: Train links on fundamental reset</p> |
| <p>Reorder our link training steps so that they are executed on |
| fundamental reset instead of during the initial setup. Skiboot always |
| calls a fundamental reset on all the PHBs during pci init.</p> |
| <p>It is done through a state machine, similarly to what is done for |
| ‘real’ PHBs.</p> |
| <p>This is the first step for a longer term goal to be able to trigger an |
| adapter reset from linux. We’ll need the reset callbacks of the PHB to |
| be defined. We have to handle the various delays differently, since a |
| linux thread shouldn’t stay stuck waiting in opal for too long.</p> |
| </li> |
| <li><p>npu2-opencapi: Rework adapter reset</p> |
| <p>Rework the code that resets the opencapi adapter:</p> |
| <ul class="simple"> |
| <li><p>make clearer which i2c pin is resetting which device</p></li> |
| <li><p>break the reset operation into smaller chunks. This is really to |
| prepare for a future patch.</p></li> |
| </ul> |
| <p>No functional changes.</p> |
| </li> |
| <li><p>npu2-opencapi: Use presence detection</p> |
| <p>Presence detection is not part of the opencapi specification. So each |
| platform may choose to implement it the way it wants.</p> |
| <p>All current platforms implement it through an i2c device where we can |
| query a pin to know if a device is connected or not. ZZ and Zaius have |
| a similar design and even use the same i2c information and pin |
| numbers. |
| However, presence detection on older ZZ planars (older than v4) doesn’t |
| work, so we don’t activate it for now, until our lab systems are |
| upgraded and it’s better tested.</p> |
| <p>Presence detection on witherspoon is still being worked on. It’s |
| shaping up to be quite different, so we may have to revisit the topic |
| in a later patch.</p> |
| </li> |
| </ul> |
| </section> |
| </section> |
| <section id="testing-and-ci"> |
| <h2>Testing and CI<a class="headerlink" href="#testing-and-ci" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-6.1-rc1.html#skiboot-6-1-rc1"><span class="std std-ref">skiboot-6.1-rc1</span></a>:</p> |
| <ul> |
| <li><p>test/qemu: start building qemu again, and use our built qemu for tests</p> |
| <p>We need to use QEMU_BIN rather than QEMU as the makefiles define |
| QEMU already.</p> |
| </li> |
| <li><p>opal-ci: qemu: Use the powernv-3.0 branch</p> |
| <p>This is based off the current development version of QEMU, and |
| importantly it contains the patch that allows skiboot and Linux to clear |
| the PCR that we require to boot.</p> |
| </li> |
| </ul> |
| </section> |
| </section> |
| |
| |
| <div class="clearer"></div> |
| </div> |
| </div> |
| </div> |
| <div class="clearer"></div> |
| </div> |
| <div class="footer" role="contentinfo"> |
| © Copyright 2016-2017, IBM, others. |
| Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.2.6. |
| </div> |
| </body> |
| </html> |