| <!DOCTYPE html> |
| |
| <html lang="en" data-content_root="../"> |
| <head> |
| <meta charset="utf-8" /> |
| <meta name="viewport" content="width=device-width, initial-scale=1" /> |
| |
| <title>skiboot-5.11 — skiboot d365a01 |
| documentation</title> |
| <link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=fa44fd50" /> |
| <link rel="stylesheet" type="text/css" href="../_static/classic.css?v=514cf933" /> |
| |
| <script src="../_static/documentation_options.js?v=e1fecbe9"></script> |
| <script src="../_static/doctools.js?v=888ff710"></script> |
| <script src="../_static/sphinx_highlight.js?v=dc90522c"></script> |
| |
| <link rel="index" title="Index" href="../genindex.html" /> |
| <link rel="search" title="Search" href="../search.html" /> |
| <link rel="next" title="skiboot-5.11-rc1" href="skiboot-5.11-rc1.html" /> |
| <link rel="prev" title="skiboot-5.10.6" href="skiboot-5.10.6.html" /> |
| </head><body> |
| |
| <div class="document"> |
| <div class="documentwrapper"> |
| <div class="bodywrapper"> |
| <div class="body" role="main"> |
| |
| <section id="skiboot-5-11"> |
| <span id="id1"></span><h1>skiboot-5.11<a class="headerlink" href="#skiboot-5-11" title="Link to this heading">¶</a></h1> |
| <p>skiboot v5.11 was released on Friday April 6th 2018. It is the first |
| release of skiboot 5.11, which is now the new stable release |
| of skiboot following the 5.10 release, first released February 23rd 2018.</p> |
| <p>The 5.11 branch is <em>not</em> expected to be kept around for long; instead, we |
| expect to move quickly to 6.0, which will form the basis for op-build v2.0 and |
| will be required for POWER9 systems.</p> |
| <p>skiboot 6.0 is expected to follow very shortly, so consider 5.11 |
| more of a beta for 6.0 than anything else. For POWER9 systems it should, |
| however, be more solid than previous releases.</p> |
| <p>skiboot v5.11 contains all bug fixes as of <a class="reference internal" href="skiboot-5.10.4.html#skiboot-5-10-4"><span class="std std-ref">skiboot-5.10.4</span></a> |
| and <a class="reference internal" href="skiboot-5.4.9.html#skiboot-5-4-9"><span class="std std-ref">skiboot-5.4.9</span></a> (the currently maintained stable releases). There |
| may be further 5.10.x stable releases, depending on demand.</p> |
| <p>For how the skiboot stable releases work, see <a class="reference internal" href="../process/stable-skiboot-rules.html#stable-rules"><span class="std std-ref">Skiboot stable tree rules and releases</span></a> for details.</p> |
| <p>Over skiboot-5.10, we have the following changes:</p> |
| <section id="new-platforms"> |
| <h2>New Platforms<a class="headerlink" href="#new-platforms" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>Add VESNIN platform support</p> |
| <p>The Vesnin platform from YADRO is a 4-socket POWER8 system with up to 8TB |
| of memory and 460GB/s of memory bandwidth in only 2U. Many kudos to the |
| team from YADRO for submitting their code upstream!</p> |
| </li> |
| </ul> |
| </section> |
| <section id="new-features"> |
| <h2>New Features<a class="headerlink" href="#new-features" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>fast-reboot: enable by default for POWER9</p> |
| <ul class="simple"> |
| <li><p>Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is used</p></li> |
| </ul> |
| </li> |
| <li><p>PCI tunneled operations on PHB4</p> |
| <ul> |
| <li><p>phb4: set PBCQ Tunnel BAR for tunneled operations</p> |
| <p>P9 supports PCI tunneled operations (atomics and as_notify) that are |
| initiated by devices.</p> |
| <p>A subset of the tunneled operations requires a response that must be |
| sent back from the host to the device. For example, an atomic compare |
| and swap returns the compare status, as the swap is only performed |
| on success. Similarly, as_notify reports whether the target thread |
| has been woken up, because the operation may fail.</p> |
| <p>To enable tunneled operations, a device driver must tell the host where |
| it expects tunneled operation responses, by setting the PBCQ Tunnel BAR |
| Response register with a specific value within the range of its BARs.</p> |
| <p>This register is currently initialized by enable_capi_mode(). However, as |
| tunneled operations may also be used in PCI mode, a new API is required |
| to set the PBCQ Tunnel BAR Response register without switching to CAPI |
| mode.</p> |
| <p>This patch provides two new OPAL calls to get/set the PBCQ Tunnel |
| BAR Response register.</p> |
| <p>Note: as there is only one PBCQ Tunnel BAR register, shared by |
| all the devices connected to the same PHB, only one of these devices |
| can use tunneled operations at any one time.</p> |
| </li> |
| <li><p>phb4: set PHB CMPM registers for tunneled operations</p> |
| <p>P9 supports PCI tunneled operations (atomics and as_notify) that require |
| setting the PHB ASN Compare/Mask register with a 16-bit indication.</p> |
| <p>This register is currently initialized by enable_capi_mode(). However, as |
| tunneled operations may also work in PCI mode, the ASN Compare/Mask |
| register is instead initialized in phb4_init_ioda3().</p> |
| <p>This patch also adds “ibm,phb-indications” to the device tree, to tell |
| Linux the values of CAPI, ASN, and NBW indications, when supported.</p> |
| <p>Tunneled operations were tested by IBM in CAPI mode and by Mellanox |
| Technologies in PCI mode.</p> |
| </li> |
| </ul> |
| </li> |
| <li><p>Tie tm-suspend fw-feature and opal_reinit_cpus() together</p> |
| <p>Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) |
| always returns OPAL_UNSUPPORTED.</p> |
| <p>This ties the tm suspend fw-feature to the |
| opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when tm |
| suspend is disabled, we correctly report it to the kernel. For |
| backwards compatibility, it’s assumed tm suspend is available if the |
| fw-feature is not present.</p> |
| <p>Currently, hostboot clears fw-feature(TM_SUSPEND_ENABLED) on P9N |
| DD2.1, while P9N DD2.2 sets fw-feature(TM_SUSPEND_ENABLED). DD2.0 and |
| below have TM disabled completely (not just suspend).</p> |
| <p>We are using opal_reinit_cpus() to determine this setting (rather than |
| the device tree/HDAT) as some future firmware may let us change this |
| dynamically after boot. That is not the case currently though.</p> |
| </li> |
| </ul> |
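| <p>The tm-suspend handling described above can be illustrated with a minimal C sketch. Everything in it is an assumption made for illustration: the enum, the SKETCH_* constants and their values, and the function names are hypothetical stand-ins, not skiboot’s definitions. It only shows the decision described in the notes, including the backwards-compatible default of treating a missing fw-feature as “tm suspend available”.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder return codes (not taken from opal-api.h). */
#define SKETCH_OPAL_SUCCESS      0
#define SKETCH_OPAL_UNSUPPORTED  (-7)

/* Hypothetical stand-in for OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED. */
#define SKETCH_REINIT_TM_SUSPEND_DISABLED (1ull << 4)

/* State of fw-feature(TM_SUSPEND_ENABLED) as reported by hostboot. */
enum fw_feature_state {
    FW_FEATURE_ABSENT,    /* older firmware: the fw-feature is not present */
    FW_FEATURE_ENABLED,
    FW_FEATURE_DISABLED,
};

/* Backwards compatibility: assume tm suspend is available unless the
 * fw-feature explicitly says it is disabled. */
static bool tm_suspend_enabled(enum fw_feature_state s)
{
    return s != FW_FEATURE_DISABLED;
}

/* Sketch of the TM_SUSPEND_DISABLED part of opal_reinit_cpus(): the call
 * succeeds only when tm suspend really is disabled, so the kernel can
 * learn the setting from the return code. */
static int64_t reinit_cpus_tm(uint64_t flags, enum fw_feature_state s)
{
    if (!(flags & SKETCH_REINIT_TM_SUSPEND_DISABLED))
        return SKETCH_OPAL_SUCCESS;   /* nothing tm-related requested */

    return tm_suspend_enabled(s) ? SKETCH_OPAL_UNSUPPORTED
                                 : SKETCH_OPAL_SUCCESS;
}
```

| <p>Under the hostboot behaviour described above, a P9N DD2.1 system would correspond to FW_FEATURE_DISABLED (the call succeeds) and a DD2.2 system to FW_FEATURE_ENABLED (the call returns the unsupported code).</p> |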
| </section> |
| <section id="power-management"> |
| <h2>Power Management<a class="headerlink" href="#power-management" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>SLW: Increase stop4-5 residency by 10x</p> |
| <p>Using the DGEMM benchmark, we observed a 5-9% drop in throughput with |
| stop4/5 enabled compared to without. In this benchmark the GPU waits for the CPU to wake up |
| and provide the subsequent data block to compute. The wakeup latency |
| accumulates over the run and shows up as a performance drop.</p> |
| <p>Linux enters stop4/5 too aggressively given their wakeup latency. Increasing |
| the residency from 1ms to 10ms reduces the performance drop to under 1%.</p> |
| </li> |
| <li><p>occ: Set up OCC messaging even if we fail to setup pstates</p> |
| <p>This means that we no longer hit this bug if we fail to get valid pstates |
| from the OCC.</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>[console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear |
| echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear |
| [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 |
| [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8 |
| [ 10.318805] Disabling lock debugging due to kernel taint |
| [ 10.318808] Severe Machine check interrupt [Not recovered] |
| [ 10.318812] NIP [000000003003e434]: 0x3003e434 |
| [ 10.318813] Initiator: CPU |
| [ 10.318815] Error type: Real address [Load/Store (foreign)] |
| [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception |
| [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3 |
| [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240 |
| [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1) |
| [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000 |
| [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1 |
| </pre></div> |
| </div> |
| </li> |
| </ul> |
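| <p>Why a larger stop4/5 residency helps can be seen with a deliberately simplified idle-state selector. This is a sketch under stated assumptions, not the Linux cpuidle governor (which also weighs exit latency and other factors): it picks the deepest state whose target residency fits the predicted idle time. The state names echo the stop levels, but the struct, the function and the stop0 value are hypothetical; only the 1ms and 10ms residencies come from the notes.</p> |

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical idle-state description. */
struct idle_state {
    const char *name;
    uint32_t target_residency_us;  /* minimum idle time worth entering for */
};

/* States ordered from shallow to deep. Returns the index of the deepest
 * state whose target residency fits in the predicted idle period, or 0
 * (the shallowest state) if none fits. */
static size_t pick_idle_state(const struct idle_state *states, size_t n,
                              uint32_t predicted_idle_us)
{
    size_t best = 0;

    for (size_t i = 0; i < n; i++) {
        if (states[i].target_residency_us <= predicted_idle_us)
            best = i;
    }
    return best;
}

/* Before and after the change: only the stop4 residency differs. */
static const struct idle_state states_1ms[] = {
    { "stop0", 2 }, { "stop4", 1000 },
};
static const struct idle_state states_10ms[] = {
    { "stop0", 2 }, { "stop4", 10000 },
};
```

| <p>With a predicted idle period of 5ms, the 1ms table selects stop4 while the 10ms table stays in stop0, so short gaps spent waiting on the CPU no longer pay the deeper wakeup latency.</p> |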
| <section id="mbox-based-platforms"> |
| <h3>mbox based platforms<a class="headerlink" href="#mbox-based-platforms" title="Link to this heading">¶</a></h3> |
| <p>For platforms using the mbox protocol for host flash access (all BMC based |
| OpenPOWER systems, most OpenBMC based systems) there have been some hardening |
| efforts in the event of the BMC being poorly behaved.</p> |
| <ul> |
| <li><p>mbox: Reduce default BMC timeouts</p> |
| <p>Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for |
| 70 seconds waiting for a BMC to come back. This also makes the current |
| default of 30 seconds a bit pointless: it is far too short to be a |
| worst-case wait time, but too long to avoid hitting hard-lockup detectors |
| and wreaking havoc inside host Linux.</p> |
| <p>Change it to three seconds so that host Linux survives: reads and |
| writes will fail, but at least the host stays up.</p> |
| <p>The waiting loop has also been refactored slightly to make it easier to read.</p> |
| </li> |
| <li><p>mbox: Harden against BMC daemon errors</p> |
| <p>Bugs present in the BMC daemon mean that skiboot gets presented with |
| mbox windows of size zero. These windows cannot be valid and skiboot |
| already detects these conditions.</p> |
| <p>Currently skiboot warns quite strongly about the occurrence of these |
| problems. The problem for skiboot is that it doesn’t take any action. |
| Initially I wanted to avoid putting policy like this into skiboot, but |
| since these bugs aren’t going away, and skiboot barfing is leading to |
| lockups and ultimately the host going down, something needs to be done.</p> |
| <p>I propose that when we detect the problem we fail the mbox call and punt |
| the problem back up to Linux. I don’t like it, but at least it will cause |
| errors to cascade and won’t bring the host down. I’m not sure how Linux |
| is supposed to detect this or what it can even do, but this is better |
| than a crash.</p> |
| <p>Diagnosing a failure to boot if skiboot itself fails to read flash may |
| be marginally more difficult with this patch. This is because skiboot |
| will now only print one warning about the zero-sized window rather than |
| continuously spitting it out.</p> |
| </li> |
| </ul> |
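| <p>The two mbox hardening changes can be summarised in a small sketch: a bounded wait (three seconds rather than 30) and a window-size check that fails the call instead of carrying on, warning only once. The function names, return codes and the poll callback are hypothetical; skiboot’s real mbox code is structured differently.</p> |

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MBOX_SKETCH_TIMEOUT_MS 3000   /* new default from the notes: 3 s */

enum mbox_sketch_rc { MBOX_OK = 0, MBOX_TIMEOUT = -1, MBOX_BAD_WINDOW = -2 };

/* Hypothetical callback: returns true once the BMC has answered. */
typedef bool (*bmc_poll_fn)(void *ctx);

/* Spin for at most timeout_ms polls. Each iteration stands for roughly
 * 1 ms; this sketch does not actually sleep between polls. */
static enum mbox_sketch_rc wait_for_bmc(bmc_poll_fn poll, void *ctx,
                                        uint32_t timeout_ms)
{
    for (uint32_t waited = 0; waited < timeout_ms; waited++) {
        if (poll(ctx))
            return MBOX_OK;
    }
    return MBOX_TIMEOUT;   /* caller fails the read/write; host stays up */
}

/* A zero-sized window can never be valid: report the error to the caller
 * (ultimately Linux) and warn only the first time. */
static enum mbox_sketch_rc check_window(uint32_t size, bool *warned)
{
    if (size != 0)
        return MBOX_OK;
    if (!*warned) {
        fprintf(stderr, "mbox: BMC returned a zero-sized window\n");
        *warned = true;
    }
    return MBOX_BAD_WINDOW;
}

/* Example poller: answers after *(int *)ctx unsuccessful polls. */
static bool answers_after(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}
```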
| </section> |
| </section> |
| <section id="fast-reboot-improvements"> |
| <h2>Fast Reboot Improvements<a class="headerlink" href="#fast-reboot-improvements" title="Link to this heading">¶</a></h2> |
| <p>Around fast-reboot we have made several improvements to harden the fast |
| reboot code paths and resort to a full IPL if something doesn’t look right.</p> |
| <ul> |
| <li><p>core/fast-reboot: zero memory after fast reboot</p> |
| <p>This improves the security and predictability of the fast reboot |
| environment.</p> |
| <p>There cannot be a secure fence between fast reboots, because a |
| malicious OS can modify the firmware itself. However a well-behaved |
| OS can have a reasonable expectation that OS memory regions it has |
| modified will be cleared upon fast reboot.</p> |
| <p>The memory is zeroed after all other CPUs come up from fast reboot, |
| just before the new kernel is loaded and booted into. This allows |
| image preloading to run concurrently, and will allow parallelisation |
| of the clearing in future.</p> |
| </li> |
| <li><p>core/fast-reboot: verify mem regions before fast reboot</p> |
| <p>Run the mem_region sanity checkers before proceeding with fast |
| reboot.</p> |
| <p>This is the beginning of proactive sanity checks on OPAL data |
| for fast reboot (which complements the reactive disable_fast_reboot |
| cases). Any kind of debug code and unit test code is encouraged to be |
| reused and shared here.</p> |
| </li> |
| <li><p>fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they exist</p></li> |
| <li><p>core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors</p> |
| <p>This disables fast reboot in several more cases where serious errors |
| like lock corruption or call re-entrancy are detected.</p> |
| </li> |
| <li><p>capp: Disable fast-reboot whenever enable_capi_mode() is called</p> |
| <p>This patch updates phb4_set_capi_mode() to disable fast-reboot |
| whenever enable_capi_mode() is called, irrespective of its return |
| value. This guards against fast-reboot being left enabled if a future |
| change to enable_capi_mode() makes it return an error while leaving |
| CAPP in enabled mode.</p> |
| </li> |
| <li><p>fast-reboot: occ: Delete OCC child nodes in /ibm,opal/power-mgt</p> |
| <p>Fast-reboot in P8 fails to re-init OCC data as there are chipwise OCC |
| nodes which are already present in the /ibm,opal/power-mgt node. These |
| per-chip nodes hold the voltage IDs for each pstate and these can be |
| changed on OCC pstate table biasing. So delete these before calling |
| the re-init code to re-parse and populate the pstate data.</p> |
| </li> |
| </ul> |
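| <p>The common pattern in these changes, recording a reason to disable fast reboot and falling back to a full IPL, can be sketched as follows. The function is named after the disable_fast_reboot cases mentioned above, but this body, choose_reboot() and the boolean standing in for the mem_region sanity checkers are hypothetical illustrations, not skiboot’s implementation. Keeping only the first reason is a choice made for this sketch.</p> |

```c
#include <stdbool.h>

/* Reason fast reboot was disabled, or NULL (zero-initialised) if it is
 * still allowed. */
static const char *fast_reboot_disabled_reason;

/* Called by any subsystem that cannot be safely reset in place, e.g.
 * CAPP enabled, NPU present, lock corruption or OPAL re-entry. */
static void disable_fast_reboot(const char *reason)
{
    /* Keep the first reason: it is usually closest to the root cause. */
    if (!fast_reboot_disabled_reason)
        fast_reboot_disabled_reason = reason;
}

enum reboot_kind { REBOOT_FAST, REBOOT_FULL_IPL };

/* mem_regions_ok stands in for running the mem_region sanity checkers
 * just before attempting a fast reboot. */
static enum reboot_kind choose_reboot(bool mem_regions_ok)
{
    if (fast_reboot_disabled_reason || !mem_regions_ok)
        return REBOOT_FULL_IPL;   /* something looks wrong: do a full IPL */
    return REBOOT_FAST;
}
```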
| </section> |
| <section id="debugging-sreset-improvemens"> |
| <h2>Debugging/SRESET improvements<a class="headerlink" href="#debugging-sreset-improvemens" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-5.11-rc1.html#skiboot-5-11-rc1"><span class="std std-ref">skiboot-5.11-rc1</span></a>:</p> |
| <ul> |
| <li><p>core/cpu: Prevent clobbering of stack guard for boot-cpu</p> |
| <p>Commit 90d53934c2da (“core/cpu: discover stack region size before |
| initialising memory regions”) introduced a memzero of struct cpu_thread |
| in init_cpu_thread(). This has the unintended side effect of clobbering |
| the stack-guard canary of the boot_cpu stack. This results in OPAL |
| failing to init with this failure message:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>CPU: P9 generation processor (max 4 threads/core) |
| CPU: Boot CPU PIR is 0x0004 PVR is 0x004e1200 |
| Guard skip = 0 |
| Stack corruption detected ! |
| Aborting! |
| CPU 0004 Backtrace: |
| S: 0000000031c13ab0 R: 0000000030013b0c .backtrace+0x5c |
| S: 0000000031c13b50 R: 000000003001bd18 ._abort+0x60 |
| S: 0000000031c13be0 R: 0000000030013bbc .__stack_chk_fail+0x54 |
| S: 0000000031c13c60 R: 00000000300c5b70 .memset+0x12c |
| S: 0000000031c13d00 R: 0000000030019aa8 .init_cpu_thread+0x40 |
| S: 0000000031c13d90 R: 000000003001b520 .init_boot_cpu+0x188 |
| S: 0000000031c13e30 R: 0000000030015050 .main_cpu_entry+0xd0 |
| S: 0000000031c13f00 R: 0000000030002700 boot_entry+0x1c0 |
| </pre></div> |
| </div> |
| <p>So the patch fixes this by tweaking the memset() call in |
| init_cpu_thread() to skip over the stack-guard canary.</p> |
| </li> |
| <li><p>core/lock.c: ensure valid start value for lock spin duration warning</p> |
| <p>The previous fix in a8e6cc3f4 only addressed half of the problem, as |
| we could also get an invalid value for start, causing us to fail |
| in a weird way.</p> |
| <p>This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS |
| test in op-test-framework.</p> |
| <p>You’d get to this part of the test and get the erroneous lock |
| spinning warnings:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000 |
| 0000080000000000 |
| [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms |
| [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms |
| [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms |
| </pre></div> |
| </div> |
| <p>This patch checks the validity of timebase before setting start, |
| and only checks the lock timeout if we got a valid start value.</p> |
| </li> |
| </ul> |
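| <p>The stack-guard fix boils down to starting the memset() just past the canary. A minimal sketch, assuming a hypothetical cpu_thread_sketch structure in which the guard sits near the start (the real struct cpu_thread has many more fields): the first byte after the guard is located with offsetof(), and only the remainder of the structure is zeroed. Fields placed before the guard are therefore not cleared and must be set explicitly, as pir is here.</p> |

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-CPU structure; the field layout is illustrative. */
struct cpu_thread_sketch {
    uint32_t pir;
    uint64_t stack_guard;   /* canary checked by the stack protector */
    uint32_t state;
    uint64_t scratch[4];
};

static void init_cpu_thread_sketch(struct cpu_thread_sketch *t, uint32_t pir)
{
    /* First byte after the canary: everything from here on is zeroed. */
    const size_t guard_skip = offsetof(struct cpu_thread_sketch, stack_guard) +
                              sizeof(t->stack_guard);

    memset((char *)t + guard_skip, 0, sizeof(*t) - guard_skip);
    t->pir = pir;   /* fields before the guard are set explicitly */
}
```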
| <p>Since <a class="reference internal" href="skiboot-5.10.html#skiboot-5-10"><span class="std std-ref">skiboot-5.10</span></a>:</p> |
| <ul> |
| <li><p>core/opal: allow some re-entrant calls</p> |
| <p>This allows a small number of OPAL calls to succeed despite re-entering |
| the firmware, and rejects others rather than aborting.</p> |
| <p>This allows a system reset interrupt that interrupts OPAL to do something |
| useful: sreset other CPUs, use the console (which allows xmon to work or |
| stack traces to be printed), or reboot the system.</p> |
| <p>Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is |
| used for many other things that do not indicate a serious permanent error.</p> |
| </li> |
| <li><p>core/opal: abort in case of re-entrant OPAL call</p> |
| <p>The stack is already destroyed by the time we get here, so there |
| is not much point continuing.</p> |
| </li> |
| <li><p>core/lock: Add lock timeout warnings</p> |
| <p>There are currently no timeout warnings for locks in skiboot. We assume |
| that the lock will eventually become free, which may not always be the |
| case.</p> |
| <p>This patch adds timeout warnings for locks. Any lock which spins for more |
| than 5 seconds will throw a warning and stacktrace for that thread. This is |
| useful for debugging situations where a thread hangs waiting for a |
| lock to be freed.</p> |
| </li> |
| <li><p>core/lock: Add deadlock detection</p> |
| <p>This adds simple deadlock detection. The detection looks for circular |
| dependencies in the lock requests. It will abort and display a stack trace |
| when a deadlock occurs. |
| The detection is enabled by DEBUG_LOCKS (enabled by default). |
| While the detection may have a slight performance overhead, as there are |
| not a huge number of locks in skiboot this overhead isn’t significant.</p> |
| </li> |
| <li><p>core/hmi: report processor recovery reason from core FIR bits on P9</p> |
| <p>When an error is encountered that causes processor recovery, an HMI is |
| generated if the recovery was successful. The reason is recorded in |
| the core FIR, which gets copied into the WOF.</p> |
| <p>In this case dump the WOF register and an error string into the OPAL |
| msglog.</p> |
| <p>A broken init setting led to HMIs reported in Linux as:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">3.591547</span><span class="p">]</span> <span class="n">Harmless</span> <span class="n">Hypervisor</span> <span class="n">Maintenance</span> <span class="n">interrupt</span> <span class="p">[</span><span class="n">Recovered</span><span class="p">]</span> |
| <span class="p">[</span> <span class="mf">3.591648</span><span class="p">]</span> <span class="n">Error</span> <span class="n">detail</span><span class="p">:</span> <span class="n">Processor</span> <span class="n">Recovery</span> <span class="n">done</span> |
| <span class="p">[</span> <span class="mf">3.591714</span><span class="p">]</span> <span class="n">HMER</span><span class="p">:</span> <span class="mi">2040000000000000</span> |
| </pre></div> |
| </div> |
| <p>This patch would have been useful because it tells us exactly that |
| the problem is in the d-side ERAT:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">414.489690798</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">HMI</span><span class="p">:</span> <span class="n">Received</span> <span class="n">HMI</span> <span class="n">interrupt</span><span class="p">:</span> <span class="n">HMER</span> <span class="o">=</span> <span class="mh">0x2040000000000000</span> |
| <span class="p">[</span> <span class="mf">414.489693339</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">HMI</span><span class="p">:</span> <span class="p">[</span><span class="n">Loc</span><span class="p">:</span> <span class="n">UOPWR</span><span class="mf">.0000000</span><span class="o">-</span><span class="n">Node0</span><span class="o">-</span><span class="n">Proc0</span><span class="p">]:</span> <span class="n">P</span><span class="p">:</span><span class="mi">0</span> <span class="n">C</span><span class="p">:</span><span class="mi">1</span> <span class="n">T</span><span class="p">:</span><span class="mi">1</span><span class="p">:</span> <span class="n">Processor</span> <span class="n">recovery</span> <span class="n">occurred</span><span class="o">.</span> |
| <span class="p">[</span> <span class="mf">414.489699837</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">HMI</span><span class="p">:</span> <span class="n">Core</span> <span class="n">WOF</span> <span class="o">=</span> <span class="mh">0x0000000410000000</span> <span class="n">recovered</span> <span class="n">error</span><span class="p">:</span> |
| <span class="p">[</span> <span class="mf">414.489701543</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">HMI</span><span class="p">:</span> <span class="n">LSU</span> <span class="o">-</span> <span class="n">SRAM</span> <span class="p">(</span><span class="n">DCACHE</span> <span class="n">parity</span><span class="p">,</span> <span class="n">etc</span><span class="p">)</span> |
| <span class="p">[</span> <span class="mf">414.489702341</span><span class="p">,</span><span class="mi">7</span><span class="p">]</span> <span class="n">HMI</span><span class="p">:</span> <span class="n">LSU</span> <span class="o">-</span> <span class="n">ERAT</span> <span class="n">multi</span> <span class="n">hit</span> |
| </pre></div> |
| </div> |
| <p>In future it will be good to unify this reporting, so Linux could |
| print something more useful. Until then, this gives some good data.</p> |
| </li> |
| </ul> |
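| <p>The circular-dependency check behind the deadlock detection can be sketched as a walk along the chain “CPU waits for lock, lock is owned by CPU, that CPU waits for lock” and so on. If the walk returns to the CPU that is about to spin, the new wait would close a cycle. The fixed-size lock_graph tables and all names here are hypothetical; skiboot tracks lock ownership with its own per-lock and per-CPU data.</p> |

```c
#include <stdbool.h>

#define SKETCH_NCPUS  8
#define SKETCH_NLOCKS 8
#define NO_ONE (-1)

/* Hypothetical bookkeeping: the lock each CPU is spinning on and the CPU
 * that currently owns each lock (NO_ONE when not applicable). */
struct lock_graph {
    int waiting_on[SKETCH_NCPUS];
    int owner[SKETCH_NLOCKS];
};

static void lock_graph_init(struct lock_graph *g)
{
    for (int i = 0; i < SKETCH_NCPUS; i++)
        g->waiting_on[i] = NO_ONE;
    for (int i = 0; i < SKETCH_NLOCKS; i++)
        g->owner[i] = NO_ONE;
}

/* Would 'cpu' deadlock if it started spinning on 'lock'? Follow the
 * owner/waiting_on chain. The walk is bounded by the CPU count, so a
 * cycle that does not include 'cpu' cannot make it loop forever. */
static bool would_deadlock(const struct lock_graph *g, int cpu, int lock)
{
    int owner = g->owner[lock];

    for (int steps = 0; steps < SKETCH_NCPUS && owner != NO_ONE; steps++) {
        if (owner == cpu)
            return true;               /* chain leads back to us */
        int next = g->waiting_on[owner];
        if (next == NO_ONE)
            return false;              /* owner is running; it can release */
        owner = g->owner[next];
    }
    return false;
}
```

| <p>For example, if CPU 0 owns lock 0, CPU 1 owns lock 1 and CPU 1 is spinning on lock 0, then CPU 0 requesting lock 1 is flagged, while CPU 2 requesting lock 1 is not.</p> |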
| </section> |
| <section id="npu2-nvlink2-fixes"> |
| <h2>NPU2/NVLink2 Fixes<a class="headerlink" href="#npu2-nvlink2-fixes" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>npu2: Add performance tuning SCOM inits</p> |
| <p>Peer-to-peer GPU bandwidth latency testing has produced some tunable |
| values that improve performance. Add them to our device initialization.</p> |
| <p>File these under things that need to be cleaned up with nice #defines |
| for the register names and bitfields when we get time.</p> |
| <p>A few of the settings are dependent on the system’s particular NVLink |
| topology, so introduce a helper to determine how many links go to a |
| single GPU.</p> |
| </li> |
| <li><p>hw/npu2: Assign a unique LPARSHORTID per GPU</p> |
| <p>This gets used elsewhere to index items in the XTS tables.</p> |
| </li> |
| <li><p>NPU2: dump NPU2 registers on npu2 HMI</p> |
| <p>Due to the nature of debugging npu2 issues, folk are wanting the |
| full list of NPU2 registers dumped when there’s a problem.</p> |
| </li> |
| <li><p>npu2: Remove DD1 support</p> |
| <p>Major changes in the NPU between DD1 and DD2 necessitated a fair bit of |
| revision-specific code.</p> |
| <p>Now that all our lab machines are DD2, we no longer test anything on DD1 |
| and it’s time to get rid of it.</p> |
| <p>Remove DD1-specific code and abort probe if we’re running on a DD1 machine.</p> |
| </li> |
| <li><p>npu2: Disable fast reboot</p> |
| <p>Fast reboot does not yet work right with the NPU. It’s been disabled on |
| NVLink and OpenCAPI machines. Do the same for NVLink2.</p> |
| <p>This amounts to a port of 3e4577939bbf (“npu: Fix broken fast reset”) |
| from the npu code to npu2.</p> |
| </li> |
| <li><p>npu2: Use unfiltered mode in XTS tables</p> |
| <p>The XTS_PID context table is limited to 256 possible pids/contexts. To |
| relieve this limitation, make use of “unfiltered mode” instead.</p> |
| <p>If an entry in the XTS_BDF table has the bit for unfiltered mode set, we |
| can just use one context for that entire bdf/lpar, regardless of pid. |
| Instead of searching the XTS_PID table, the NMMU checkout request |
| will simply use the entry indexed by lparshort id.</p> |
| <p>Change opal_npu_init_context() to create these lparshort-indexed |
| wildcard entries (0-15) instead of allocating one for each pid. Check |
| that multiple calls for the same bdf all specify the same msr value.</p> |
| <p>In opal_npu_destroy_context(), continue validating the bdf argument, |
| ensuring that it actually maps to an lpar, but no longer remove anything |
| from the XTS_PID table. If/when we start supporting virtualized GPUs, we |
| might consider actually removing these wildcard entries by keeping a |
| refcount, but keep things simple for now.</p> |
| </li> |
| </ul> |
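| <p>The effect of unfiltered mode on context setup can be sketched as one wildcard entry per lparshort id (0-15) instead of one entry per pid, plus the msr consistency check described above. The structure, the function name and the -1 error return are hypothetical, and the lookup from bdf to lparshort id through the XTS_BDF table is left out.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_LPARSHORT 16   /* wildcard entries 0-15 */

/* Hypothetical model of one lparshort-indexed wildcard entry. */
struct xts_wildcard {
    bool valid;
    uint64_t msr;
};

static struct xts_wildcard xts_wildcards[SKETCH_MAX_LPARSHORT];

/* The pid no longer selects an entry: every context for the same
 * lparshort id (i.e. the same bdf) shares one entry, and repeated calls
 * must agree on msr. Returns 0 on success, -1 on error. */
static int init_context_sketch(unsigned int lparshort, uint64_t pid,
                               uint64_t msr)
{
    (void)pid;   /* ignored in unfiltered mode */

    if (lparshort >= SKETCH_MAX_LPARSHORT)
        return -1;

    struct xts_wildcard *e = &xts_wildcards[lparshort];
    if (e->valid)
        return e->msr == msr ? 0 : -1;

    e->valid = true;
    e->msr = msr;
    return 0;
}
```

| <p>Because any number of pids map onto the same entry, the 256-entry XTS_PID limit no longer caps the number of contexts per GPU.</p> |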
| </section> |
| <section id="capi-opencapi"> |
| <h2>CAPI/OpenCAPI<a class="headerlink" href="#capi-opencapi" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-5.11-rc1.html#skiboot-5-11-rc1"><span class="std std-ref">skiboot-5.11-rc1</span></a>:</p> |
| <ul> |
| <li><p>capi: Poll Err/Status register during CAPP recovery</p> |
| <p>This patch updates do_capp_recovery_scoms() to poll the CAPP |
| Err/Status control register, check for CAPP-Recovery to complete/fail |
| based on indications of BITS-1,5,9 and then proceed with the |
| CAPP-Recovery scoms only if recovery completed successfully. This |
| prevents cases where we bring up the PCIe link while the recovery sequencer |
| on CAPP is still busy casting out cache lines.</p> |
| <p>If CAPP-Recovery doesn’t complete successfully, an error is returned |
| from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4 |
| fenced and mark it as broken.</p> |
| <p>The loop that implements polling of Err/Status register will also log |
| an error on the PHB when it continues for more than 168ms which is the |
| max time to failure for CAPP-Recovery.</p> |
| </li> |
| </ul> |
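| <p>The polling loop can be sketched as below. PPC_BIT_SKETCH mirrors the IBM bit numbering used in skiboot (bit 0 is the most significant bit). Treating “bits 1, 5 and 9 all set” as successful completion, the read callback and the function names are assumptions for illustration; the notes only say that completion or failure is judged from those bits. The sketch also simply gives up once the 168ms budget is exhausted, whereas the notes only state that an error is logged at that point.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT_SKETCH(n) (0x8000000000000000ULL >> (n))

/* Assumption for illustration: recovery is complete when bits 1, 5 and 9
 * of the Err/Status register are all set. */
#define RECOVERY_DONE_MASK \
    (PPC_BIT_SKETCH(1) | PPC_BIT_SKETCH(5) | PPC_BIT_SKETCH(9))

#define CAPP_RECOVERY_TIMEOUT_MS 168   /* max time to failure, per the notes */

/* Hypothetical register accessor. */
typedef uint64_t (*read_reg_fn)(void *ctx);

/* Each iteration stands for roughly 1 ms; this sketch does not sleep.
 * Returns true if recovery completed, false if the budget ran out, in
 * which case the caller would keep the PHB fenced and mark it broken. */
static bool wait_capp_recovery(read_reg_fn read_err_status, void *ctx)
{
    for (int ms = 0; ms <= CAPP_RECOVERY_TIMEOUT_MS; ms++) {
        uint64_t reg = read_err_status(ctx);
        if ((reg & RECOVERY_DONE_MASK) == RECOVERY_DONE_MASK)
            return true;
    }
    return false;
}

/* Example reader: reports completion after *(int *)ctx reads. */
static uint64_t fake_reader(void *ctx)
{
    int *reads_left = ctx;
    return (*reads_left)-- > 0 ? 0 : RECOVERY_DONE_MASK;
}
```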
| <p>Since <a class="reference internal" href="skiboot-5.10.html#skiboot-5-10"><span class="std std-ref">skiboot-5.10</span></a>:</p> |
| <ul> |
| <li><p>npu2-opencapi: Add OpenCAPI OPAL API calls</p> |
| <p>Add three OPAL API calls that are required by the ocxl driver.</p> |
| <ul> |
| <li><p>OPAL_NPU_SPA_SETUP</p> |
| <p>The Shared Process Area (SPA) is a table containing one entry (a |
| “Process Element”) per memory context which can be accessed by the |
| OpenCAPI device.</p> |
| </li> |
| <li><p>OPAL_NPU_SPA_CLEAR_CACHE</p> |
| <p>The NPU keeps a cache of recently accessed memory contexts. When a |
| Process Element is removed from the SPA, the cache for the link must be |
| cleared.</p> |
| </li> |
| <li><p>OPAL_NPU_TL_SET</p> |
| <p>The Transaction Layer specification defines several templates for |
| messages to be exchanged on the link. During link setup, the host and |
| device must negotiate what templates are supported on both sides and at |
| what rates those messages can be sent.</p> |
| </li> |
| </ul> |
| </li> |
| <li><p>npu2-opencapi: Train OpenCAPI links and setup devices</p> |
| <p>Scan the OpenCAPI links under the NPU, and for each link, reset the card, |
| set up a device, train the link and register a PHB.</p> |
| <p>Implement the necessary operations for the OpenCAPI PHB type.</p> |
| <p>For bringup, test and debug purposes, we allow an NVRAM setting, |
| “opencapi-link-training”, that can be set to either disable link training |
| completely or to use the prbs31 test pattern.</p> |
| <p>To disable link training:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">nvram</span> <span class="o">-</span><span class="n">p</span> <span class="n">ibm</span><span class="p">,</span><span class="n">skiboot</span> <span class="o">--</span><span class="n">update</span><span class="o">-</span><span class="n">config</span> <span class="n">opencapi</span><span class="o">-</span><span class="n">link</span><span class="o">-</span><span class="n">training</span><span class="o">=</span><span class="n">none</span> |
| </pre></div> |
| </div> |
| <p>To use prbs31:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">nvram</span> <span class="o">-</span><span class="n">p</span> <span class="n">ibm</span><span class="p">,</span><span class="n">skiboot</span> <span class="o">--</span><span class="n">update</span><span class="o">-</span><span class="n">config</span> <span class="n">opencapi</span><span class="o">-</span><span class="n">link</span><span class="o">-</span><span class="n">training</span><span class="o">=</span><span class="n">prbs31</span> |
| </pre></div> |
| </div> |
| </li> |
| <li><p>npu2-hw-procedures: Add support for OpenCAPI PHY link training</p> |
| <p>Unlike NVLink, which uses the pci-virt framework to fake a PCI |
| configuration space for NVLink devices, the OpenCAPI device model presents |
| us with a real configuration space handled by the device over the OpenCAPI |
| link.</p> |
| <p>As a result, we have to train the OpenCAPI link in skiboot before we do PCI |
| probing, so that config space can be accessed, rather than having link |
| training being triggered by the Linux driver.</p> |
| </li> |
| <li><p>npu2-opencapi: Configure NPU for OpenCAPI</p> |
| <p>Scan the device tree for NPUs with OpenCAPI links and configure the NPU per |
| the initialisation sequence in the NPU OpenCAPI workbook.</p> |
| </li> |
| <li><p>capp: Make error in capp timebase sync a non-fatal error</p> |
| <p>Presently, when we encounter an error while synchronizing capp timebase |
| with chip-tod at the end of enable_capi_mode(), we return an |
| error. This has two unintended consequences. First, this will prevent |
| disabling of fast-reboot even though CAPP is already enabled by this |
| point. Secondly, failure during timebase sync is a non-fatal error for |
| capp initialization, as CAPP/PSL can continue working after this and an |
| AFU will only see an error when it tries to read the timebase value |
| from PSL.</p> |
| <p>So this patch updates enable_capi_mode() to not return an error in |
| case the call to chiptod_capp_timebase_sync() fails. The function will now |
| just log an error and continue with the capp init sequence. This |
| makes the current implementation align with the one in the kernel ‘cxl’ |
| driver, which also treats PSL timebase sync errors as non-fatal |
| init errors.</p> |
| </li> |
| <li><p>npu2-opencapi: Fix assert on link reset during init</p> |
| <p>We don’t support resetting an opencapi link yet.</p> |
| <p>Commit fe6d86b9 (“pci: Make fast reboot creset PHBs in parallel”) |
| tries resetting any PHB whose slot defines a ‘run_sm’ callback. It |
| raises an assert when applied to an opencapi PHB, as ‘run_sm’ calls |
| the ‘freset’ callback, which is not yet defined for opencapi.</p> |
| <p>Fix it for now by removing the currently useless definition of |
| ‘run_sm’ on the opencapi slot. It will print a message in the skiboot |
| log because the PHB cannot be reset, which is correct. It will all go |
| away when we add support for resetting an opencapi link.</p> |
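| <p>Below is a minimal sketch of the guard this fix relies on. It assumes a simplified slot-operations table, and the struct and function names are hypothetical rather than skiboot’s real slot structures. A slot with no run_sm callback is reported and skipped instead of being driven into an undefined freset callback.</p> |

```c
#include <stdio.h>

struct sketch_slot;

/* Hypothetical, trimmed-down slot operations table. */
struct sketch_slot_ops {
	int (*run_sm)(struct sketch_slot *slot);
};

struct sketch_slot {
	const char *name;
	struct sketch_slot_ops ops;
};

/* Only slots that define run_sm are reset; the others are logged and skipped. */
static int sketch_try_creset(struct sketch_slot *slot)
{
	if (!slot->ops.run_sm) {
		printf("%s: cannot reset, no run_sm callback\n", slot->name);
		return -1;
	}
	return slot->ops.run_sm(slot);
}

/* Example state machine for a slot type that does support reset. */
static int sketch_pcie_run_sm(struct sketch_slot *slot)
{
	(void)slot;
	return 0;
}
```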
| </li> |
| <li><p>capp: Add lid definition for P9 DD-2.2</p> |
| <p>Update fsp_lid_map to include the CAPP ucode lid for phb4-chipid == |
| 0x202d1, which corresponds to the P9 DD-2.2 chip.</p> |
| </li> |
| <li><p>capp: Disable fast-reboot when capp is enabled</p></li> |
| </ul> |
| </section> |
| <section id="pci"> |
| <h2>PCI<a class="headerlink" href="#pci" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-5.11-rc1.html#skiboot-5-11-rc1"><span class="std std-ref">skiboot-5.11-rc1</span></a>:</p> |
| <ul> |
| <li><p>phb4: Reset FIR/NFIR registers before PHB4 probe</p> |
| <p>The function phb4_probe_stack() resets “ETU Reset Register” to |
| unfreeze the PHB before it performs mmio access on the PHB. However in |
| case the FIR/NFIR registers are set while entering this function, |
| the reset of “ETU Reset Register” won’t unfreeze the PHB and it will |
| remain fenced. This leads to failure during initial CRESET of the PHB |
| as mmio access is still not enabled and an error message of the form |
| below is logged:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">PHB</span><span class="c1">#0000[0:0]: Initializing PHB4...</span> |
| <span class="n">PHB</span><span class="c1">#0000[0:0]: Default system config: 0xffffffffffffffff</span> |
| <span class="n">PHB</span><span class="c1">#0000[0:0]: New system config : 0xffffffffffffffff</span> |
| <span class="n">PHB</span><span class="c1">#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff</span> |
| <span class="n">PHB</span><span class="c1">#0000[0:0]: Waiting for DLP PG reset to complete...</span> |
| <span class="o"><</span><span class="n">snip</span><span class="o">></span> |
| <span class="n">PHB</span><span class="c1">#0000[0:0]: Timeout waiting for DLP PG reset !</span> |
| <span class="n">PHB</span><span class="c1">#0000[0:0]: Initialization failed</span> |
| </pre></div> |
| </div> |
| <p>This is especially seen during the MPIPL flow, where the SBE |
| quiesces and fences the PHB so that it doesn’t stomp on main |
| memory. However, when skiboot enters phb4_probe_stack() after MPIPL, |
| the FIR/NFIR registers are still set, forcing the PHB to re-enter the fence after the ETU |
| reset is done.</p> |
| <p>So to fix this issue the patch introduces new xscom writes to |
| phb4_probe_stack() to reset the FIR/NFIR registers before performing |
| ETU reset to enable mmio access to the PHB.</p> |
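| <p>The ordering problem can be modelled in a few lines of C. This is a simplified sketch, not the PHB4 register interface: the struct and functions are hypothetical. The rule that an ETU reset only unfences the PHB when FIR and NFIR are clear is taken from the description above.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the PHB state described above. */
struct sketch_phb {
	uint64_t fir;   /* FIR register contents */
	uint64_t nfir;  /* NFIR register contents */
	bool fenced;
};

/* ETU reset only unfences the PHB when no FIR/NFIR bits are set. */
static void sketch_etu_reset(struct sketch_phb *p)
{
	if (p->fir == 0 && p->nfir == 0)
		p->fenced = false;
}

/* Old probe order: ETU reset alone, so a stale FIR/NFIR keeps the fence. */
static void sketch_probe_old(struct sketch_phb *p)
{
	sketch_etu_reset(p);
}

/* New probe order: clear FIR/NFIR first, then do the ETU reset. */
static void sketch_probe_new(struct sketch_phb *p)
{
	p->fir = 0;
	p->nfir = 0;
	sketch_etu_reset(p);
}
```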
| </li> |
| </ul> |
| <p>Since <a class="reference internal" href="skiboot-5.10.html#skiboot-5-10"><span class="std std-ref">skiboot-5.10</span></a>:</p> |
| <ul> |
| <li><p>pci: Reduce log level of error message</p> |
| <p>If a link doesn’t train, we can end up with error messages like this:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">63.027261959</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">PHB</span><span class="c1">#0032[8:2]: LINK: Timeout waiting for electrical link</span> |
| <span class="p">[</span> <span class="mf">63.027265573</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">PHB</span><span class="c1">#0032:00:00.0 Error -6 resetting</span> |
| </pre></div> |
| </div> |
| <p>The first message is useful but the second message is just debug from |
| the core PCI code and is confusing to print to the console.</p> |
| <p>This reduces the second message to debug level so that it is not |
| printed to the console by default.</p> |
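| <p>Here is a sketch of the filtering involved. It assumes syslog-style numbering, in which the digit after the timestamp in the log lines above (“3”) is the message level. It also assumes a console threshold of 5. Both the threshold and the names are illustrative, not skiboot’s definitions.</p> |

```c
#include <stdbool.h>

/* Syslog-style levels; the ",3]" in the log lines above is the level. */
enum sketch_level { SK_ERR = 3, SK_NOTICE = 5, SK_DEBUG = 7 };

/* Assumed console threshold: messages with a numerically higher level
 * stay in the in-memory log and are not printed to the console. */
static const int sketch_console_level = SK_NOTICE;

static bool sketch_shown_on_console(int level)
{
	return level <= sketch_console_level;
}
```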
| </li> |
| <li><p>Revert “platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots”</p> |
| <p>This reverts commit bda7cc4d0354eb3f66629d410b2afc08c79f795f.</p> |
| <p>Ben says: |
| It’s on purpose that we do NOT compare the bus numbers: |
| they are always 0 in the slot table. |
| We do a hierarchical walk of the tree, matching only the |
| devfns along the way, because the bus numbering isn’t fixed. |
| This breaks all slot naming etc. on anything using |
| the “skiboot” slot tables (P8 opp typically).</p> |
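| <p>The matching scheme Ben describes can be sketched as follows: walk the hierarchy and compare only devfn values at each level. The table layout and names are hypothetical and much simpler than the real slot tables.</p> |

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot-table entry: children are matched by devfn only;
 * no bus number is stored because bus numbering is not fixed. */
struct sketch_slot_entry {
	uint8_t devfn;
	const char *name;
	const struct sketch_slot_entry *children;  /* terminated by name == NULL */
};

/* Illustrative table: a switch at devfn 0x00 under the root port,
 * with one downstream port at devfn 0x08. */
static const struct sketch_slot_entry sketch_switch_ports[] = {
	{ 0x08, "Slot2", NULL },
	{ 0, NULL, NULL },
};
static const struct sketch_slot_entry sketch_root[] = {
	{ 0x00, "Switch", sketch_switch_ports },
	{ 0, NULL, NULL },
};

/* Walk one table level per element of devfn_path (root port first). */
static const struct sketch_slot_entry *
sketch_match_slot(const struct sketch_slot_entry *table,
		  const uint8_t *devfn_path, size_t depth)
{
	const struct sketch_slot_entry *match = NULL;

	for (size_t i = 0; i < depth; i++) {
		const struct sketch_slot_entry *e;

		if (!table)
			return NULL;
		for (e = table; e->name; e++)
			if (e->devfn == devfn_path[i])
				break;
		if (!e->name)
			return NULL;  /* no entry with this devfn at this level */
		match = e;
		table = e->children;
	}
	return match;
}
```

| <p>With the example table, the path {0x00, 0x08} resolves to “Slot2” regardless of which bus numbers the hierarchy was assigned, because the bus number is never consulted.</p> |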
| </li> |
| <li><p>core/pci-dt-slot: Fix booting with no slot map</p> |
| <p>Currently, if you don’t have a slot map in the device tree at |
| /ibm,pcie-slots, you can crash with a backtrace like this:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">CPU</span> <span class="mi">0034</span> <span class="n">Backtrace</span><span class="p">:</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3370</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003001362</span><span class="n">c</span> <span class="o">.</span><span class="n">backtrace</span><span class="o">+</span><span class="mh">0x48</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3410</span> <span class="n">R</span><span class="p">:</span> <span class="mf">0000000030019e38</span> <span class="o">.</span><span class="n">_abort</span><span class="o">+</span><span class="mh">0x4c</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3490</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003002760</span><span class="n">c</span> <span class="o">.</span><span class="n">exception_entry</span><span class="o">+</span><span class="mh">0x180</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3670</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000000001</span><span class="n">f10</span> <span class="o">*</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3850</span> <span class="n">R</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">b4f3e</span> <span class="o">*</span> <span class="n">cpu_features_table</span><span class="o">+</span><span class="mh">0x1d9e</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd38e0</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003002682</span><span class="n">c</span> <span class="o">.</span><span class="n">dt_node_is_compatible</span><span class="o">+</span><span class="mh">0x20</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3960</span> <span class="n">R</span><span class="p">:</span> <span class="mf">0000000030030e08</span> <span class="o">.</span><span class="n">map_pci_dev_to_slot</span><span class="o">+</span><span class="mh">0x16c</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3a30</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030091054</span> <span class="o">.</span><span class="n">dt_slot_get_slot_info</span><span class="o">+</span><span class="mh">0x28</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3ac0</span> <span class="n">R</span><span class="p">:</span> <span class="mf">000000003001e27</span><span class="n">c</span> <span class="o">.</span><span class="n">pci_scan_one</span><span class="o">+</span><span class="mh">0x2ac</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3ba0</span> <span class="n">R</span><span class="p">:</span> <span class="mf">000000003001e588</span> <span class="o">.</span><span class="n">pci_scan_bus</span><span class="o">+</span><span class="mh">0x70</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3cb0</span> <span class="n">R</span><span class="p">:</span> <span class="mi">000000003001</span><span class="n">ee74</span> <span class="o">.</span><span class="n">pci_scan_phb</span><span class="o">+</span><span class="mh">0x100</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3d40</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030017</span><span class="n">ff0</span> <span class="o">.</span><span class="n">cpu_process_jobs</span><span class="o">+</span><span class="mh">0xdc</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3e00</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030014</span><span class="n">cb0</span> <span class="o">.</span><span class="n">__secondary_cpu_entry</span><span class="o">+</span><span class="mh">0x44</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3e80</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030014</span><span class="n">d04</span> <span class="o">.</span><span class="n">secondary_cpu_entry</span><span class="o">+</span><span class="mh">0x34</span> |
| <span class="n">S</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">cd3f00</span> <span class="n">R</span><span class="p">:</span> <span class="mi">0000000030002770</span> <span class="n">secondary_wait</span><span class="o">+</span><span class="mh">0x8c</span> |
| <span class="p">[</span> <span class="mf">73.016947149</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">Fatal</span> <span class="n">MCE</span> <span class="n">at</span> <span class="mi">0000000030026054</span> <span class="o">.</span><span class="n">dt_find_property</span><span class="o">+</span><span class="mh">0x30</span> |
| <span class="p">[</span> <span class="mf">73.017073254</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">CFAR</span> <span class="p">:</span> <span class="mi">0000000030026040</span> |
| <span class="p">[</span> <span class="mf">73.017138048</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">SRR0</span> <span class="p">:</span> <span class="mi">0000000030026054</span> <span class="n">SRR1</span> <span class="p">:</span> <span class="mi">9000000000201000</span> |
| <span class="p">[</span> <span class="mf">73.017198375</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">HSRR0</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">HSRR1</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.017263210</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">DSISR</span><span class="p">:</span> <span class="mi">00000008</span> <span class="n">DAR</span> <span class="p">:</span> <span class="mi">7</span><span class="n">c7b1b7848002524</span> |
| <span class="p">[</span> <span class="mf">73.017352517</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">LR</span> <span class="p">:</span> <span class="mi">000000003002602</span><span class="n">c</span> <span class="n">CTR</span> <span class="p">:</span> <span class="mi">000000003009102</span><span class="n">c</span> |
| <span class="p">[</span> <span class="mf">73.017419778</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">CR</span> <span class="p">:</span> <span class="mi">20004204</span> <span class="n">XER</span> <span class="p">:</span> <span class="mi">20040000</span> |
| <span class="p">[</span> <span class="mf">73.017502425</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR00</span><span class="p">:</span> <span class="mi">000000003002682</span><span class="n">c</span> <span class="n">GPR16</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.017586924</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR01</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c23670</span> <span class="n">GPR17</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.017643873</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR02</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">fd500</span> <span class="n">GPR18</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.017767091</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR03</span><span class="p">:</span> <span class="n">fffffffffffffff8</span> <span class="n">GPR19</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.017855707</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR04</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">b3dc6</span> <span class="n">GPR20</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.017943944</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR05</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">GPR21</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">bb6d2</span> |
| <span class="p">[</span> <span class="mf">73.018024709</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR06</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c23910</span> <span class="n">GPR22</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.018117716</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR07</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c23930</span> <span class="n">GPR23</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.018195974</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR08</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">GPR24</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.018278350</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR09</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">GPR25</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.018353795</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR10</span><span class="p">:</span> <span class="mi">0000000000000028</span> <span class="n">GPR26</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">be6fb</span> |
| <span class="p">[</span> <span class="mf">73.018424362</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR11</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">GPR27</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">73.018533159</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR12</span><span class="p">:</span> <span class="mi">0000000020004208</span> <span class="n">GPR28</span><span class="p">:</span> <span class="mi">0000000030767</span><span class="n">d38</span> |
| <span class="p">[</span> <span class="mf">73.018642725</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR13</span><span class="p">:</span> <span class="mi">0000000031</span><span class="n">c20000</span> <span class="n">GPR29</span><span class="p">:</span> <span class="mi">00000000300</span><span class="n">b3dc6</span> |
| <span class="p">[</span> <span class="mf">73.018737925</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR14</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">GPR30</span><span class="p">:</span> <span class="mi">0000000000000010</span> |
| <span class="p">[</span> <span class="mf">73.018794428</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR15</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">GPR31</span><span class="p">:</span> <span class="mi">7</span><span class="n">c7b1b7848002514</span> |
| </pre></div> |
| </div> |
| <p>This has been seen in the lab on a witherspoon using the device tree |
| entry point (i.e. no HDAT).</p> |
| <p>This fixes the NULL pointer dereference.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="bugs-fixed"> |
| <h2>Bugs Fixed<a class="headerlink" href="#bugs-fixed" title="Link to this heading">¶</a></h2> |
| <p>Since <a class="reference internal" href="skiboot-5.11-rc1.html#skiboot-5-11-rc1"><span class="std std-ref">skiboot-5.11-rc1</span></a>:</p> |
| <ul> |
| <li><p>cpufeatures: Fix setting DARN and SCV HWCAP feature bits</p> |
| <p>DARN and SCV have been assigned AT_HWCAP2 (32-63) bits:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="c1">#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */</span> |
| <span class="c1">#define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */</span> |
| </pre></div> |
| </div> |
| <p>A cpufeatures-aware OS will not advertise these to userspace without |
| this patch.</p> |
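| <p>Here is a small C sketch showing which AT_HWCAP2 bit numbers those masks correspond to. It assumes the consumer decodes a cpufeatures bit number n in the range 32-63 as bit (n - 32) of AT_HWCAP2, counting from the least significant bit. To the best of our knowledge Linux’s cpufeatures parsing does this, but treat it as an assumption. Under that assumption, DARN is bit 53 and SCV is bit 52.</p> |

```c
#include <stdint.h>

/* Masks as quoted above. */
#define PPC_FEATURE2_DARN 0x00200000
#define PPC_FEATURE2_SCV  0x00100000

/* Assumption: bit numbers 32-63 select AT_HWCAP2, counted LSB-first.
 * Returns -1 for an empty mask; otherwise uses the lowest set bit. */
static int sketch_hwcap2_bit_nr(uint32_t mask)
{
	int bit = 0;

	if (mask == 0)
		return -1;
	while (!(mask & 1u)) {
		mask >>= 1;
		bit++;
	}
	return 32 + bit;
}

/* Inverse: the AT_HWCAP2 mask an OS would set for a given bit number. */
static uint32_t sketch_hwcap2_mask(int bit_nr)
{
	return (uint32_t)1 << (bit_nr - 32);
}
```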
| </li> |
| <li><p>xive: disable store EOI support</p> |
| <p>Hardware has limitations which would require a sync after each |
| store EOI to make sure the MMIO operations that change the ESB state |
| are ordered. This severely hurts performance, and the PHBs do not |
| support the sync. So remove the store EOI for the moment, until |
| hardware is improved.</p> |
| <p>Also, while we are changing the XIVE source flags, let’s fix the |
| settings for the PHB4s, which should follow these rules:</p> |
| <ul class="simple"> |
| <li><p>SHIFT_BUG for DD10</p></li> |
| <li><p>STORE_EOI for DD20 and if enabled</p></li> |
| <li><p>TRIGGER_PAGE for DDx0 and if not STORE_EOI</p></li> |
| </ul> |
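| <p>The three rules above can be written as a single flag computation. In this sketch the flag values are placeholders. We read “DD10”, “DD20” and “DDx0” as major DD level 1, major DD level 2 and any level respectively; that reading is our interpretation, not something the note states.</p> |

```c
#include <stdbool.h>

/* Placeholder flag values; only the combination logic matters here. */
#define SK_SRC_SHIFT_BUG    0x1u
#define SK_SRC_STORE_EOI    0x2u
#define SK_SRC_TRIGGER_PAGE 0x4u

static unsigned int sketch_phb4_xive_flags(int dd_major, bool store_eoi_enabled)
{
	unsigned int flags = 0;

	if (dd_major == 1)                       /* SHIFT_BUG for DD10 */
		flags |= SK_SRC_SHIFT_BUG;
	if (dd_major == 2 && store_eoi_enabled)  /* STORE_EOI for DD20 if enabled */
		flags |= SK_SRC_STORE_EOI;
	if (!(flags & SK_SRC_STORE_EOI))         /* TRIGGER_PAGE unless STORE_EOI */
		flags |= SK_SRC_TRIGGER_PAGE;
	return flags;
}
```

| <p>With store EOI disabled, as this change does, every DD level ends up with TRIGGER_PAGE, and DD1.0 also gets SHIFT_BUG.</p> |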
| </li> |
| </ul> |
| <p>Since <a class="reference internal" href="skiboot-5.10.html#skiboot-5-10"><span class="std std-ref">skiboot-5.10</span></a>:</p> |
| <ul> |
| <li><p>xive: fix opal_xive_set_vp_info() error path</p> |
| <p>In case of error, opal_xive_set_vp_info() will return without |
| unlocking the xive object. This is most certainly a typo.</p> |
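| <p>A common C idiom that avoids this class of bug routes every exit, including errors, through a single unlock. Below is a minimal sketch with a hypothetical lock and object; it does not use skiboot’s XIVE structures or lock API.</p> |

```c
#include <stdbool.h>

/* Hypothetical object protected by a lock flag. */
struct sketch_xive {
	bool locked;
};

static void sketch_lock(struct sketch_xive *x)   { x->locked = true; }
static void sketch_unlock(struct sketch_xive *x) { x->locked = false; }

/* Every return path, including the error path, goes through bail:. */
static int sketch_set_vp_info(struct sketch_xive *x, bool fail)
{
	int rc = 0;

	sketch_lock(x);
	if (fail) {
		rc = -1;
		goto bail;
	}
	/* ... update the VP state here ... */
bail:
	sketch_unlock(x);
	return rc;
}
```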
| </li> |
| <li><p>hw/imc: don’t access homer memory if it was not initialised</p> |
| <p>This can happen under mambo, at least.</p> |
| </li> |
| <li><p>nvram: run nvram_validate() after nvram_reformat()</p> |
| <p>nvram_reformat() sets nvram_valid = true, but it does not set |
| skiboot_part_hdr. Call nvram_validate() instead, which sets |
| everything up properly.</p> |
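| <p>Here is a minimal sketch of the corrected sequence, using hypothetical stand-ins for the two functions and the state they touch. Reformatting alone only marks NVRAM valid. Validation therefore runs unconditionally afterwards, so the partition header is set up as well.</p> |

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state mirroring the two pieces named in the note. */
static bool sketch_nvram_valid;
static const char *sketch_part_hdr;
static const char sketch_partition[] = "skiboot partition";  /* stand-in */

/* Reformat alone only marks the NVRAM valid. */
static void sketch_nvram_reformat(void)
{
	sketch_nvram_valid = true;
}

/* Validation sets everything up, including the partition header. */
static void sketch_nvram_validate(void)
{
	sketch_part_hdr = sketch_partition;
	sketch_nvram_valid = true;
}

static void sketch_nvram_init(bool image_bad)
{
	if (image_bad)
		sketch_nvram_reformat();
	sketch_nvram_validate();  /* always validate after a reformat */
}
```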
| </li> |
| <li><p>dts: Zero struct to avoid using uninitialised value</p></li> |
| <li><p>hw/imc: Don’t dereference possible NULL</p></li> |
| <li><p>libstb/create-container: munmap() signature file address</p></li> |
| <li><p>npu2-opencapi: Fix memory leak</p></li> |
| <li><p>npu2: Fix possible NULL dereference</p></li> |
| <li><p>occ-sensors: Remove NULL checks after dereference</p></li> |
| <li><p>core/ipmi-opal: Add interrupt-parent property for ipmi node on P9 and above.</p> |
| <p>dtc emits the following warning with newer 4.2+ kernels:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">dts</span><span class="p">:</span> <span class="ne">Warning</span> <span class="p">(</span><span class="n">interrupts_property</span><span class="p">):</span> <span class="n">Missing</span> <span class="n">interrupt</span><span class="o">-</span><span class="n">parent</span> <span class="k">for</span> <span class="o">/</span><span class="n">ibm</span><span class="p">,</span><span class="n">opal</span><span class="o">/</span><span class="n">ipmi</span> |
| </pre></div> |
| </div> |
| <p>This fix adds interrupt-parent property under /ibm,opal/ipmi DT node on P9 |
| and above, which allows ipmi-opal to properly use the OPAL irqchip.</p> |
| </li> |
| </ul> |
| </section> |
| <section id="other-fixes-and-improvements"> |
| <h2>Other fixes and improvements<a class="headerlink" href="#other-fixes-and-improvements" title="Link to this heading">¶</a></h2> |
| <ul> |
| <li><p>core/cpu: discover stack region size before initialising memory regions</p> |
| <p>Stack allocation first allocates a memory region sized to hold stacks |
| for all possible CPUs up to the maximum PIR of the architecture, zeros |
| the region, then initialises all stacks. Max PIR is 32768 on POWER9, |
| which is 512MB for stacks.</p> |
| <p>The stack region is then shrunk after CPUs are discovered, but this is |
| a bit of a hack, and it leaves a hole in the memory allocation regions |
| as it’s done after mem regions are initialised.</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="mh">0x000000000000</span><span class="o">.</span><span class="mf">.00002</span><span class="n">fffffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">os</span><span class="o">-</span><span class="n">reserve</span> <span class="o">-</span> <span class="n">OS</span> |
| <span class="mh">0x000030000000</span><span class="o">.</span><span class="mf">.0000303</span><span class="n">fffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">code</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="mh">0x000030400000</span><span class="o">.</span><span class="mf">.000030</span><span class="n">ffffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">heap</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="mh">0x000031000000</span><span class="o">.</span><span class="mf">.000031</span><span class="n">bfffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">data</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="mh">0x000031c00000</span><span class="o">.</span><span class="mf">.000031</span><span class="n">c0ffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">stacks</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="o">***</span> <span class="n">gap</span> <span class="o">***</span> |
| <span class="mh">0x000051c00000</span><span class="o">.</span><span class="mf">.000051</span><span class="n">d01fff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">allocs</span><span class="o">-</span><span class="n">memory</span><span class="o">@</span><span class="mi">0</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="mh">0x000051d02000</span><span class="o">.</span><span class="mf">.00007</span><span class="n">fffffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">allocs</span><span class="o">-</span><span class="n">memory</span><span class="o">@</span><span class="mi">0</span> <span class="o">-</span> <span class="n">OS</span> |
| <span class="mh">0x000080000000</span><span class="o">.</span><span class="mf">.000080</span><span class="n">b3cdff</span> <span class="p">:</span> <span class="n">initramfs</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="mh">0x000080b3ce00</span><span class="o">.</span><span class="mf">.000080</span><span class="n">b7cdff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">fake</span><span class="o">-</span><span class="n">nvram</span> <span class="o">-</span> <span class="n">OPAL</span> |
| <span class="mh">0x000080b7ce00</span><span class="o">.</span><span class="mf">.0000</span><span class="n">ffffffff</span> <span class="p">:</span> <span class="n">ibm</span><span class="p">,</span><span class="n">firmware</span><span class="o">-</span><span class="n">allocs</span><span class="o">-</span><span class="n">memory</span><span class="o">@</span><span class="mi">0</span> <span class="o">-</span> <span class="n">OS</span> |
| </pre></div> |
| </div> |
| <p>This change moves zeroing into the per-cpu stack setup. The boot CPU |
| stack is set up based on the current PIR. Then the size of the stack |
| region is set, by discovering the maximum PIR of the system from the |
| device tree, before mem regions are initialised.</p> |
| <p>This results in all memory being accounted within memory regions, |
| and less memory fragmentation of OPAL allocations.</p> |
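| <p>The 512MB figure follows from the per-CPU stack size: 512MB / 32768 = 16KB per stack. That value is derived here from the note, not read from skiboot’s headers. The sketch below sizes the region for PIRs 0 to max_pir. The memory map above is consistent with this. The gap ends at 0x51c00000, exactly 0x20000000 (512MB) above the start of ibm,firmware-stacks at 0x31c00000, and the shrunk 64KB stacks region would hold four 16KB stacks.</p> |

```c
#include <stdint.h>

/* Per-CPU stack size implied by the note: 512MB / 32768 = 16KB.
 * Derived from the figures above, not taken from skiboot's headers. */
#define SKETCH_STACK_SIZE (16u * 1024u)

/* Stack region size needed for PIRs 0..max_pir inclusive. */
static uint64_t sketch_stack_region_size(uint32_t max_pir)
{
	return ((uint64_t)max_pir + 1) * SKETCH_STACK_SIZE;
}
```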
| </li> |
| <li><p>Make gard display show that a record is cleared</p> |
| <p>When clearing gard records, Hostboot only modifies the record_id |
| portion to be 0xFFFFFFFF. The remainder of the entry remains. |
| Without this change, it can be hard for users to tell that |
| the record they are looking at is no longer valid.</p> |
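| <p>Here is a sketch of the check a gard display can make. It assumes a simplified record layout: the struct is hypothetical, and real records contain more fields and are stored big-endian, which this sketch ignores. Only the 0xFFFFFFFF record_id marker comes from the note.</p> |

```c
#include <stdbool.h>
#include <stdint.h>

/* Value Hostboot writes into record_id when it clears a record;
 * the rest of the entry is left untouched. */
#define SKETCH_GARD_CLEARED_ID 0xFFFFFFFFu

/* Hypothetical, trimmed-down gard record. */
struct sketch_gard_record {
	uint32_t record_id;
	uint32_t target_id;  /* illustrative field that survives clearing */
};

static bool sketch_gard_is_cleared(const struct sketch_gard_record *r)
{
	return r->record_id == SKETCH_GARD_CLEARED_ID;
}

/* Status string a display tool could print next to each record. */
static const char *sketch_gard_status(const struct sketch_gard_record *r)
{
	return sketch_gard_is_cleared(r) ? "cleared" : "active";
}
```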
| </li> |
| <li><p>Reserve OPAL API number for opal_handle_hmi2 function.</p></li> |
| <li><p>dts: spl_wakeup: Remove all workarounds in the spl wakeup logic</p> |
| <p>We coded a few workarounds in the special wakeup logic to handle |
| buggy firmware. Now that the firmware is fixed, remove them, as they break the |
| special wakeup protocol. As per the spec, we should not de-assert |
| before the assert is complete, so follow this protocol.</p> |
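| <p>The handshake ordering can be sketched as a tiny state model. The struct and functions are hypothetical, and a flag stands in for hardware completion. The only rule taken from the note is that the de-assert must wait until the assert has completed.</p> |

```c
#include <stdbool.h>

/* Hypothetical special wakeup state: request line plus completion flag. */
struct sketch_spwu {
	bool requested;
	bool done;  /* set when hardware reports the assert as complete */
};

static void sketch_spwu_assert(struct sketch_spwu *s)
{
	s->requested = true;
}

/* Refuses to de-assert (returns false) until the assert has completed;
 * the caller keeps polling instead of dropping the request early. */
static bool sketch_spwu_deassert(struct sketch_spwu *s)
{
	if (!s->done)
		return false;
	s->requested = false;
	s->done = false;
	return true;
}
```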
| </li> |
| <li><p>build: use thin archives rather than incremental linking</p> |
| <p>This changes the build system to use thin archives rather than |
| incremental linking for built-in.o, similar to a recent change in Linux. |
| built-in.o is renamed to built-in.a, and is created as a thin archive |
| with no index, for speed and size. All built-in.a are aggregated into |
| a skiboot.tmp.a, which is a thin archive built with an index, making it |
| suitable for linking. This is input into the final link.</p> |
| <p>The advantages of build size and linker code placement flexibility are |
| not as great with skiboot as with a bigger project like Linux, but it’s a |
| conceptually better way to build, and is more compatible with link |
| time optimisation in toolchains, which might be interesting for skiboot, |
| particularly for size reductions.</p> |
| <p>Size of build tree before this patch is 34.4MB, afterwards 23.1MB.</p> |
| </li> |
| <li><p>core/init: Assert when kernel not found</p> |
| <p>If the kernel doesn’t load out of flash or there is nothing at |
| KERNEL_LOAD_BASE, we end up with an esoteric message as we try to |
| branch out of skiboot into nothing:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="p">[</span> <span class="mf">0.007197688</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">INIT</span><span class="p">:</span> <span class="n">ELF</span> <span class="n">header</span> <span class="ow">not</span> <span class="n">found</span><span class="o">.</span> <span class="n">Assuming</span> <span class="n">raw</span> <span class="n">binary</span><span class="o">.</span> |
| <span class="p">[</span> <span class="mf">0.014035267</span><span class="p">,</span><span class="mi">5</span><span class="p">]</span> <span class="n">INIT</span><span class="p">:</span> <span class="n">Starting</span> <span class="n">kernel</span> <span class="n">at</span> <span class="mh">0x0</span><span class="p">,</span> <span class="n">fdt</span> <span class="n">at</span> <span class="mh">0x3044ad90</span> <span class="mi">13029</span> |
| <span class="p">[</span> <span class="mf">0.014042254</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="o">***********************************************</span> |
| <span class="p">[</span> <span class="mf">0.014069947</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">Fatal</span> <span class="ne">Exception</span> <span class="mh">0xe40</span> <span class="n">at</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">0.014085574</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">CFAR</span> <span class="p">:</span> <span class="mi">00000000300051</span><span class="n">c4</span> |
| <span class="p">[</span> <span class="mf">0.014090118</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">SRR0</span> <span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">SRR1</span> <span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">0.014096243</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">HSRR0</span><span class="p">:</span> <span class="mi">0000000000000000</span> <span class="n">HSRR1</span><span class="p">:</span> <span class="mi">9000000000001000</span> |
| <span class="p">[</span> <span class="mf">0.014102546</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">DSISR</span><span class="p">:</span> <span class="mi">00000000</span> <span class="n">DAR</span> <span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">0.014108538</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">LR</span> <span class="p">:</span> <span class="mi">00000000300144</span><span class="n">c8</span> <span class="n">CTR</span> <span class="p">:</span> <span class="mi">0000000000000000</span> |
| <span class="p">[</span> <span class="mf">0.014114756</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">CR</span> <span class="p">:</span> <span class="mi">40002202</span> <span class="n">XER</span> <span class="p">:</span> <span class="mi">00000000</span> |
| <span class="p">[</span> <span class="mf">0.014120301</span><span class="p">,</span><span class="mi">3</span><span class="p">]</span> <span class="n">GPR00</span><span class="p">:</span> <span class="mi">000000003001447</span><span class="n">c</span> <span class="n">GPR16</span><span class="p">:</span> <span class="mi">0000000000000000</span> |
| </pre></div> |
| </div> |
| <p>This improves the message and asserts in this case:</p> |
| <div class="highlight-default notranslate"><div class="highlight"><pre><span></span>[ 0.014042685,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13049 bytes) |
| [ 0.014049556,0] FATAL: Kernel is zeros, can't execute! |
| [ 0.014054237,0] Assert fail: core/init.c:566:0 |
| [ 0.014060472,0] Aborting! |
| </pre></div> |
| </div> |
| </li> |
<li><p>core: Fix ‘opal-runtime-size’ property</p>
<p>We are populating ‘opal-runtime-size’ before calculating the actual stack size.
Hence we end up with the wrong runtime size (e.g. on P9 it shows ~540MB while
the actual size is around 40MB). Note that only the device tree property shows
the wrong value; reserved-memory reflects the correct size.</p>
<p>init_all_cpus() calculates and updates the actual stack size, so we move this
function call before add_opal_node().</p>
| </li> |
| <li><p>mambo: Add fw-feature flags for security related settings</p> |
<p>Newer firmwares report some feature flags related to security
settings via HDAT. On real hardware skiboot translates these into
device tree properties. For testing purposes, we just create the
properties manually in the Tcl script.</p>
<p>These values don’t exactly match any actual chip revision, but the
code should not rely on any exact set of values anyway. We just define
the most interesting flags, i.e. those which, if toggled to “disable”, change
Linux behaviour. You can see the actual values in the hostboot source
in src/usr/hdat/hdatiplparms.H.</p>
| <p>Also add an environment variable for easily toggling the top-level |
| “security on” setting.</p> |
| </li> |
| <li><p>direct-controls: mambo fix for multiple chips</p></li> |
<li><p>libflash/blocklevel: Correct miscalculation in blocklevel_smart_erase()</p>
<p>If blocklevel_smart_erase() detects that the smart erase fits entirely within
one erase block, it takes an early bail path. In this path it miscalculates
where in the buffer the backend needs to read from to perform the final
write.</p>
| </li> |
| <li><p>libstb/secureboot: Fix logging of secure verify messages.</p> |
<p>Currently we log secure verify/enforce messages at PR_EMERG
level even when secure boot mode is not enabled, so reduce the
log level to PR_ERR when secure boot mode is OFF.</p>
| </li> |
| </ul> |
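<p>The “Kernel is zeros” fix boils down to checking whether the bytes at the kernel entry point are all zero before jumping there. The following C sketch shows the idea. mem_is_zero(), kernel_looks_loaded() and KERNEL_PROBE_BYTES are hypothetical names for illustration, not the code in core/init.c; where skiboot asserts, this sketch returns false so the caller can decide how to abort.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical: number of bytes at the entry point worth inspecting,
 * e.g. passed as 'len' below. */
#define KERNEL_PROBE_BYTES 64

/* Return true if every byte in [p, p + len) is zero. */
static bool mem_is_zero(const void *p, size_t len)
{
	const uint8_t *b = p;

	for (size_t i = 0; i < len; i++)
		if (b[i] != 0)
			return false;
	return true;
}

/*
 * Hypothetical helper: refuse to "execute" a kernel whose entry point
 * contains only zeros, printing a message like the one in the log above.
 */
static bool kernel_looks_loaded(const void *entry, size_t len)
{
	assert(entry != NULL);

	if (mem_is_zero(entry, len)) {
		fprintf(stderr, "FATAL: Kernel is zeros, can't execute!\n");
		return false;
	}
	return true;
}
```

<p>Only a short prefix is inspected, since an image that was never loaded is zero from its first byte; a full scan of the image would add boot time without changing the outcome in that case.</p>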
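<p>A smart erase that fits within a single erase block is a read-modify-write: read the whole block, erase it, then write back the bytes before and after the erased range. The sketch below simulates this on an in-memory array so the buffer offsets are easy to check. ERASE_BLOCK, FLASH_SIZE, flash_erase_block(), flash_write() and smart_erase_one_block() are hypothetical and are not the libflash blocklevel API.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ERASE_BLOCK 16u   /* hypothetical erase-block size (power of two) */
#define FLASH_SIZE  64u   /* hypothetical flash size */

/* Simulated flash: erased bytes read back as 0xff. */
static uint8_t flash[FLASH_SIZE];

static void flash_erase_block(uint32_t start)
{
	memset(&flash[start], 0xff, ERASE_BLOCK);
}

static void flash_write(uint32_t pos, const uint8_t *buf, uint32_t len)
{
	memcpy(&flash[pos], buf, len);
}

/*
 * Erase [pos, pos + len) when the range lies inside one erase block,
 * preserving the rest of that block.
 */
static void smart_erase_one_block(uint32_t pos, uint32_t len)
{
	uint32_t block_start = pos & ~(ERASE_BLOCK - 1);
	uint32_t head = pos - block_start;  /* bytes kept before the range */
	uint32_t tail_off = head + len;     /* buffer offset just past the range */
	uint8_t buf[ERASE_BLOCK];

	assert(pos < FLASH_SIZE);
	assert(len > 0 && head + len <= ERASE_BLOCK);

	memcpy(buf, &flash[block_start], ERASE_BLOCK); /* read whole block */
	flash_erase_block(block_start);                 /* erase it */

	/* Write back the head, then the tail. Both offsets into 'buf' are
	 * relative to the start of the erase block. */
	flash_write(block_start, buf, head);
	flash_write(block_start + tail_off, buf + tail_off,
		    ERASE_BLOCK - tail_off);
}
```

<p>For example, erasing 5 bytes at offset 20 with 16-byte blocks reads block 16..31, keeps bytes 16..19 (buffer offsets 0..3) and bytes 25..31 (buffer offsets 9..15), and leaves 20..24 erased.</p>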
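<p>The secure boot logging change amounts to choosing the message level from the secure boot mode. In this sketch, the values 0 and 3 are assumed to match skiboot’s PR_EMERG and PR_ERR (lower is more severe), and secure_log_level() and format_secure_msg() are hypothetical helpers rather than functions in libstb.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed to mirror skiboot's numbering: lower value = more severe. */
enum log_level { PR_EMERG = 0, PR_ERR = 3 };

/* Hypothetical: only escalate to PR_EMERG when secure boot is enabled. */
static enum log_level secure_log_level(bool secure_mode)
{
	return secure_mode ? PR_EMERG : PR_ERR;
}

/* Hypothetical: format "[level] msg" into 'out', like a console line. */
static int format_secure_msg(char *out, size_t outlen, bool secure_mode,
			     const char *msg)
{
	assert(out != NULL && msg != NULL);
	return snprintf(out, outlen, "[%d] %s",
			(int)secure_log_level(secure_mode), msg);
}
```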
| </section> |
| <section id="testing-code-coverage-improvements"> |
| <h2>Testing / Code coverage improvements<a class="headerlink" href="#testing-code-coverage-improvements" title="Link to this heading">¶</a></h2> |
| <p>Improvements in gcov support include support for newer GCCs as well |
| as easily exporting the area of memory you need to dump to feed to |
| <cite>extract-gcov</cite>.</p> |
| <ul> |
| <li><p>cpu_idle_job: relax a bit</p> |
<p>This <em>dramatically</em> improves kernel boot time with GCOV builds,
from ~3 minutes between loading the kernel and switching the HILE
bit down to around 10 seconds.</p>
| </li> |
| <li><p>gcov: Another GCC, another gcov tweak</p></li> |
| <li><p>Keep constructors with priorities</p> |
| <p>Fixes GCOV builds with gcc7, which uses this.</p> |
| </li> |
| <li><p>gcov: Add gcov data struct to sysfs</p> |
<p>Extracting the skiboot gcov data is currently a tedious process which
involves taking a memory dump of skiboot and searching for the gcov_info
struct.
This patch adds the gcov struct to sysfs under /opal/exports, allowing the
data to be copied directly into userspace and processed.</p>
| </li> |
| </ul> |
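<p>With the gcov struct exported, pulling the data out of a running system is an ordinary file copy from userspace. A minimal C sketch follows. copy_export() is a hypothetical helper, and the source path is an assumption: on Linux, OPAL exports appear under /sys/firmware/opal/exports/, but check the actual file name on your system (reading it typically requires root).</p>

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy a binary file (e.g. an OPAL export under sysfs) to a regular
 * file. Returns the number of bytes copied, or -1 on any error.
 */
static long copy_export(const char *src_path, const char *dst_path)
{
	FILE *in, *out;
	char buf[4096];
	size_t n;
	long total = 0;

	assert(src_path != NULL && dst_path != NULL);

	in = fopen(src_path, "rb");
	if (!in)
		return -1;
	out = fopen(dst_path, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}

	/* Read in chunks until EOF; sysfs binary files may be large. */
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			total = -1;
			break;
		}
		total += (long)n;
	}
	if (ferror(in))
		total = -1;

	fclose(in);
	if (fclose(out) != 0)
		total = -1;
	return total;
}
```

<p>The resulting file holds the raw exported region and can then be processed offline with the gcov tooling.</p>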
| </section> |
| </section> |
| |
| |
| <div class="clearer"></div> |
| </div> |
| </div> |
| </div> |
| <div class="sphinxsidebar" role="navigation" aria-label="main navigation"> |
| <div class="sphinxsidebarwrapper"> |
| <div> |
| <h3><a href="../index.html">Table of Contents</a></h3> |
| <ul> |
| <li><a class="reference internal" href="#">skiboot-5.11</a><ul> |
| <li><a class="reference internal" href="#new-platforms">New Platforms</a></li> |
| <li><a class="reference internal" href="#new-features">New Features</a></li> |
| <li><a class="reference internal" href="#power-management">Power Management</a><ul> |
| <li><a class="reference internal" href="#mbox-based-platforms">mbox based platforms</a></li> |
| </ul> |
| </li> |
| <li><a class="reference internal" href="#fast-reboot-improvements">Fast Reboot Improvements</a></li> |
<li><a class="reference internal" href="#debugging-sreset-improvemens">Debugging/SRESET improvements</a></li>
| <li><a class="reference internal" href="#npu2-nvlink2-fixes">NPU2/NVLink2 Fixes</a></li> |
| <li><a class="reference internal" href="#capi-opencapi">CAPI/OpenCAPI</a></li> |
| <li><a class="reference internal" href="#pci">PCI</a></li> |
| <li><a class="reference internal" href="#bugs-fixed">Bugs Fixed</a></li> |
| <li><a class="reference internal" href="#other-fixes-and-improvements">Other fixes and improvements</a></li> |
| <li><a class="reference internal" href="#testing-code-coverage-improvements">Testing / Code coverage improvements</a></li> |
| </ul> |
| </li> |
| </ul> |
| |
| </div> |
| <div> |
| <h4>Previous topic</h4> |
| <p class="topless"><a href="skiboot-5.10.6.html" |
| title="previous chapter">skiboot-5.10.6</a></p> |
| </div> |
| <div> |
| <h4>Next topic</h4> |
| <p class="topless"><a href="skiboot-5.11-rc1.html" |
| title="next chapter">skiboot-5.11-rc1</a></p> |
| </div> |
| </div> |
| </div> |
| <div class="clearer"></div> |
| </div> |
| <div class="footer" role="contentinfo"> |
| © Copyright 2016-2017, IBM, others. |
| Created using <a href="https://www.sphinx-doc.org/">Sphinx</a> 7.2.6. |
| </div> |
| </body> |
| </html> |