| .. | 
 |    Copyright (C) 2017, Emilio G. Cota <cota@braap.org> | 
 |    Copyright (c) 2019, Linaro Limited | 
 |    Written by Emilio Cota and Alex Bennée | 
 |  | 
 | .. _TCG Plugins: | 
 |  | 
 | QEMU TCG Plugins | 
 | ================ | 
 |  | 
 | QEMU TCG plugins provide a way for users to run experiments taking | 
 | advantage of the total system control emulation can have over a guest. | 
 | The plugin system provides a mechanism for plugins to subscribe to | 
 | events during translation and execution and optionally call back into | 
 | the plugin during these events. TCG plugins are unable to change the | 
 | system state; they can only monitor it passively. However, they can do | 
 | this down to the granularity of an individual instruction, including | 
 | potentially subscribing to all load and store operations. | 
 |  | 
 | Usage | 
 | ----- | 
 |  | 
 | Any QEMU binary with TCG support has plugins enabled by default. | 
 | In earlier releases plugins had to be explicitly enabled with:: | 
 |  | 
 |   configure --enable-plugins | 
 |  | 
 | Once built, a program can be run with multiple plugins loaded, each | 
 | with its own arguments:: | 
 |  | 
 |   $QEMU $OTHER_QEMU_ARGS \ | 
 |       -plugin contrib/plugins/libhowvec.so,inline=on,count=hint \ | 
 |       -plugin contrib/plugins/libhotblocks.so | 
 |  | 
 | Arguments are plugin specific and can be used to modify their | 
 | behaviour. In this case the howvec plugin is being asked to use inline | 
 | ops to count and break down the hint instructions by type. | 
 |  | 
 | Linux user-mode emulation also evaluates the environment variable | 
 | ``QEMU_PLUGIN``:: | 
 |  | 
 |   QEMU_PLUGIN="file=contrib/plugins/libhowvec.so,inline=on,count=hint" $QEMU | 
 |  | 
 | Writing plugins | 
 | --------------- | 
 |  | 
 | API versioning | 
 | ~~~~~~~~~~~~~~ | 
 |  | 
 | This is a new feature for QEMU and it allows people to develop | 
 | out-of-tree plugins that can be dynamically linked into a running QEMU | 
 | process. However, the project reserves the right to change or break the | 
 | API should it need to do so. The best way to avoid breakage is to submit | 
 | your plugin upstream so it can be updated if/when the API changes. | 
 |  | 
 | All plugins need to declare a symbol which exports the plugin API | 
 | version they were built against. This can be done simply by:: | 
 |  | 
 |   QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION; | 
 |  | 
 | The core code will refuse to load a plugin that doesn't export a | 
 | ``qemu_plugin_version`` symbol, or whose version is outside of QEMU's | 
 | supported range of API versions. | 
 |  | 
 | Additionally the ``qemu_info_t`` structure which is passed to the | 
 | ``qemu_plugin_install`` method of a plugin will detail the minimum and | 
 | current API versions supported by QEMU. The API version will be | 
 | incremented if new APIs are added. The minimum API version will be | 
 | incremented if existing APIs are changed or removed. | 
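
The range check described above can be sketched as follows. This is only an
illustration: ``struct api_version_range`` and ``plugin_version_supported``
are hypothetical names, and the real ``qemu_info_t`` layout and loader code
are not reproduced here.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two version numbers that QEMU reports
 * to a plugin; not the real qemu_info_t layout.
 */
struct api_version_range {
    int min; /* oldest API version still supported */
    int cur; /* newest API version implemented */
};

/* A plugin is loadable only if its version lies inside [min, cur]. */
bool plugin_version_supported(int plugin_version,
                              struct api_version_range supported)
{
    return plugin_version >= supported.min &&
           plugin_version <= supported.cur;
}
```

Under this model, a plugin built against a newer API than QEMU implements is
rejected, as is one built against an API older than the minimum.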
 |  | 
 | Lifetime of the query handle | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | Each callback provides an opaque anonymous information handle which | 
 | can usually be further queried to find out information about a | 
 | translation, instruction or operation. The handles themselves are only | 
 | valid during the lifetime of the callback so it is important that any | 
 | information that is needed is extracted during the callback and saved | 
 | by the plugin. | 
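
As a minimal sketch of that rule, the code below copies everything it needs
out of a handle before the callback returns. ``struct insn_handle`` is a
hypothetical stand-in for an opaque handle, and ``save_insn`` and
``free_saved_insn`` are illustrative helpers, not part of the plugin API.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an opaque handle; only valid in the callback. */
struct insn_handle {
    uint64_t vaddr;
    const char *disas;
};

/* Plugin-owned record that outlives the callback. */
struct saved_insn {
    uint64_t vaddr;
    char *disas; /* heap copy owned by the plugin */
};

/* Copy what is needed out of the handle; returns NULL if allocation fails. */
struct saved_insn *save_insn(const struct insn_handle *handle)
{
    struct saved_insn *rec = malloc(sizeof(*rec));
    if (!rec) {
        return NULL;
    }
    size_t len = strlen(handle->disas) + 1;
    rec->disas = malloc(len);
    if (!rec->disas) {
        free(rec);
        return NULL;
    }
    memcpy(rec->disas, handle->disas, len);
    rec->vaddr = handle->vaddr;
    return rec;
}

/* Release a record created by save_insn(). */
void free_saved_insn(struct saved_insn *rec)
{
    if (rec) {
        free(rec->disas);
        free(rec);
    }
}
```

The saved record keeps no pointer into the handle, so it stays usable after
the handle has gone away.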
 |  | 
 | Plugin life cycle | 
 | ~~~~~~~~~~~~~~~~~ | 
 |  | 
 | First the plugin is loaded and the public ``qemu_plugin_install`` | 
 | function is called. The plugin will then register callbacks for | 
 | various plugin events. Generally plugins will register a handler for | 
 | the *atexit* event if they want to dump a summary of collected | 
 | information once the program/system has finished running. | 
 |  | 
 | When a registered event occurs the plugin callback is invoked. The | 
 | callbacks may provide additional information. In the case of a | 
 | translation event the plugin has an option to enumerate the | 
 | instructions in a block of instructions and optionally register | 
 | callbacks to some or all instructions when they are executed. | 
 |  | 
 | There is also a facility to add an inline event where code to | 
 | increment a counter can be directly inlined with the translation. | 
 | Currently only a simple increment is supported. The increment is not | 
 | atomic, so counts can be missed. If you want absolute precision you | 
 | should use a callback which can then ensure atomicity itself. | 
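
The difference between the two counting styles can be modelled in plain C11.
This sketch does not use the plugin API: ``count_inline`` stands in for an
inlined increment and ``count_in_callback`` for a callback that makes the
update atomic itself. All four function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Plain counter: models the inline increment, which is not atomic. */
static uint64_t inline_count;

/* Atomic counter: models what a callback can do to stay exact. */
static atomic_uint_fast64_t callback_count;

/* Unsynchronised read-modify-write; concurrent updates can be lost. */
void count_inline(void)
{
    inline_count++;
}

/* Atomic read-modify-write; no update is lost with several threads. */
void count_in_callback(void)
{
    atomic_fetch_add_explicit(&callback_count, 1, memory_order_relaxed);
}

uint64_t read_inline_count(void)
{
    return inline_count;
}

uint64_t read_callback_count(void)
{
    return atomic_load_explicit(&callback_count, memory_order_relaxed);
}
```

With a single thread both counters agree. Only when several threads update
the plain counter at once can increments be lost.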
 |  | 
 | Finally when QEMU exits all the registered *atexit* callbacks are | 
 | invoked. | 
 |  | 
 | Exposure of QEMU internals | 
 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
 |  | 
 | The plugin architecture actively avoids leaking implementation details | 
 | about how QEMU's translation works to the plugins. While there are | 
 | concepts such as translation time and translation blocks, the | 
 | details are opaque to plugins. The plugin is able to query select | 
 | details of instructions and system configuration only through the | 
 | exported *qemu_plugin* functions. | 
 |  | 
 | Internals | 
 | --------- | 
 |  | 
 | Locking | 
 | ~~~~~~~ | 
 |  | 
 | We have to ensure we cannot deadlock, particularly under MTTCG. For | 
 | this we acquire a lock when called from plugin code. We also keep the | 
 | list of callbacks under RCU so that we do not have to hold the lock | 
 | when calling the callbacks. This is also for performance, since some | 
 | callbacks (e.g. memory access callbacks) might be called very | 
 | frequently. | 
 |  | 
 |   * A consequence of this is that we keep our own list of CPUs, so that | 
 |     we do not have to worry about locking order wrt cpu_list_lock. | 
 |   * We use a recursive lock, since we can get registration calls from | 
 |     callbacks. | 
 |  | 
 | As a result registering/unregistering callbacks is "slow", since it | 
 | takes a lock. But this is very infrequent; we want performance when | 
 | calling (or not calling) callbacks, not when registering them. Using | 
 | RCU is great for this. | 
 |  | 
 | We support the uninstallation of a plugin at any time (e.g. from | 
 | plugin callbacks). This allows plugins to remove themselves if they no | 
 | longer want to instrument the code. This operation is asynchronous | 
 | which means callbacks may still occur after the uninstall operation is | 
 | requested. The plugin isn't completely uninstalled until the safe work | 
 | has executed while all vCPUs are quiescent. | 
 |  | 
 | Example Plugins | 
 | --------------- | 
 |  | 
 | There are a number of plugins included with QEMU and you are | 
 | encouraged to contribute your own plugins upstream. There is a | 
 | ``contrib/plugins`` directory where they can go. There are also some | 
 | basic plugins in ``tests/plugins`` that are used to test and exercise | 
 | the API during the ``make check-tcg`` target. | 
 |  | 
 | - tests/plugins/empty.c | 
 |  | 
 | Purely a test plugin for measuring the overhead of the plugins system | 
 | itself. Does no instrumentation. | 
 |  | 
 | - tests/plugins/bb.c | 
 |  | 
 | A very basic plugin which will measure execution in coarse terms as | 
 | each basic block is executed. By default the results are shown once | 
 | execution finishes:: | 
 |  | 
 |   $ qemu-aarch64 -plugin tests/plugin/libbb.so \ | 
 |       -d plugin ./tests/tcg/aarch64-linux-user/sha1 | 
 |   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 | 
 |   bb's: 2277338, insns: 158483046 | 
 |  | 
 | Behaviour can be tweaked with the following arguments: | 
 |  | 
 |  * inline=true|false | 
 |  | 
 |  Use faster inline addition of a single counter. Not per-cpu and not | 
 |  thread safe. | 
 |  | 
 |  * idle=true|false | 
 |  | 
 |  Dump the current execution stats whenever the guest vCPU idles. | 
 |  | 
 | - tests/plugins/insn.c | 
 |  | 
 | This is a basic instruction level instrumentation which can count the | 
 | number of instructions executed on each core/thread:: | 
 |  | 
 |   $ qemu-aarch64 -plugin tests/plugin/libinsn.so \ | 
 |       -d plugin ./tests/tcg/aarch64-linux-user/threadcount | 
 |   Created 10 threads | 
 |   Done | 
 |   cpu 0 insns: 46765 | 
 |   cpu 1 insns: 3694 | 
 |   cpu 2 insns: 3694 | 
 |   cpu 3 insns: 2994 | 
 |   cpu 4 insns: 1497 | 
 |   cpu 5 insns: 1497 | 
 |   cpu 6 insns: 1497 | 
 |   cpu 7 insns: 1497 | 
 |   total insns: 63135 | 
 |  | 
 | Behaviour can be tweaked with the following arguments: | 
 |  | 
 |  * inline=true|false | 
 |  | 
 |  Use faster inline addition of a single counter. Not per-cpu and not | 
 |  thread safe. | 
 |  | 
 |  * sizes=true|false | 
 |  | 
 |  Give a summary of the instruction sizes for the execution | 
 |  | 
 |  * match=<string> | 
 |  | 
 |  Only instrument instructions matching the string prefix. Will show | 
 |  some basic stats including how many instructions have executed since | 
 |  the last execution. For example:: | 
 |  | 
 |    $ qemu-aarch64 -plugin tests/plugin/libinsn.so,match=bl \ | 
 |        -d plugin ./tests/tcg/aarch64-linux-user/sha512-vector | 
 |    ... | 
 |    0x40069c, 'bl #0x4002b0', 10 hits, 1093 match hits, Δ+1257 since last match, 98 avg insns/match | 
 |    0x4006ac, 'bl #0x403690', 10 hits, 1094 match hits, Δ+47 since last match, 98 avg insns/match  | 
 |    0x4037fc, 'bl #0x4002b0', 18 hits, 1095 match hits, Δ+22 since last match, 98 avg insns/match  | 
 |    0x400720, 'bl #0x403690', 10 hits, 1096 match hits, Δ+58 since last match, 98 avg insns/match  | 
 |    0x4037fc, 'bl #0x4002b0', 19 hits, 1097 match hits, Δ+22 since last match, 98 avg insns/match  | 
 |    0x400730, 'bl #0x403690', 10 hits, 1098 match hits, Δ+33 since last match, 98 avg insns/match  | 
 |    0x4037ac, 'bl #0x4002b0', 12 hits, 1099 match hits, Δ+20 since last match, 98 avg insns/match  | 
 |    ... | 
 |  | 
 | For more detailed execution tracing and further options, see the | 
 | ``execlog`` plugin. | 
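
The prefix matching described for the ``match`` argument can be sketched as
below. ``insn_matches`` is a hypothetical helper written for illustration.
The plugin's own source may implement the comparison differently.

```c
#include <stdbool.h>
#include <string.h>

/* True if the disassembly text starts with the given prefix. */
bool insn_matches(const char *disas, const char *prefix)
{
    return strncmp(disas, prefix, strlen(prefix)) == 0;
}
```

Because this is a prefix test rather than a whole-mnemonic comparison,
``match=bl`` in this sketch would also select an instruction whose
disassembly starts with ``blr``.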
 |  | 
 | - tests/plugins/mem.c | 
 |  | 
 | Basic instruction level memory instrumentation:: | 
 |  | 
 |   $ qemu-aarch64 -plugin tests/plugin/libmem.so,inline=true \ | 
 |       -d plugin ./tests/tcg/aarch64-linux-user/sha1 | 
 |   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 | 
 |   inline mem accesses: 79525013 | 
 |  | 
 | Behaviour can be tweaked with the following arguments: | 
 |  | 
 |  * inline=true|false | 
 |  | 
 |  Use faster inline addition of a single counter. Not per-cpu and not | 
 |  thread safe. | 
 |  | 
 |  * callback=true|false | 
 |  | 
 |  Use callbacks on each memory instrumentation. | 
 |  | 
 |  * hwaddr=true|false | 
 |  | 
 |  Count IO accesses (only for system emulation) | 
 |  | 
 | - tests/plugins/syscall.c | 
 |  | 
 | A basic syscall tracing plugin. This only works for user-mode. By | 
 | default it will give a summary of syscall stats at the end of the | 
 | run:: | 
 |  | 
 |   $ qemu-aarch64 -plugin tests/plugin/libsyscall.so \ | 
 |       -d plugin ./tests/tcg/aarch64-linux-user/threadcount | 
 |   Created 10 threads | 
 |   Done | 
 |   syscall no.  calls  errors | 
 |   226          12     0 | 
 |   99           11     11 | 
 |   115          11     0 | 
 |   222          11     0 | 
 |   93           10     0 | 
 |   220          10     0 | 
 |   233          10     0 | 
 |   215          8      0 | 
 |   214          4      0 | 
 |   134          2      0 | 
 |   64           2      0 | 
 |   96           1      0 | 
 |   94           1      0 | 
 |   80           1      0 | 
 |   261          1      0 | 
 |   78           1      0 | 
 |   160          1      0 | 
 |   135          1      0 | 
 |  | 
 | - contrib/plugins/hotblocks.c | 
 |  | 
 | The hotblocks plugin allows you to examine where the hot paths of | 
 | execution are in your program. Once the program has finished you will | 
 | get a sorted list of blocks reporting the starting PC, translation | 
 | count, number of instructions and execution count. This will work best | 
 | with linux-user execution as system emulation tends to generate | 
 | re-translations as blocks from different programs get swapped in and | 
 | out of system memory. | 
 |  | 
 | If your program is single-threaded you can use the ``inline`` option for | 
 | slightly faster (but not thread safe) counters. | 
 |  | 
 | Example:: | 
 |  | 
 |   $ qemu-aarch64 \ | 
 |     -plugin contrib/plugins/libhotblocks.so -d plugin \ | 
 |     ./tests/tcg/aarch64-linux-user/sha1 | 
 |   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 | 
 |   collected 903 entries in the hash table | 
 |   pc, tcount, icount, ecount | 
 |   0x0000000041ed10, 1, 5, 66087 | 
 |   0x000000004002b0, 1, 4, 66087 | 
 |   ... | 
 |  | 
 | - contrib/plugins/hotpages.c | 
 |  | 
 | Similar to hotblocks but this time tracks memory accesses:: | 
 |  | 
 |   $ qemu-aarch64 \ | 
 |     -plugin contrib/plugins/libhotpages.so -d plugin \ | 
 |     ./tests/tcg/aarch64-linux-user/sha1 | 
 |   SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 | 
 |   Addr, RCPUs, Reads, WCPUs, Writes | 
 |   0x000055007fe000, 0x0001, 31747952, 0x0001, 8835161 | 
 |   0x000055007ff000, 0x0001, 29001054, 0x0001, 8780625 | 
 |   0x00005500800000, 0x0001, 687465, 0x0001, 335857 | 
 |   0x0000000048b000, 0x0001, 130594, 0x0001, 355 | 
 |   0x0000000048a000, 0x0001, 1826, 0x0001, 11 | 
 |  | 
 | The hotpages plugin can be configured using the following arguments: | 
 |  | 
 |   * sortby=reads|writes|address | 
 |  | 
 |   Log the data sorted by either the number of reads, the number of writes, or | 
 |   memory address. (Default: entries are sorted by the sum of reads and writes) | 
 |  | 
 |   * io=on | 
 |  | 
 |   Track IO addresses. Only relevant to full system emulation. (Default: off) | 
 |  | 
 |   * pagesize=N | 
 |  | 
 |   The page size used. (Default: N = 4096) | 
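
The role of ``pagesize`` can be illustrated by the usual way of mapping an
address to the page that contains it. This is a sketch that assumes the page
size is a power of two, and ``page_base`` is a hypothetical helper, not a
function taken from the plugin.

```c
#include <stdint.h>

/*
 * Round an address down to the start of its page.
 * Assumes pagesize is a non-zero power of two, e.g. the default 4096.
 */
uint64_t page_base(uint64_t addr, uint64_t pagesize)
{
    return addr & ~(pagesize - 1);
}
```

With the default page size, every address from ``0x48a000`` to ``0x48afff``
maps to the page ``0x48a000``, so all accesses in that range are accumulated
in one row of the report.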
 |  | 
 | - contrib/plugins/howvec.c | 
 |  | 
 | This is an instruction classifier, so it can be used to count different | 
 | types of instructions. It has a number of options to refine which get | 
 | counted. You can give a value to the ``count`` argument for a class of | 
 | instructions to break it down fully, so for example to see all the system | 
 | register accesses:: | 
 |  | 
 |   $ qemu-system-aarch64 $(QEMU_ARGS) \ | 
 |     -append "root=/dev/sda2 systemd.unit=benchmark.service" \ | 
 |     -smp 4 -plugin ./contrib/plugins/libhowvec.so,count=sreg -d plugin | 
 |  | 
 | which will lead to a sorted list after the class breakdown:: | 
 |  | 
 |   Instruction Classes: | 
 |   Class:   UDEF                   not counted | 
 |   Class:   SVE                    (68 hits) | 
 |   Class:   PCrel addr             (47789483 hits) | 
 |   Class:   Add/Sub (imm)          (192817388 hits) | 
 |   Class:   Logical (imm)          (93852565 hits) | 
 |   Class:   Move Wide (imm)        (76398116 hits) | 
 |   Class:   Bitfield               (44706084 hits) | 
 |   Class:   Extract                (5499257 hits) | 
 |   Class:   Cond Branch (imm)      (147202932 hits) | 
 |   Class:   Exception Gen          (193581 hits) | 
 |   Class:     NOP                  not counted | 
 |   Class:   Hints                  (6652291 hits) | 
 |   Class:   Barriers               (8001661 hits) | 
 |   Class:   PSTATE                 (1801695 hits) | 
 |   Class:   System Insn            (6385349 hits) | 
 |   Class:   System Reg             counted individually | 
 |   Class:   Branch (reg)           (69497127 hits) | 
 |   Class:   Branch (imm)           (84393665 hits) | 
 |   Class:   Cmp & Branch           (110929659 hits) | 
 |   Class:   Tst & Branch           (44681442 hits) | 
 |   Class:   AdvSimd ldstmult       (736 hits) | 
 |   Class:   ldst excl              (9098783 hits) | 
 |   Class:   Load Reg (lit)         (87189424 hits) | 
 |   Class:   ldst noalloc pair      (3264433 hits) | 
 |   Class:   ldst pair              (412526434 hits) | 
 |   Class:   ldst reg (imm)         (314734576 hits) | 
 |   Class: Loads & Stores           (2117774 hits) | 
 |   Class: Data Proc Reg            (223519077 hits) | 
 |   Class: Scalar FP                (31657954 hits) | 
 |   Individual Instructions: | 
 |   Instr: mrs x0, sp_el0           (2682661 hits)  (op=0xd5384100/  System Reg) | 
 |   Instr: mrs x1, tpidr_el2        (1789339 hits)  (op=0xd53cd041/  System Reg) | 
 |   Instr: mrs x2, tpidr_el2        (1513494 hits)  (op=0xd53cd042/  System Reg) | 
 |   Instr: mrs x0, tpidr_el2        (1490823 hits)  (op=0xd53cd040/  System Reg) | 
 |   Instr: mrs x1, sp_el0           (933793 hits)   (op=0xd5384101/  System Reg) | 
 |   Instr: mrs x2, sp_el0           (699516 hits)   (op=0xd5384102/  System Reg) | 
 |   Instr: mrs x4, tpidr_el2        (528437 hits)   (op=0xd53cd044/  System Reg) | 
 |   Instr: mrs x30, ttbr1_el1       (480776 hits)   (op=0xd538203e/  System Reg) | 
 |   Instr: msr ttbr1_el1, x30       (480713 hits)   (op=0xd518203e/  System Reg) | 
 |   Instr: msr vbar_el1, x30        (480671 hits)   (op=0xd518c01e/  System Reg) | 
 |   ... | 
 |  | 
 | At the moment, the only way to find the argument shorthand for a class | 
 | is to examine the source code of the plugin, specifically the ``*opt`` | 
 | argument in the InsnClassExecCount tables. | 
 |  | 
 | - contrib/plugins/lockstep.c | 
 |  | 
 | This is a debugging tool for developers who want to find out when and | 
 | where execution diverges after a subtle change to TCG code generation. | 
 | It is not an exact science and results are likely to be mixed once | 
 | asynchronous events are introduced. While the use of ``-icount`` can | 
 | introduce determinism to the execution flow, it doesn't always follow | 
 | that the translation sequence will be exactly the same. Typically this | 
 | is caused by a timer firing to service the GUI, causing a block to end | 
 | early. However, in some cases it has proved to be useful in pointing | 
 | people at roughly where execution diverges. The only argument you need | 
 | for the plugin is a path for the socket the two instances will | 
 | communicate over:: | 
 |  | 
 |  | 
 |   $ qemu-system-sparc -monitor none -parallel none \ | 
 |     -net none -M SS-20 -m 256 -kernel day11/zImage.elf \ | 
 |     -plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \ | 
 |     -d plugin,nochain | 
 |  | 
 | which will eventually report:: | 
 |  | 
 |   qemu-system-sparc: warning: nic lance.0 has no peer | 
 |   @ 0x000000ffd06678 vs 0x000000ffd001e0 (2/1 since last) | 
 |   @ 0x000000ffd07d9c vs 0x000000ffd06678 (3/1 since last) | 
 |   Δ insn_count @ 0x000000ffd07d9c (809900609) vs 0x000000ffd06678 (809900612) | 
 |     previously @ 0x000000ffd06678/10 (809900609 insns) | 
 |     previously @ 0x000000ffd001e0/4 (809900599 insns) | 
 |     previously @ 0x000000ffd080ac/2 (809900595 insns) | 
 |     previously @ 0x000000ffd08098/5 (809900593 insns) | 
 |     previously @ 0x000000ffd080c0/1 (809900588 insns) | 
 |  | 
 | - contrib/plugins/hwprofile.c | 
 |  | 
 | The hwprofile tool can only be used with system emulation and allows | 
 | the user to see which hardware is accessed and how often. It has a | 
 | number of options: | 
 |  | 
 |  * track=read or track=write | 
 |  | 
 |  By default the plugin tracks both reads and writes. You can use one | 
 |  of these options to limit the tracking to just one class of accesses. | 
 |  | 
 |  * source | 
 |  | 
 |  Will include a detailed breakdown of which guest PC made each | 
 |  access. Not compatible with the pattern option. Example output:: | 
 |  | 
 |    cirrus-low-memory @ 0xfffffd00000a0000 | 
 |     pc:fffffc0000005cdc, 1, 256 | 
 |     pc:fffffc0000005ce8, 1, 256 | 
 |     pc:fffffc0000005cec, 1, 256 | 
 |  | 
 |  * pattern | 
 |  | 
 |  Instead break down the accesses based on the offset into the HW | 
 |  region. This can be useful for seeing the most used registers of a | 
 |  device. Example output:: | 
 |  | 
 |     pci0-conf @ 0xfffffd01fe000000 | 
 |       off:00000004, 1, 1 | 
 |       off:00000010, 1, 3 | 
 |       off:00000014, 1, 3 | 
 |       off:00000018, 1, 2 | 
 |       off:0000001c, 1, 2 | 
 |       off:00000020, 1, 2 | 
 |       ... | 
 |  | 
 | - contrib/plugins/execlog.c | 
 |  | 
 | The execlog tool traces executed instructions along with their memory | 
 | accesses. It can be used for debugging and security analysis purposes. | 
 | Please be aware that this will generate a lot of output. | 
 |  | 
 | The plugin can be run with its default arguments:: | 
 |  | 
 |   $ qemu-system-arm $(QEMU_ARGS) \ | 
 |     -plugin ./contrib/plugins/libexeclog.so -d plugin | 
 |  | 
 | which will output an execution trace following this structure:: | 
 |  | 
 |   # vCPU, vAddr, opcode, disassembly[, load/store, memory addr, device]... | 
 |   0, 0xa12, 0xf8012400, "movs r4, #0" | 
 |   0, 0xa14, 0xf87f42b4, "cmp r4, r6" | 
 |   0, 0xa16, 0xd206, "bhs #0xa26" | 
 |   0, 0xa18, 0xfff94803, "ldr r0, [pc, #0xc]", load, 0x00010a28, RAM | 
 |   0, 0xa1a, 0xf989f000, "bl #0xd30" | 
 |   0, 0xd30, 0xfff9b510, "push {r4, lr}", store, 0x20003ee0, RAM, store, 0x20003ee4, RAM | 
 |   0, 0xd32, 0xf9893014, "adds r0, #0x14" | 
 |   0, 0xd34, 0xf9c8f000, "bl #0x10c8" | 
 |   0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM | 
 |  | 
 | The output can be filtered to only track certain instructions or | 
 | addresses using the ``ifilter`` or ``afilter`` options. You can stack the | 
 | arguments if required:: | 
 |  | 
 |   $ qemu-system-arm $(QEMU_ARGS) \ | 
 |     -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d plugin | 
 |  | 
 | - contrib/plugins/cache.c | 
 |  | 
 | Cache modelling plugin that measures the performance of a given L1 cache | 
 | configuration, and optionally of a unified per-core L2 cache, when a given | 
 | working set is run:: | 
 |  | 
 |   $ qemu-x86_64 -plugin ./contrib/plugins/libcache.so \ | 
 |       -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs | 
 |  | 
 | will report the following:: | 
 |  | 
 |     core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate | 
 |     0       996695         508             0.0510%  2642799        18617           0.7044% | 
 |  | 
 |     address, data misses, instruction | 
 |     0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx) | 
 |     0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax) | 
 |     0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax) | 
 |     0x454d48 (__tunables_init), 20, cmpb $0, (%r8) | 
 |     ... | 
 |  | 
 |     address, fetch misses, instruction | 
 |     0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx | 
 |     0x41f0a0 (_IO_setb), 744, endbr64 | 
 |     0x415882 (__vfprintf_internal), 744, movq %r12, %rdi | 
 |     0x4268a0 (__malloc), 696, andq $0xfffffffffffffff0, %rax | 
 |     ... | 
 |  | 
 | The plugin has a number of arguments, all of which are optional: | 
 |  | 
 |   * limit=N | 
 |  | 
 |   Print the top N icache and dcache thrashing instructions along with | 
 |   their address, number of misses, and disassembly. (default: 32) | 
 |  | 
 |   * icachesize=N | 
 |   * iblksize=B | 
 |   * iassoc=A | 
 |  | 
 |   Instruction cache configuration arguments. They specify the cache size, block | 
 |   size, and associativity of the instruction cache, respectively. | 
 |   (default: N = 16384, B = 64, A = 8) | 
 |  | 
 |   * dcachesize=N | 
 |   * dblksize=B | 
 |   * dassoc=A | 
 |  | 
 |   Data cache configuration arguments. They specify the cache size, block size, | 
 |   and associativity of the data cache, respectively. | 
 |   (default: N = 16384, B = 64, A = 8) | 
 |  | 
 |   * evict=POLICY | 
 |  | 
 |   Sets the eviction policy to POLICY. Available policies are: :code:`lru`, | 
 |   :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for | 
 |   both instruction and data caches. (default: POLICY = :code:`lru`) | 
 |  | 
 |   * cores=N | 
 |  | 
 |   Sets the number of cores for which we maintain separate icache and dcache. | 
 |   (default: for linux-user, N = 1, for full system emulation: N = cores | 
 |   available to guest) | 
 |  | 
 |   * l2=on | 
 |  | 
 |   Simulates a unified L2 cache (stores blocks for both instructions and data) | 
 |   using the default L2 configuration (cache size = 2MB, associativity = 16-way, | 
 |   block size = 64B). | 
 |  | 
 |   * l2cachesize=N | 
 |   * l2blksize=B | 
 |   * l2assoc=A | 
 |  | 
 |   L2 cache configuration arguments. They specify the cache size, block size, and | 
 |   associativity of the L2 cache, respectively. Setting any of the L2 | 
 |   configuration arguments implies ``l2=on``. | 
 |   (default: N = 2097152 (2MB), B = 64, A = 16) | 
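
To show how the size, block size and associativity arguments relate, here is
a sketch of the textbook set-associative address decomposition. It is not
taken from the plugin source: ``struct cache_geom`` and the three helpers are
hypothetical names, and the plugin may organise its lookup differently.

```c
#include <stdint.h>

/* Hypothetical description of one cache level. */
struct cache_geom {
    uint64_t cachesize; /* total capacity in bytes */
    uint64_t blksize;   /* bytes per block */
    uint64_t assoc;     /* blocks per set */
};

/* Number of sets implied by the three configuration values. */
uint64_t cache_num_sets(struct cache_geom g)
{
    return g.cachesize / (g.blksize * g.assoc);
}

/* Set that a given address falls into. */
uint64_t cache_set_index(struct cache_geom g, uint64_t addr)
{
    return (addr / g.blksize) % cache_num_sets(g);
}

/* Tag that distinguishes blocks mapping to the same set. */
uint64_t cache_tag(struct cache_geom g, uint64_t addr)
{
    return (addr / g.blksize) / cache_num_sets(g);
}
```

With the default L1 values (16384, 64, 8) this gives 32 sets, and the default
L2 values (2097152, 64, 16) give 2048 sets.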
 |  | 
 | API | 
 | --- | 
 |  | 
 | The following API is generated from the inline documentation in | 
 | ``include/qemu/qemu-plugin.h``. Please ensure any updates to the API | 
 | include the full kernel-doc annotations. | 
 |  | 
 | .. kernel-doc:: include/qemu/qemu-plugin.h | 
 |  |