..
   Copyright (C) 2017, Emilio G. Cota <cota@braap.org>
   Copyright (c) 2019, Linaro Limited
   Written by Emilio Cota and Alex Bennée

QEMU TCG Plugins
================

QEMU TCG plugins provide a way for users to run experiments taking
advantage of the total system control emulation can have over a guest.
The plugin infrastructure provides a mechanism for plugins to subscribe
to events during translation and execution and optionally call back
into the plugin during these events. TCG plugins cannot change the
system state; they can only monitor it passively. However, they can do
this down to individual instruction granularity, including potentially
subscribing to all load and store operations.

Usage
-----

Any QEMU binary with TCG support has plugins enabled by default.
Earlier releases needed plugins to be explicitly enabled with::

  configure --enable-plugins

Once built, a program can be run with multiple plugins loaded, each
with its own arguments::

  $QEMU $OTHER_QEMU_ARGS \
      -plugin tests/plugin/libhowvec.so,inline=on,count=hint \
      -plugin tests/plugin/libhotblocks.so

Arguments are plugin specific and can be used to modify their
behaviour. In this case the howvec plugin is being asked to use inline
ops to count and break down the hint instructions by type.
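
As a rough illustration of how a plugin might interpret such arguments,
the sketch below splits one ``key=value`` string and reads an
``on``/``off`` value. It assumes each comma-separated argument reaches
the plugin as a separate string; ``split_plugin_arg`` and
``parse_on_off`` are hypothetical helpers written for this example and
are not part of the QEMU plugin API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU API): split one "key=value" argument
 * in place. Returns false when the argument has no '=' separator.
 */
static bool split_plugin_arg(char *arg, const char **key, const char **value)
{
    assert(arg != NULL && key != NULL && value != NULL);

    char *eq = strchr(arg, '=');
    if (eq == NULL) {
        return false;
    }
    *eq = '\0';        /* terminate the key */
    *key = arg;
    *value = eq + 1;   /* the value starts right after the '=' */
    return true;
}

/* Hypothetical helper: interpret "on" or "off" as a boolean. */
static bool parse_on_off(const char *value, bool *out)
{
    assert(value != NULL && out != NULL);

    if (strcmp(value, "on") == 0) {
        *out = true;
        return true;
    }
    if (strcmp(value, "off") == 0) {
        *out = false;
        return true;
    }
    return false;      /* anything else is rejected */
}
```

Applied to ``inline=on`` this yields the key ``inline`` and a true
value; a bare word with no ``=`` is reported as unsplit so the caller
can decide how to treat it.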

Writing plugins
---------------

API versioning
~~~~~~~~~~~~~~

This is a new feature for QEMU and it does allow people to develop
out-of-tree plugins that can be dynamically linked into a running QEMU
process. However, the project reserves the right to change or break the
API should it need to do so. The best way to avoid breakage is to submit
your plugin upstream so it can be updated if/when the API changes.

All plugins need to declare a symbol which exports the plugin API
version they were built against. This can be done simply by::

  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

The core code will refuse to load a plugin that doesn't export a
``qemu_plugin_version`` symbol or whose version is outside of QEMU's
supported range of API versions.

Additionally the ``qemu_info_t`` structure which is passed to the
``qemu_plugin_install`` method of a plugin will detail the minimum and
current API versions supported by QEMU. The API version will be
incremented if new APIs are added. The minimum API version will be
incremented if existing APIs are changed or removed.
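
The range check described above can be pictured with the following
standalone sketch. It is only a model of the rule: ``struct api_range``
and ``plugin_version_supported`` are names invented for this example
and do not reproduce the real ``qemu_info_t`` layout or QEMU's loader
code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the version information QEMU reports.
 * The field names are illustrative only.
 */
struct api_range {
    int min;   /* oldest plugin API version still accepted */
    int cur;   /* newest plugin API version implemented */
};

/* A plugin is loadable only if min <= plugin_version <= cur. */
static bool plugin_version_supported(struct api_range range,
                                     int plugin_version)
{
    assert(range.min <= range.cur);
    return plugin_version >= range.min && plugin_version <= range.cur;
}
```

Under this model, adding a new API raises ``cur`` and leaves older
plugins loadable, while changing or removing an API raises ``min`` and
shuts out plugins built against the versions below it.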

Lifetime of the query handle
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each callback provides an opaque anonymous information handle which
can usually be further queried to find out information about a
translation, instruction or operation. The handles themselves are only
valid during the lifetime of the callback, so it is important that any
information that is needed is extracted during the callback and saved
by the plugin.

Plugin life cycle
~~~~~~~~~~~~~~~~~

First the plugin is loaded and the public ``qemu_plugin_install``
function is called. The plugin will then register callbacks for various
plugin events. Generally plugins will register a handler for the
*atexit* event if they want to dump a summary of collected information
once the program/system has finished running.

When a registered event occurs the plugin callback is invoked. The
callbacks may provide additional information. In the case of a
translation event the plugin has an option to enumerate the
instructions in a block of instructions and optionally register
callbacks to some or all instructions when they are executed.

There is also a facility to add an inline event where code to
increment a counter can be directly inlined with the translation.
Currently only a simple increment is supported. This is not atomic, so
it can miss counts. If you want absolute precision you should use a
callback which can then ensure atomicity itself.

Finally when QEMU exits all the registered *atexit* callbacks are
invoked.

Exposure of QEMU internals
~~~~~~~~~~~~~~~~~~~~~~~~~~

The plugin architecture actively avoids leaking implementation details
about how QEMU's translation works to the plugins. While there are
concepts such as translation time and translation blocks, the
details are opaque to plugins. The plugin is able to query select
details of instructions and system configuration only through the
exported *qemu_plugin* functions.

API
~~~

.. kernel-doc:: include/qemu/qemu-plugin.h

Internals
---------

Locking
~~~~~~~

We have to ensure we cannot deadlock, particularly under MTTCG. For
this we acquire a lock when called from plugin code. We also keep the
list of callbacks under RCU so that we do not have to hold the lock
when calling the callbacks. This is also for performance, since some
callbacks (e.g. memory access callbacks) might be called very
frequently.

  * A consequence of this is that we keep our own list of CPUs, so that
    we do not have to worry about locking order with respect to
    cpu_list_lock.
  * We use a recursive lock, since we can get registration calls from
    callbacks.

As a result registering/unregistering callbacks is "slow", since it
takes a lock. But this is very infrequent; we want performance when
calling (or not calling) callbacks, not when registering them. Using
RCU is great for this.

We support the uninstallation of a plugin at any time (e.g. from
plugin callbacks). This allows plugins to remove themselves if they no
longer want to instrument the code. This operation is asynchronous,
which means callbacks may still occur after the uninstall operation is
requested. The plugin isn't completely uninstalled until the safe work
has executed while all vCPUs are quiescent.

Example Plugins
---------------

There are a number of plugins included with QEMU and you are
encouraged to contribute your own plugins upstream. There is a
``contrib/plugins`` directory where they can go.

- tests/plugins

These are some basic plugins that are used to test and exercise the
API during the ``make check-tcg`` target.

- contrib/plugins/hotblocks.c

The hotblocks plugin allows you to examine where the hot paths of
execution are in your program. Once the program has finished you will
get a sorted list of blocks reporting the starting PC, translation
count, number of instructions and execution count. This will work best
with linux-user execution as system emulation tends to generate
re-translations as blocks from different programs get swapped in and
out of system memory.

If your program is single-threaded you can use the ``inline`` option for
slightly faster (but not thread safe) counters.

Example::

  ./aarch64-linux-user/qemu-aarch64 \
    -plugin contrib/plugins/libhotblocks.so -d plugin \
    ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  collected 903 entries in the hash table
  pc, tcount, icount, ecount
  0x0000000041ed10, 1, 5, 66087
  0x000000004002b0, 1, 4, 66087
  ...

- contrib/plugins/hotpages.c

Similar to hotblocks but this time tracks memory accesses::

  ./aarch64-linux-user/qemu-aarch64 \
    -plugin contrib/plugins/libhotpages.so -d plugin \
    ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  Addr, RCPUs, Reads, WCPUs, Writes
  0x000055007fe000, 0x0001, 31747952, 0x0001, 8835161
  0x000055007ff000, 0x0001, 29001054, 0x0001, 8780625
  0x00005500800000, 0x0001, 687465, 0x0001, 335857
  0x0000000048b000, 0x0001, 130594, 0x0001, 355
  0x0000000048a000, 0x0001, 1826, 0x0001, 11

The hotpages plugin can be configured using the following arguments:

  * sortby=reads|writes|address

  Log the data sorted by either the number of reads, the number of writes, or
  memory address. (Default: entries are sorted by the sum of reads and writes)

  * io=on

  Track IO addresses. Only relevant to full system emulation. (Default: off)

  * pagesize=N

  The page size used. (Default: N = 4096)
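
To see how a page size turns individual accesses into per-page entries
like those listed above, the sketch below rounds an address down to the
start of its page. It assumes the page size is a power of two (as the
default of 4096 is); ``page_base`` is a hypothetical helper and is not
taken from the plugin source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a guest address down to the start of its page.
 * Assumption: page_size is a non-zero power of two.
 */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);   /* clear the in-page offset bits */
}
```

With a page size of 4096 every address from ``0x48b000`` to ``0x48bfff``
maps to the same ``0x48b000`` entry, so a larger ``pagesize`` produces
fewer, coarser rows.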

- contrib/plugins/howvec.c

This is an instruction classifier, so it can be used to count different
types of instructions. It has a number of options to refine which get
counted. You can give a value to the ``count`` argument for a class of
instructions to break it down fully, so for example to see all the system
register accesses::

  ./aarch64-softmmu/qemu-system-aarch64 $(QEMU_ARGS) \
    -append "root=/dev/sda2 systemd.unit=benchmark.service" \
    -smp 4 -plugin ./contrib/plugins/libhowvec.so,count=sreg -d plugin

which will lead to a sorted list after the class breakdown::

  Instruction Classes:
  Class: UDEF not counted
  Class: SVE (68 hits)
  Class: PCrel addr (47789483 hits)
  Class: Add/Sub (imm) (192817388 hits)
  Class: Logical (imm) (93852565 hits)
  Class: Move Wide (imm) (76398116 hits)
  Class: Bitfield (44706084 hits)
  Class: Extract (5499257 hits)
  Class: Cond Branch (imm) (147202932 hits)
  Class: Exception Gen (193581 hits)
  Class: NOP not counted
  Class: Hints (6652291 hits)
  Class: Barriers (8001661 hits)
  Class: PSTATE (1801695 hits)
  Class: System Insn (6385349 hits)
  Class: System Reg counted individually
  Class: Branch (reg) (69497127 hits)
  Class: Branch (imm) (84393665 hits)
  Class: Cmp & Branch (110929659 hits)
  Class: Tst & Branch (44681442 hits)
  Class: AdvSimd ldstmult (736 hits)
  Class: ldst excl (9098783 hits)
  Class: Load Reg (lit) (87189424 hits)
  Class: ldst noalloc pair (3264433 hits)
  Class: ldst pair (412526434 hits)
  Class: ldst reg (imm) (314734576 hits)
  Class: Loads & Stores (2117774 hits)
  Class: Data Proc Reg (223519077 hits)
  Class: Scalar FP (31657954 hits)
  Individual Instructions:
  Instr: mrs x0, sp_el0 (2682661 hits) (op=0xd5384100/ System Reg)
  Instr: mrs x1, tpidr_el2 (1789339 hits) (op=0xd53cd041/ System Reg)
  Instr: mrs x2, tpidr_el2 (1513494 hits) (op=0xd53cd042/ System Reg)
  Instr: mrs x0, tpidr_el2 (1490823 hits) (op=0xd53cd040/ System Reg)
  Instr: mrs x1, sp_el0 (933793 hits) (op=0xd5384101/ System Reg)
  Instr: mrs x2, sp_el0 (699516 hits) (op=0xd5384102/ System Reg)
  Instr: mrs x4, tpidr_el2 (528437 hits) (op=0xd53cd044/ System Reg)
  Instr: mrs x30, ttbr1_el1 (480776 hits) (op=0xd538203e/ System Reg)
  Instr: msr ttbr1_el1, x30 (480713 hits) (op=0xd518203e/ System Reg)
  Instr: msr vbar_el1, x30 (480671 hits) (op=0xd518c01e/ System Reg)
  ...

At the moment, to find the argument shorthand for a class you need to
examine the source code of the plugin, specifically the ``*opt``
argument in the InsnClassExecCount tables.

- contrib/plugins/lockstep.c

This is a debugging tool for developers who want to find out when and
where execution diverges after a subtle change to TCG code generation.
It is not an exact science and results are likely to be mixed once
asynchronous events are introduced. While the use of -icount can
introduce determinism to the execution flow, it doesn't always follow
that the translation sequence will be exactly the same. Typically this
is caused by a timer firing to service the GUI, causing a block to end
early. However in some cases it has proved to be useful in pointing
people at roughly where execution diverges. The only argument you need
for the plugin is a path for the socket the two instances will
communicate over::

  ./sparc-softmmu/qemu-system-sparc -monitor none -parallel none \
    -net none -M SS-20 -m 256 -kernel day11/zImage.elf \
    -plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \
    -d plugin,nochain

which will eventually report::

  qemu-system-sparc: warning: nic lance.0 has no peer
  @ 0x000000ffd06678 vs 0x000000ffd001e0 (2/1 since last)
  @ 0x000000ffd07d9c vs 0x000000ffd06678 (3/1 since last)
  Δ insn_count @ 0x000000ffd07d9c (809900609) vs 0x000000ffd06678 (809900612)
    previously @ 0x000000ffd06678/10 (809900609 insns)
    previously @ 0x000000ffd001e0/4 (809900599 insns)
    previously @ 0x000000ffd080ac/2 (809900595 insns)
    previously @ 0x000000ffd08098/5 (809900593 insns)
    previously @ 0x000000ffd080c0/1 (809900588 insns)

- contrib/plugins/hwprofile.c

The hwprofile tool can only be used with system emulation and shows
the user which hardware is accessed and how often. It has a number of
options:

  * track=read or track=write

  By default the plugin tracks both reads and writes. You can use one
  of these options to limit the tracking to just one class of accesses.

  * source

  Includes a detailed breakdown of the guest PC that made each
  access. Not compatible with the pattern option. Example output::

    cirrus-low-memory @ 0xfffffd00000a0000
     pc:fffffc0000005cdc, 1, 256
     pc:fffffc0000005ce8, 1, 256
     pc:fffffc0000005cec, 1, 256

  * pattern

  Instead break down the accesses based on the offset into the HW
  region. This can be useful for seeing the most used registers of a
  device. Example output::

    pci0-conf @ 0xfffffd01fe000000
      off:00000004, 1, 1
      off:00000010, 1, 3
      off:00000014, 1, 3
      off:00000018, 1, 2
      off:0000001c, 1, 2
      off:00000020, 1, 2
      ...

- contrib/plugins/execlog.c

The execlog tool traces executed instructions with memory access. It can be used
for debugging and security analysis purposes.
Please be aware that this will generate a lot of output.

The plugin takes no arguments::

  qemu-system-arm $(QEMU_ARGS) \
    -plugin ./contrib/plugins/libexeclog.so -d plugin

which will output an execution trace following this structure::

  # vCPU, vAddr, opcode, disassembly[, load/store, memory addr, device]...
  0, 0xa12, 0xf8012400, "movs r4, #0"
  0, 0xa14, 0xf87f42b4, "cmp r4, r6"
  0, 0xa16, 0xd206, "bhs #0xa26"
  0, 0xa18, 0xfff94803, "ldr r0, [pc, #0xc]", load, 0x00010a28, RAM
  0, 0xa1a, 0xf989f000, "bl #0xd30"
  0, 0xd30, 0xfff9b510, "push {r4, lr}", store, 0x20003ee0, RAM, store, 0x20003ee4, RAM
  0, 0xd32, 0xf9893014, "adds r0, #0x14"
  0, 0xd34, 0xf9c8f000, "bl #0x10c8"
  0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM

- contrib/plugins/cache.c

Cache modelling plugin that measures the performance of a given L1 cache
configuration, and optionally a unified L2 per-core cache, when a given
working set is run::

  qemu-x86_64 -plugin ./contrib/plugins/libcache.so \
    -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs

will report the following::

  core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
  0 996695 508 0.0510% 2642799 18617 0.7044%

  address, data misses, instruction
  0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx)
  0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax)
  0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax)
  0x454d48 (__tunables_init), 20, cmpb $0, (%r8)
  ...

  address, fetch misses, instruction
  0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx
  0x41f0a0 (_IO_setb), 744, endbr64
  0x415882 (__vfprintf_internal), 744, movq %r12, %rdi
  0x4268a0 (__malloc), 696, andq $0xfffffffffffffff0, %rax
  ...

The plugin has a number of arguments, all of which are optional:

  * limit=N

  Print the top N icache and dcache thrashing instructions along with
  their address, number of misses, and disassembly. (default: 32)

  * icachesize=N
  * iblksize=B
  * iassoc=A

  Instruction cache configuration arguments. They specify the cache size, block
  size, and associativity of the instruction cache, respectively.
  (default: N = 16384, B = 64, A = 8)

  * dcachesize=N
  * dblksize=B
  * dassoc=A

  Data cache configuration arguments. They specify the cache size, block size,
  and associativity of the data cache, respectively.
  (default: N = 16384, B = 64, A = 8)

  * evict=POLICY

  Sets the eviction policy to POLICY. Available policies are: :code:`lru`,
  :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for
  both instruction and data caches. (default: POLICY = :code:`lru`)

  * cores=N

  Sets the number of cores for which we maintain separate icache and dcache.
  (default: for linux-user, N = 1, for full system emulation: N = cores
  available to guest)

  * l2=on

  Simulates a unified L2 cache (stores blocks for both instructions and data)
  using the default L2 configuration (cache size = 2MB, associativity = 16-way,
  block size = 64B).

  * l2cachesize=N
  * l2blksize=B
  * l2assoc=A

  L2 cache configuration arguments. They specify the cache size, block size, and
  associativity of the L2 cache, respectively. Setting any of the L2
  configuration arguments implies ``l2=on``.
  (default: N = 2097152 (2MB), B = 64, A = 16)
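
The size, block size and associativity arguments relate to each other
through the usual set-associative arithmetic, sketched below. This is
textbook cache geometry and is not copied from the plugin: ``CacheGeom``,
``cache_geom``, ``set_index`` and ``block_tag`` are hypothetical names,
and the sketch assumes the block size and the resulting number of sets
are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one cache level. */
typedef struct {
    uint64_t num_sets;    /* cachesize / (blksize * assoc) */
    unsigned blk_shift;   /* log2(blksize) */
} CacheGeom;

/* log2 of a power of two, by counting shifts. */
static unsigned log2_pow2(uint64_t v)
{
    unsigned shift = 0;
    while (v > 1) {
        v >>= 1;
        shift++;
    }
    return shift;
}

static CacheGeom cache_geom(uint64_t cachesize, uint64_t blksize,
                            uint64_t assoc)
{
    assert(blksize != 0 && (blksize & (blksize - 1)) == 0);
    assert(assoc != 0 && cachesize % (blksize * assoc) == 0);

    CacheGeom g = { cachesize / (blksize * assoc), log2_pow2(blksize) };
    assert(g.num_sets != 0 && (g.num_sets & (g.num_sets - 1)) == 0);
    return g;
}

/* Which set an address maps to: block number modulo number of sets. */
static uint64_t set_index(CacheGeom g, uint64_t addr)
{
    return (addr >> g.blk_shift) & (g.num_sets - 1);
}

/* The tag distinguishing blocks that share one set. */
static uint64_t block_tag(CacheGeom g, uint64_t addr)
{
    return (addr >> g.blk_shift) / g.num_sets;
}
```

With the L1 defaults (N = 16384, B = 64, A = 8) this gives 32 sets of 8
blocks each, and the L2 defaults (N = 2097152, B = 64, A = 16) give 2048
sets. Two addresses compete for the same set whenever their block
numbers agree modulo the number of sets.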