..
   Copyright (c) 2020, Linaro Limited
   Written by Alex Bennée


========================
TCG Instruction Counting
========================

TCG has long supported a feature known as icount which allows for
instruction counting during execution. This should not be confused
with cycle-accurate emulation - QEMU does not attempt to emulate how
long an instruction would take on real hardware. That is a job for
other, more detailed (and slower) tools that simulate the rest of a
micro-architecture.

This feature is only available for system emulation and is
incompatible with multi-threaded TCG. It can be used to better align
execution time with wall-clock time so a "slow" device doesn't run too
fast on modern hardware. It can also provide a degree of
deterministic execution and is an essential part of the record/replay
support in QEMU.

Core Concepts
=============

At its heart icount is simply a count of executed instructions which
is stored in the TimersState of QEMU's timer sub-system. The number of
executed instructions can then be used to calculate QEMU_CLOCK_VIRTUAL
which represents the amount of elapsed time in the system since
execution started. Depending on the icount mode this may either be a
fixed number of ns per instruction or adjusted as execution continues
to keep wall-clock time and virtual time in sync.

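As a rough illustration of the fixed-rate mode, the sketch below maps
an instruction count to virtual nanoseconds using a power-of-two scale
factor, in the spirit of the shift=N option of -icount (2^N ns per
instruction). The function and parameter names are illustrative
assumptions rather than QEMU's internal API, and the adaptive mode is
not modelled.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   /*
    * Hypothetical helper: convert a count of executed instructions
    * into virtual nanoseconds, charging 2^shift ns per instruction.
    */
   static int64_t sketch_icount_to_ns(int64_t icount, unsigned shift)
   {
       /* Keep the sketch simple: reject inputs that could overflow. */
       assert(icount >= 0 && icount <= INT32_MAX && shift < 31);
       return icount * ((int64_t)1 << shift);
   }

With a shift of 3, for example, 1000 executed instructions advance
virtual time by 8000 ns.
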
To be able to calculate the number of executed instructions the
translator starts by allocating a budget of instructions to be
executed. The budget of instructions is limited by how long it will be
until the next timer expires. We store this budget as part of the
vCPU's icount_decr field, which is shared with the machinery for
handling cpu_exit(). The whole field is checked at the start of every
translated block and will cause a return to the outer loop to deal
with whatever caused the exit.

In the case of icount, before the flag is checked we subtract the
number of instructions the translation block would execute. If this
would cause the instruction budget to go negative, we exit the main
loop and regenerate a new translation block with exactly the right
number of instructions to take the budget to 0, meaning whatever timer
was due to expire will expire exactly when we exit the main run loop.

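The check made on entry to a translated block can be modelled with a
few lines of C. This is only a sketch of the arithmetic described
above: the structure and function names are hypothetical, and the real
field also carries the exit flag, which is left out here.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical result of the check made at the start of a block. */
   typedef struct {
       int run_whole_tb;      /* non-zero if the block fits in the budget */
       int32_t insns_to_run;  /* instructions to execute before the timer */
       int32_t budget_after;  /* budget left once those have executed */
   } SketchTbCheck;

   static SketchTbCheck sketch_tb_entry(int32_t budget, int32_t tb_insns)
   {
       SketchTbCheck r;

       assert(budget >= 0 && tb_insns > 0);
       if (budget - tb_insns >= 0) {
           /* The whole block fits: charge it and carry on. */
           r.run_whole_tb = 1;
           r.insns_to_run = tb_insns;
           r.budget_after = budget - tb_insns;
       } else {
           /*
            * The budget would go negative: leave the loop and build a
            * shorter block that uses up exactly what is left.
            */
           r.run_whole_tb = 0;
           r.insns_to_run = budget;
           r.budget_after = 0;
       }
       return r;
   }

For instance, with 3 instructions left in the budget and a block of 4
instructions, the sketch reports that a 3 instruction block should be
generated, which takes the budget to 0.
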
Dealing with MMIO
-----------------

While we can adjust the instruction budget for known events like timer
expiry we cannot do the same for MMIO. Every load/store we execute
might potentially trigger an I/O event, at which point we will need an
up-to-date and accurate reading of the icount number.

To deal with this case, when an I/O access is made we:

- restore un-executed instructions to the icount budget
- re-compile a single [1]_ instruction block for the current PC
- exit the cpu loop and execute the re-compiled block

.. [1] sometimes two instructions if dealing with delay slots

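The bookkeeping behind these steps can be sketched as follows. This is
a simplified model, not QEMU's code: the structure, the function name
and the assumption that the whole block was charged to the budget on
entry are illustrative, and delay slots are ignored.

.. code:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical per-vCPU state for the sketch. */
   typedef struct {
       int32_t budget;         /* instructions left before the next timer */
       int32_t next_tb_insns;  /* forced length of the next block, 0 = none */
   } SketchVcpu;

   /*
    * Called when instruction io_index (0-based) of a block of tb_insns
    * instructions turns out to perform I/O. The whole block was charged
    * on entry, so the I/O instruction and everything after it are
    * refunded, and the next block is limited to a single instruction.
    */
   static void sketch_io_recompile(SketchVcpu *cpu, int32_t tb_insns,
                                   int32_t io_index)
   {
       assert(io_index >= 0 && io_index < tb_insns);
       cpu->budget += tb_insns - io_index;
       cpu->next_tb_insns = 1;
   }

For instance, a budget of 10 on entry to a four instruction block
drops to 6. If the third instruction performs I/O, two instructions
are refunded, leaving 8, and the re-compiled single instruction block
then consumes one of them.
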
Other I/O operations
--------------------

MMIO isn't the only type of operation for which we might need a
correct and accurate clock. I/O port instructions and accesses to
system registers are the common examples here. These instructions have
to be handled by the individual translators, which know which
operations are I/O operations.

When the translator is handling an instruction of this kind:

* it must call gen_io_start() if icount is enabled, at some
  point before the generation of the code which actually does
  the I/O, using a code fragment similar to:

  .. code:: c

     if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
         gen_io_start();
     }

* it must end the TB immediately after this instruction