Hexagon is Qualcomm's very long instruction word (VLIW) digital signal
processor (DSP).

The following versions of the Hexagon core are supported
    Scalar core: v67
    https://developer.qualcomm.com/downloads/qualcomm-hexagon-v67-programmer-s-reference-manual

We presented an overview of the project at the 2019 KVM Forum.
    https://kvmforum2019.sched.com/event/Tmwc/qemu-hexagon-automatic-translation-of-the-isa-manual-pseudcode-to-tiny-code-instructions-of-a-vliw-architecture-niccolo-izzo-revng-taylor-simpson-qualcomm-innovation-center

*** Tour of the code ***

The qemu-hexagon implementation is a combination of qemu and the Hexagon
architecture library (aka archlib). The three primary directories with
Hexagon-specific code are

    qemu/target/hexagon
        This has all the instruction and packet semantics
    qemu/target/hexagon/imported
        These files are imported with very little modification from archlib
        *.idef                  Instruction semantics definition
        macros.def              Mapping of macros to instruction attributes
        encode*.def             Encoding patterns for each instruction
        iclass.def              Instruction class definitions used to determine
                                legal VLIW slots for each instruction
    qemu/linux-user/hexagon
        Helpers for loading the ELF file and making Linux system calls,
        signals, etc
We start with scripts that generate a set of include files. This
is a two-step process. The first step is to use the C preprocessor to expand
macros inside the architecture definition files. This is done in
target/hexagon/gen_semantics.c. This step produces
    <BUILD_DIR>/target/hexagon/semantics_generated.pyinc.
That file is consumed by the following Python scripts to produce the indicated
include files in <BUILD_DIR>/target/hexagon
    gen_opcodes_def.py              -> opcodes_def_generated.h.inc
    gen_op_regs.py                  -> op_regs_generated.h.inc
    gen_printinsn.py                -> printinsn_generated.h.inc
    gen_op_attribs.py               -> op_attribs_generated.h.inc
    gen_helper_protos.py            -> helper_protos_generated.h.inc
    gen_shortcode.py                -> shortcode_generated.h.inc
    gen_tcg_funcs.py                -> tcg_funcs_generated.c.inc
    gen_tcg_func_table.py           -> tcg_func_table_generated.c.inc
    gen_helper_funcs.py             -> helper_funcs_generated.c.inc

QEMU helper functions have three parts
    DEF_HELPER declaration indicates the signature of the helper
    gen_helper_<NAME> will generate a TCG call to the helper function
    The helper implementation

Here's an example of the A2_add instruction.
    Instruction tag        A2_add
    Assembly syntax        "Rd32=add(Rs32,Rt32)"
    Instruction semantics  "{ RdV=RsV+RtV;}"

By convention, the operands are identified by letter
    RdV is the destination register
    RsV, RtV are source registers

The generator uses the operand naming conventions (see large comment in
hex_common.py) to determine the signature of the helper function. Here are the
results for A2_add

helper_protos_generated.h.inc
    DEF_HELPER_3(A2_add, s32, env, s32, s32)

tcg_funcs_generated.c.inc
    static void generate_A2_add(
                    CPUHexagonState *env,
                    DisasContext *ctx,
                    Insn *insn,
                    Packet *pkt)
    {
        TCGv RdV = tcg_temp_local_new();
        const int RdN = insn->regno[0];
        TCGv RsV = hex_gpr[insn->regno[1]];
        TCGv RtV = hex_gpr[insn->regno[2]];
        gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
        gen_log_reg_write(RdN, RdV);
        ctx_log_reg_write(ctx, RdN);
        tcg_temp_free(RdV);
    }

helper_funcs_generated.c.inc
    int32_t HELPER(A2_add)(CPUHexagonState *env, int32_t RsV, int32_t RtV)
    {
        uint32_t slot __attribute__((unused)) = 4;
        int32_t RdV = 0;
        { RdV=RsV+RtV;}
        return RdV;
    }

Note that generate_A2_add updates the disassembly context to be processed
when the packet commits (see "Packet Semantics" below).
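
As a rough illustration of the naming convention described above, the
following Python sketch derives the DEF_HELPER line for A2_add from its
assembly syntax. It is a simplified model and not the code in hex_common.py
or gen_helper_protos.py: the regular expression and the helper_proto function
are hypothetical, and only 32-bit general registers with one destination (d)
and sources (s, t) are handled.

```python
import re

# Hypothetical pattern: matches 32-bit general register operands such as
# Rd32, Rs32, Rt32 and captures the operand letter.
OPERAND_RE = re.compile(r"\bR([a-z])32\b")


def helper_proto(tag, syntax):
    """Build a DEF_HELPER line for an instruction with one destination."""
    letters = OPERAND_RE.findall(syntax)
    dests = [c for c in letters if c == "d"]
    srcs = [c for c in letters if c in "st"]
    if len(dests) != 1 or len(dests) + len(srcs) != len(letters):
        raise ValueError(f"unsupported operands in {syntax!r}")
    # Return type first, then env, then one s32 per source register.
    types = ["s32", "env"] + ["s32"] * len(srcs)
    # DEF_HELPER_<n> counts the arguments, excluding the return type.
    return f"DEF_HELPER_{len(types) - 1}({tag}, {', '.join(types)})"


print(helper_proto("A2_add", "Rd32=add(Rs32,Rt32)"))
```

For A2_add this prints the same line shown in helper_protos_generated.h.inc
above. Operand letters outside this small subset are rejected rather than
guessed.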

The generator checks whether an fGEN_TCG_<tag> macro is defined. This allows
us to generate TCG code instead of a call to the helper. If defined, the macro
takes 1 argument.
    C semantics (aka short code)

This allows the code generator to override the auto-generated code. In some
cases this is necessary for correct execution. We can also override for
faster emulation. For example, calling a helper for add is more expensive
than generating a TCG add operation.

The gen_tcg.h file holds the overrides. For example, we could write
    #define fGEN_TCG_A2_add(SHORTCODE) \
        tcg_gen_add_tl(RdV, RsV, RtV)
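
The choice between an override and a helper call can be sketched in Python as
follows. This is only an illustration of the idea, assuming the generator
finds overrides by scanning the header text for "#define fGEN_TCG_<tag>"; the
pattern and the find_overrides and emit_body functions are hypothetical and
are not taken from the QEMU scripts.

```python
import re

# Illustrative pattern, not copied from the QEMU generator scripts.
OVERRIDE_RE = re.compile(r"^\s*#define\s+fGEN_TCG_(\w+)\s*\(", re.MULTILINE)


def find_overrides(header_text):
    """Return the set of instruction tags that have an fGEN_TCG_ macro."""
    return set(OVERRIDE_RE.findall(header_text))


def emit_body(tag, shortcode, helper_args, overrides):
    """Emit the override macro if one exists, else a call to the helper."""
    if tag in overrides:
        return f"fGEN_TCG_{tag}({shortcode});"
    return f"gen_helper_{tag}({', '.join(helper_args)});"


header = """
#define fGEN_TCG_A2_add(SHORTCODE) \\
    tcg_gen_add_tl(RdV, RsV, RtV)
"""
args = ["RdV", "cpu_env", "RsV", "RtV"]
shortcode = "{ RdV=RsV+RtV;}"

print(emit_body("A2_add", shortcode, args, find_overrides(header)))
print(emit_body("A2_add", shortcode, args, set()))
```

The first call emits the macro invocation because the override is present.
The second call, given an empty override set, falls back to the
gen_helper_A2_add call.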

The instruction semantics C code relies heavily on macros. In cases where the
C semantics are specified only with macros, we can override the default with
the short semantics option and #define the macros to generate TCG code. One
example is L2_loadw_locked:
    Instruction tag        L2_loadw_locked
    Assembly syntax        "Rd32=memw_locked(Rs32)"
    Instruction semantics  "{ fEA_REG(RsV); fLOAD_LOCKED(1,4,u,EA,RdV) }"

In gen_tcg.h, we use the shortcode
    #define fGEN_TCG_L2_loadw_locked(SHORTCODE) \
        SHORTCODE

There are also cases where we brute force the TCG code generation.
Instructions that write more than one result (multiple definitions) are
examples. These require special handling because qemu helpers can only return
a single value.

In addition to instruction semantics, we use a generator to create the decode
tree. This generation is also a two-step process. The first step is to run
target/hexagon/gen_dectree_import.c to produce
    <BUILD_DIR>/target/hexagon/iset.py
This file is imported by target/hexagon/dectree.py to produce
    <BUILD_DIR>/target/hexagon/dectree_generated.h.inc

*** Key Files ***

cpu.h

This file contains the definition of the CPUHexagonState struct. It is the
runtime information for each thread and contains state such as the GPR and
predicate registers.

macros.h

The Hexagon arch lib relies heavily on macros for the instruction semantics.
This is a great advantage for qemu because we can override them for different
purposes. You will also notice there are sometimes two definitions of a macro.
The QEMU_GENERATE macro determines whether we want the macro to generate TCG
code. If QEMU_GENERATE is not defined, we want the macro to generate vanilla
C code that will work in the helper implementation.

translate.c

The functions in this file generate TCG code for a translation block. Some
important functions in this file are

    gen_start_packet - initialize the data structures for packet semantics
    gen_commit_packet - commit the register writes, stores, etc for a packet
    decode_and_translate_packet - disassemble a packet and generate code

genptr.c
gen_tcg.h

These files create a function for each instruction. They are mostly composed
of fGEN_TCG_<tag> definitions followed by including tcg_funcs_generated.c.inc.

op_helper.c

This file contains the implementations of all the helpers. There are a few
general purpose helpers, but most of them are generated by including
helper_funcs_generated.c.inc. There are also several helpers used for debugging.

*** Packet Semantics ***

VLIW packet semantics differ from serial semantics in that all input operands
are read, then the operations are performed, then all the results are written.
For example, this packet performs a swap of registers r0 and r1
    { r0 = r1; r1 = r0 }
Note that the result is different if the instructions are executed serially.

Packet semantics dictate that we defer any changes of state until the entire
packet is committed. We record the results of each instruction in a side data
structure, and update the visible processor state when we commit the packet.

The data structures are divided between the runtime state and the translation
context.

During the TCG generation (see translate.[ch]), we use the DisasContext to
track what needs to be done during packet commit. Here are the relevant
fields

    reg_log            list of registers written
    reg_log_idx        index into reg_log
    pred_log           list of predicates written
    pred_log_idx       index into pred_log
    store_width        width of stores (indexed by slot)

During runtime, the following fields in CPUHexagonState (see cpu.h) are used

    new_value          new value of a given register
    reg_written        boolean indicating if register was written
    new_pred_value     new value of a predicate register
    pred_written       boolean indicating if predicate was written
    mem_log_stores     record of the stores (indexed by slot)
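
The deferred-write scheme can be illustrated with a small Python model of the
swap packet above. This is a toy sketch under simplifying assumptions: it
borrows the field names gpr, new_value, reg_written and reg_log, but folds the
translation-time log and the runtime state into one interpreter, which the
real implementation does not do. The PacketState class and the functions are
hypothetical.

```python
class PacketState:
    """Toy stand-in for the register fields of the runtime state."""

    def __init__(self, gpr):
        self.gpr = list(gpr)
        self.new_value = [0] * len(self.gpr)
        self.reg_written = [False] * len(self.gpr)


def log_reg_write(state, reg_log, rnum, value):
    # Record the result on the side; the visible register is untouched.
    state.new_value[rnum] = value
    state.reg_written[rnum] = True
    reg_log.append(rnum)


def commit_packet(state, reg_log):
    # Make every logged write visible, then reset the bookkeeping.
    for rnum in reg_log:
        state.gpr[rnum] = state.new_value[rnum]
        state.reg_written[rnum] = False
    reg_log.clear()


def run_packet(state, moves):
    """Execute (dst, src) register moves with packet semantics."""
    reg_log = []
    for dst, src in moves:
        log_reg_write(state, reg_log, dst, state.gpr[src])
    commit_packet(state, reg_log)


def run_serial(gpr, moves):
    """Execute the same moves one after another, for comparison."""
    gpr = list(gpr)
    for dst, src in moves:
        gpr[dst] = gpr[src]
    return gpr


swap = [(0, 1), (1, 0)]          # { r0 = r1; r1 = r0 }
state = PacketState([10, 20])
run_packet(state, swap)
print(state.gpr)                 # [20, 10]
print(run_serial([10, 20], swap))  # [20, 20]
```

With packet semantics both moves read the old register values, so the
registers are swapped. Executed serially, the second move reads the value
already overwritten by the first.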

*** Debugging ***

You can turn on a lot of debugging by changing the HEX_DEBUG macro to 1 in
internal.h. This will stream a lot of information as it generates TCG and
executes the code.

To track down nasty issues with Hexagon->TCG generation, we compare the
execution results with actual hardware running on a Hexagon Linux target.
Run qemu with the "-d cpu" option. Then, we can diff the results and figure
out where qemu and hardware behave differently.

The stacks are located at different addresses in qemu and on hardware. We
handle this by changing env->stack_adjust in translate.c. First, set this to
zero and run qemu. Then, change env->stack_adjust to the difference between
the two stack locations. Then rebuild qemu and run again. That will produce a
very clean diff.
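
Once both traces are comparable, finding the point where they part ways is a
line-by-line comparison. The Python sketch below shows one way to do that. The
first_divergence function is hypothetical, and the sample records are
placeholders that do not reproduce the real "-d cpu" output format.

```python
from itertools import zip_longest


def first_divergence(qemu_log, hw_log):
    """Return (index, qemu_line, hw_line) for the first mismatch, or None."""
    pairs = zip_longest(qemu_log, hw_log)
    for index, (qemu_line, hw_line) in enumerate(pairs):
        if qemu_line != hw_line:
            return index, qemu_line, hw_line
    return None


# Placeholder records, not real "-d cpu" output.
qemu_log = ["pc=0x100 r0=0x1", "pc=0x104 r0=0x2", "pc=0x108 r0=0x4"]
hw_log = ["pc=0x100 r0=0x1", "pc=0x104 r0=0x2", "pc=0x108 r0=0x3"]

print(first_divergence(qemu_log, hw_log))
```

A trace that ends early also counts as a divergence, because zip_longest pads
the shorter log with None.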

Here are some handy places to set breakpoints

    At the call to gen_start_packet for a given PC (note that the line number
        might change in the future)
        br translate.c:602 if ctx->base.pc_next == 0xdeadbeef
    The helper function for each instruction is named helper_<TAG>, so here's
        an example that will set a breakpoint at the start
        br helper_A2_add
    If you have the HEX_DEBUG macro set, the following will be useful
        At the start of execution of a packet for a given PC
            br helper_debug_start_packet if env->gpr[41] == 0xdeadbeef
        At the end of execution of a packet for a given PC
            br helper_debug_commit_end if env->this_PC == 0xdeadbeef