.. Richard Henderson | 3fdbf5d | 2019-02-23 13:00:10 -0800

========================
Decodetree Specification
========================

A *decodetree* is built from instruction *patterns*. A pattern may
represent a single architectural instruction or a group of them, depending
on what is convenient for further processing.

Each pattern has both *fixedbits* and *fixedmask*, the combination of which
describes the condition under which the pattern is matched::

  (insn & fixedmask) == fixedbits

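The match condition can be sketched as a small C predicate. This is only an
illustration: ``pattern_matches`` is a hypothetical name, and the 32-bit
instruction width is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the generated decoder: returns true
 * when every bit selected by fixedmask has the value given in fixedbits.
 * A 32-bit instruction word is assumed.
 */
static bool pattern_matches(uint32_t insn, uint32_t fixedmask,
                            uint32_t fixedbits)
{
    return (insn & fixedmask) == fixedbits;
}
```

For instance, with a mask of ``0xfc000000`` and bits of ``0x40000000``, any
word whose top six bits are ``010000`` matches, whatever the remaining bits
hold.
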
Each pattern may have *fields*, which are extracted from the insn and
passed along to the translator. Examples of such are registers,
immediates, and sub-opcodes.

In support of patterns, one may declare *fields*, *argument sets*, and
*formats*, each of which may be re-used to simplify further definitions.

Fields
======

Syntax::

  field_def     := '%' identifier ( unnamed_field )* ( !function=identifier )?
  unnamed_field := number ':' ( 's' )? number

For *unnamed_field*, the first number is the least-significant bit position
of the field and the second number is the length of the field. If the 's' is
present, the field is considered signed. If multiple ``unnamed_fields`` are
present, they are concatenated, with the first one listed occupying the
most-significant bits of the result. In this way one can define disjoint
fields.

If ``!function`` is specified, the concatenated result is passed through the
named function, taking and returning an integral value.

One may use ``!function`` with zero ``unnamed_fields``. This case is called
a *parameter*, and the named function is only passed the ``DisasContext``
and returns an integral value extracted from there.

A field with no ``unnamed_fields`` and no ``!function`` is in error.

FIXME: the fields of the structure into which this result will be stored
are restricted to ``int``, which means that we cannot expand 64-bit items.

Field examples:

+---------------------------+---------------------------------------------+
| Input                     | Generated code                              |
+===========================+=============================================+
| %disp   0:s16             | sextract(i, 0, 16)                          |
+---------------------------+---------------------------------------------+
| %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
+---------------------------+---------------------------------------------+
| %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
|                           |    extract(i, 1, 1) << 10 |                 |
|                           |    extract(i, 2, 10)                        |
+---------------------------+---------------------------------------------+
| %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
|   !function=expand_shimm8 |               extract(i, 13, 1))            |
+---------------------------+---------------------------------------------+

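The first three rows of the table can be tried out with the stand-alone
sketch below. It makes several assumptions: ``extract`` and ``sextract``
are simplified local stand-ins for the helpers named in the table (QEMU
itself provides ``extract32`` and ``sextract32``), the ``field_*`` function
names are hypothetical, and the instruction word is taken to be 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in: unsigned bitfield of len bits starting at bit pos. */
static uint32_t extract(uint32_t value, int pos, int len)
{
    assert(pos >= 0 && len > 0 && len <= 32 - pos);
    return (value >> pos) & (~0u >> (32 - len));
}

/* Stand-in: as extract(), but sign-extended from the field's top bit. */
static int32_t sextract(uint32_t value, int pos, int len)
{
    uint32_t field = extract(value, pos, len);
    uint32_t sign = 1u << (len - 1);

    /* XOR-and-subtract sign extension, done in unsigned arithmetic. */
    return (int32_t)((field ^ sign) - sign);
}

/* %disp 0:s16 (hypothetical function name) */
static int field_disp(uint32_t i)
{
    return sextract(i, 0, 16);
}

/* %imm9 16:6 10:3 (hypothetical function name) */
static int field_imm9(uint32_t i)
{
    return (int)(extract(i, 16, 6) << 3 | extract(i, 10, 3));
}

/* %disp12 0:s1 1:1 2:10 (hypothetical function name) */
static int field_disp12(uint32_t i)
{
    /*
     * The signed piece is scaled by multiplication: left-shifting a
     * negative int is undefined in ISO C, whereas the table's
     * "<< 11" is shorthand for the same scaling.
     */
    return sextract(i, 0, 1) * (1 << 11)
         | (int)(extract(i, 1, 1) << 10)
         | (int)extract(i, 2, 10);
}
```

Only the first sub-field of ``%disp12`` is signed, so the sign of the whole
12-bit value comes from bit 0 of the instruction.
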
Argument Sets
=============

Syntax::

  args_def := '&' identifier ( args_elt )+ ( !extern )?
  args_elt := identifier

Each *args_elt* defines an argument within the argument set.
Each argument set will be rendered as a C structure "arg_$name"
with each of the fields being one of the member arguments.

If ``!extern`` is specified, the backing structure is assumed
to have been already declared, typically via a second decoder.

Argument sets are useful when one wants to define helper functions
for the translator functions that can perform operations on a common
set of arguments. This can ensure, for instance, that the ``AND``
pattern and the ``OR`` pattern put their operands into the same named
structure, so that a common ``gen_logic_insn`` may be able to handle
the operations common between the two.

Argument set examples::

  &reg3       ra rb rc
  &loadstore  reg base offset


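As a rough sketch of the sharing described above, ``&reg3`` could back both
an ``AND`` and an ``OR`` translator. Everything except the ``arg_reg3``
layout and the ``gen_logic_insn`` name is an assumption: the
``DisasContext`` below is a toy stand-in holding a register array, and the
helper computes the result directly, whereas a real translator would emit
code for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Structure rendered for "&reg3 ra rb rc": one int member per argument. */
typedef struct {
    int ra;
    int rb;
    int rc;
} arg_reg3;

/* Toy stand-in for the translator context (assumption for this sketch). */
typedef struct DisasContext {
    uint32_t regs[32];
} DisasContext;

typedef uint32_t (*logic_op)(uint32_t, uint32_t);

static uint32_t op_and(uint32_t a, uint32_t b) { return a & b; }
static uint32_t op_or(uint32_t a, uint32_t b)  { return a | b; }

/* Common helper: works for any pattern whose operands live in arg_reg3. */
static bool gen_logic_insn(DisasContext *ctx, const arg_reg3 *a, logic_op op)
{
    ctx->regs[a->rc] = op(ctx->regs[a->ra], ctx->regs[a->rb]);
    return true;
}

/* Both translators take the same structure, so they share the helper. */
static bool trans_AND(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_and);
}

static bool trans_OR(DisasContext *ctx, arg_reg3 *a)
{
    return gen_logic_insn(ctx, a, op_or);
}
```

Had the two patterns used differently named argument sets, the helper would
need one variant per structure type.
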
Formats
=======

Syntax::

  fmt_def      := '@' identifier ( fmt_elt )+
  fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
  fixedbit_elt := [01.-]+
  field_elt    := identifier ':' 's'? number
  field_ref    := '%' identifier | identifier '=' '%' identifier
  args_ref     := '&' identifier

Defining a format is a handy way to avoid replicating groups of fields
across many instruction patterns.

A *fixedbit_elt* describes a contiguous sequence of bits that must
be 1, 0, or don't care. The difference between '.' and '-'
is that '.' means that the bit will be covered with a field or a
final 0 or 1 from the pattern, and '-' means that the bit is really
ignored by the cpu and will not be specified.

A *field_elt* describes a simple field only given a width; the position of
the field is implied by its position with respect to other *fixedbit_elt*
and *field_elt*.

If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

A *field_ref* incorporates a field by reference. This is the only way to
add a complex field to a format. A field may be renamed in the process
via assignment to another identifier. This is intended to allow the
same argument set to be used with disjoint named fields.

A single *args_ref* may specify an argument set to use for the format.
The set of fields in the format must be a subset of the arguments in
the argument set. If an argument set is not specified, one will be
inferred from the set of fields.

It is recommended, but not required, that all *field_ref* and *args_ref*
appear at the end of the line, not interleaving with *fixedbit_elt* or
*field_elt*.

Format examples::

  @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
  @opi    ...... ra:5 lit:8    1 ....... rc:5

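To make the implied bit positions concrete, the two example formats can be
read left to right from bit 31 of a 32-bit word. The sketch below spells
out the resulting extraction. The ``bits`` helper, the ``extract_opr`` and
``extract_opi`` names, and the ``FMT_*`` constants are hypothetical; only
the field widths and the ``arg_opr``/``arg_opi`` structure names come from
the examples in this document.

```c
#include <stdint.h>

/* Hypothetical helper: len bits of insn starting at bit pos (len < 32). */
static int bits(uint32_t insn, int pos, int len)
{
    return (int)((insn >> pos) & ((1u << len) - 1));
}

/* Argument sets inferred from the fields named in each format. */
typedef struct { int ra, rb, rc; } arg_opr;
typedef struct { int ra, lit, rc; } arg_opi;

/* @opr  ...... ra:5 rb:5 ... 0 ....... rc:5 */
static void extract_opr(arg_opr *a, uint32_t insn)
{
    a->ra = bits(insn, 21, 5);   /* bits 25..21 */
    a->rb = bits(insn, 16, 5);   /* bits 20..16 */
    a->rc = bits(insn, 0, 5);    /* bits  4..0  */
}

/* @opi  ...... ra:5 lit:8 1 ....... rc:5 */
static void extract_opi(arg_opi *a, uint32_t insn)
{
    a->ra  = bits(insn, 21, 5);  /* bits 25..21 */
    a->lit = bits(insn, 13, 8);  /* bits 20..13 */
    a->rc  = bits(insn, 0, 5);   /* bits  4..0  */
}

/* Bit 12 is the only fixed bit that either format contributes. */
enum {
    FMT_MASK     = 0x00001000,
    FMT_OPR_BITS = 0x00000000,   /* the '0' in @opr */
    FMT_OPI_BITS = 0x00001000    /* the '1' in @opi */
};
```

Both formats account for all 32 bits: 6 + 5 + 5 + 3 + 1 + 7 + 5 for
``@opr`` and 6 + 5 + 8 + 1 + 7 + 5 for ``@opi``.
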
Patterns
========

Syntax::

  pat_def      := identifier ( pat_elt )+
  pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
  fmt_ref      := '@' identifier
  const_elt    := identifier '=' number

The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
A pattern that does not specify a named format will have one inferred
from a referenced argument set (if present) and the set of fields.

A *const_elt* allows an argument to be set to a constant value. This may
come in handy when fields overlap between patterns and one has to
include the values in the *fixedbit_elt* instead.

The decoder will call a translator function for each pattern matched.

Pattern examples::

  addl_r   010000 ..... ..... .... 0000000 ..... @opr
  addl_i   010000 ..... ..... .... 0000000 ..... @opi

which will, in part, invoke::

  trans_addl_r(ctx, &arg_opr, insn)

and::

  trans_addl_i(ctx, &arg_opi, insn)

Pattern Groups
==============

Syntax::

  group            := overlap_group | no_overlap_group
  overlap_group    := '{' ( pat_def | group )+ '}'
  no_overlap_group := '[' ( pat_def | group )+ ']'

A *group* begins with a lone open-brace or open-bracket, with all
subsequent lines indented two spaces, and ending with a lone
close-brace or close-bracket. Groups may be nested, increasing the
required indentation of the lines within the nested group to two
spaces per nesting level.

Patterns within overlap groups are allowed to overlap. Conflicts are
resolved by selecting the patterns in order. If all of the fixedbits
for a pattern match, its translate function will be called. If the
translate function returns false, then subsequent patterns within the
group will be matched.

Patterns within no-overlap groups are not allowed to overlap, just
the same as ungrouped patterns. Thus no-overlap groups are intended
to be nested inside overlap groups.

The following example from PA-RISC shows specialization of the *or*
instruction::

  {
    {
      nop   000010 ----- ----- 0000 001001 0 00000
      copy  000010 00000 r1:5  0000 001001 0 rt:5
    }
    or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
  }

When the *cf* field is zero, the instruction has no side effects,
and may be specialized. When the *rt* field is zero, the output
is discarded and so the instruction has no effect. When the *rt2*
field is zero, the operation is ``reg[r1] | 0`` and so encodes
the canonical register copy operation.

The output from the generator might look like::

  switch (insn & 0xfc000fe0) {
  case 0x08000240:
    /* 000010.. ........ ....0010 010..... */
    if ((insn & 0x0000f000) == 0x00000000) {
        /* 000010.. ........ 00000010 010..... */
        if ((insn & 0x0000001f) == 0x00000000) {
            /* 000010.. ........ 00000010 01000000 */
            extract_decode_Fmt_0(&u.f_decode0, insn);
            if (trans_nop(ctx, &u.f_decode0)) return true;
        }
        if ((insn & 0x03e00000) == 0x00000000) {
            /* 00001000 000..... 00000010 010..... */
            extract_decode_Fmt_1(&u.f_decode1, insn);
            if (trans_copy(ctx, &u.f_decode1)) return true;
        }
    }
    extract_decode_Fmt_2(&u.f_decode2, insn);
    if (trans_or(ctx, &u.f_decode2)) return true;
    return false;
  }
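
The in-order fallback of an overlap group can be exercised with a reduced,
stand-alone version of that switch. This is a sketch under stated
assumptions: argument extraction is omitted, the ``DisasContext`` is a toy
stand-in that records which translator accepted the instruction, and the
``reject_copy`` flag is a hypothetical knob used only to show a translator
returning false.

```c
#include <stdbool.h>
#include <stdint.h>

enum which { NONE, NOP, COPY, OR };

/* Toy stand-in for the translator context (assumption for this sketch). */
typedef struct DisasContext {
    enum which last;      /* translator that accepted the last insn */
    bool reject_copy;     /* hypothetical: make trans_copy decline */
} DisasContext;

static bool trans_nop(DisasContext *ctx)
{
    ctx->last = NOP;
    return true;
}

static bool trans_copy(DisasContext *ctx)
{
    if (ctx->reject_copy) {
        return false;     /* decline, so the decoder tries the next pattern */
    }
    ctx->last = COPY;
    return true;
}

static bool trans_or(DisasContext *ctx)
{
    ctx->last = OR;
    return true;
}

/* Same masks and ordering as the generated code, minus extraction. */
static bool decode(DisasContext *ctx, uint32_t insn)
{
    switch (insn & 0xfc000fe0) {
    case 0x08000240:
        if ((insn & 0x0000f000) == 0) {                  /* cf == 0 */
            if ((insn & 0x0000001f) == 0 && trans_nop(ctx)) {
                return true;                             /* rt == 0 */
            }
            if ((insn & 0x03e00000) == 0 && trans_copy(ctx)) {
                return true;                             /* rt2 == 0 */
            }
        }
        return trans_or(ctx);                            /* general case */
    }
    return false;                                        /* no pattern matched */
}
```

With ``reject_copy`` set, an encoding that would otherwise decode as *copy*
falls through to *or*, which is the behaviour described for translate
functions that return false.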