|  | ======================== | 
|  | Decodetree Specification | 
|  | ======================== | 
|  |  | 
|  | A *decodetree* is built from instruction *patterns*.  A pattern may | 
|  | represent a single architectural instruction or a group of same, depending | 
|  | on what is convenient for further processing. | 
|  |  | 
|  | Each pattern has both *fixedbits* and *fixedmask*, the combination of which | 
|  | describes the condition under which the pattern is matched:: | 
|  |  | 
|  | (insn & fixedmask) == fixedbits | 
|  |  | 
|  | Each pattern may have *fields*, which are extracted from the insn and | 
|  | passed along to the translator.  Examples of such are registers, | 
|  | immediates, and sub-opcodes. | 
|  |  | 
|  | In support of patterns, one may declare *fields*, *argument sets*, and | 
|  | *formats*, each of which may be re-used to simplify further definitions. | 
|  |  | 
|  | Fields | 
|  | ====== | 
|  |  | 
|  | Syntax:: | 
|  |  | 
|  | field_def     := '%' identifier ( field )* ( !function=identifier )? | 
|  | field         := unnamed_field | named_field | 
|  | unnamed_field := number ':' ( 's' ) number | 
|  | named_field   := identifier ':' ( 's' ) number | 
|  |  | 
|  | For *unnamed_field*, the first number is the least-significant bit position | 
|  | of the field and the second number is the length of the field.  If the 's' is | 
|  | present, the field is considered signed. | 
|  |  | 
|  | A *named_field* refers to some other field in the instruction pattern | 
|  | or format. Regardless of the length of the other field where it is | 
|  | defined, it will be inserted into this field with the specified | 
|  | signedness and bit width. | 
|  |  | 
|  | Field definitions that involve loops (i.e. where a field is defined | 
|  | directly or indirectly in terms of itself) are errors. | 
|  |  | 
|  | A format can include fields that refer to named fields that are | 
|  | defined in the instruction pattern(s) that use the format. | 
|  | Conversely, an instruction pattern can include fields that refer to | 
|  | named fields that are defined in the format it uses. However you | 
|  | cannot currently do both at once (i.e. pattern P uses format F; F has | 
|  | a field A that refers to a named field B that is defined in P, and P | 
|  | has a field C that refers to a named field D that is defined in F). | 
|  |  | 
|  | If multiple ``fields`` are present, they are concatenated. | 
|  | In this way one can define disjoint fields. | 
|  |  | 
|  | If ``!function`` is specified, the concatenated result is passed through the | 
|  | named function, taking and returning an integral value. | 
|  |  | 
|  | One may use ``!function`` with zero ``fields``.  This case is called | 
|  | a *parameter*, and the named function is only passed the ``DisasContext`` | 
|  | and returns an integral value extracted from there. | 
|  |  | 
|  | A field with no ``fields`` and no ``!function`` is in error. | 
|  |  | 
|  | Field examples: | 
|  |  | 
|  | +---------------------------+---------------------------------------------+ | 
|  | | Input                     | Generated code                              | | 
|  | +===========================+=============================================+ | 
|  | | %disp   0:s16             | sextract(i, 0, 16)                          | | 
|  | +---------------------------+---------------------------------------------+ | 
|  | | %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  | | 
|  | +---------------------------+---------------------------------------------+ | 
|  | | %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   | | 
|  | |                           |    extract(i, 1, 1) << 10 |                 | | 
|  | |                           |    extract(i, 2, 10)                        | | 
|  | +---------------------------+---------------------------------------------+ | 
|  | | %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      | | 
|  | |   !function=expand_shimm8 |               extract(i, 13, 1))            | | 
|  | +---------------------------+---------------------------------------------+ | 
|  | | %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      | | 
|  | |   !function=expand_sz_imm |               extract(a->sz, 0, 3))         | | 
|  | +---------------------------+---------------------------------------------+ | 
|  |  | 
|  | Argument Sets | 
|  | ============= | 
|  |  | 
|  | Syntax:: | 
|  |  | 
|  | args_def    := '&' identifier ( args_elt )+ ( !extern )? | 
|  | args_elt    := identifier (':' identifier)? | 
|  |  | 
|  | Each *args_elt* defines an argument within the argument set. | 
|  | If the form of the *args_elt* contains a colon, the first | 
|  | identifier is the argument name and the second identifier is | 
|  | the argument type.  If the colon is missing, the argument | 
|  | type will be ``int``. | 
|  |  | 
|  | Each argument set will be rendered as a C structure "arg_$name" | 
|  | with each of the fields being one of the member arguments. | 
|  |  | 
|  | If ``!extern`` is specified, the backing structure is assumed | 
|  | to have been already declared, typically via a second decoder. | 
|  |  | 
|  | Argument sets are useful when one wants to define helper functions | 
|  | for the translator functions that can perform operations on a common | 
|  | set of arguments.  This can ensure, for instance, that the ``AND`` | 
|  | pattern and the ``OR`` pattern put their operands into the same named | 
|  | structure, so that a common ``gen_logic_insn`` may be able to handle | 
|  | the operations common between the two. | 
|  |  | 
|  | Argument set examples:: | 
|  |  | 
|  | ®3       ra rb rc | 
|  | &loadstore  reg base offset | 
|  | &longldst   reg base offset:int64_t | 
|  |  | 
|  |  | 
|  | Formats | 
|  | ======= | 
|  |  | 
|  | Syntax:: | 
|  |  | 
|  | fmt_def      := '@' identifier ( fmt_elt )+ | 
|  | fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref | 
|  | fixedbit_elt := [01.-]+ | 
|  | field_elt    := identifier ':' 's'? number | 
|  | field_ref    := '%' identifier | identifier '=' '%' identifier | 
|  | args_ref     := '&' identifier | 
|  |  | 
|  | Defining a format is a handy way to avoid replicating groups of fields | 
|  | across many instruction patterns. | 
|  |  | 
|  | A *fixedbit_elt* describes a contiguous sequence of bits that must | 
|  | be 1, 0, or don't care.  The difference between '.' and '-' | 
|  | is that '.' means that the bit will be covered with a field or a | 
|  | final 0 or 1 from the pattern, and '-' means that the bit is really | 
|  | ignored by the cpu and will not be specified. | 
|  |  | 
|  | A *field_elt* describes a simple field only given a width; the position of | 
|  | the field is implied by its position with respect to other *fixedbit_elt* | 
|  | and *field_elt*. | 
|  |  | 
|  | If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined. | 
|  | Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that. | 
|  |  | 
|  | A *field_ref* incorporates a field by reference.  This is the only way to | 
|  | add a complex field to a format.  A field may be renamed in the process | 
|  | via assignment to another identifier.  This is intended to allow the | 
|  | same argument set be used with disjoint named fields. | 
|  |  | 
|  | A single *args_ref* may specify an argument set to use for the format. | 
|  | The set of fields in the format must be a subset of the arguments in | 
|  | the argument set.  If an argument set is not specified, one will be | 
|  | inferred from the set of fields. | 
|  |  | 
|  | It is recommended, but not required, that all *field_ref* and *args_ref* | 
|  | appear at the end of the line, not interleaving with *fixedbit_elf* or | 
|  | *field_elt*. | 
|  |  | 
|  | Format examples:: | 
|  |  | 
|  | @opr    ...... ra:5 rb:5 ... 0 ....... rc:5 | 
|  | @opi    ...... ra:5 lit:8    1 ....... rc:5 | 
|  |  | 
|  | Patterns | 
|  | ======== | 
|  |  | 
|  | Syntax:: | 
|  |  | 
|  | pat_def      := identifier ( pat_elt )+ | 
|  | pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt | 
|  | fmt_ref      := '@' identifier | 
|  | const_elt    := identifier '=' number | 
|  |  | 
|  | The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats. | 
|  | A pattern that does not specify a named format will have one inferred | 
|  | from a referenced argument set (if present) and the set of fields. | 
|  |  | 
|  | A *const_elt* allows a argument to be set to a constant value.  This may | 
|  | come in handy when fields overlap between patterns and one has to | 
|  | include the values in the *fixedbit_elt* instead. | 
|  |  | 
|  | The decoder will call a translator function for each pattern matched. | 
|  |  | 
|  | Pattern examples:: | 
|  |  | 
|  | addl_r   010000 ..... ..... .... 0000000 ..... @opr | 
|  | addl_i   010000 ..... ..... .... 0000000 ..... @opi | 
|  |  | 
|  | which will, in part, invoke:: | 
|  |  | 
|  | trans_addl_r(ctx, &arg_opr, insn) | 
|  |  | 
|  | and:: | 
|  |  | 
|  | trans_addl_i(ctx, &arg_opi, insn) | 
|  |  | 
|  | Pattern Groups | 
|  | ============== | 
|  |  | 
|  | Syntax:: | 
|  |  | 
|  | group            := overlap_group | no_overlap_group | 
|  | overlap_group    := '{' ( pat_def | group )+ '}' | 
|  | no_overlap_group := '[' ( pat_def | group )+ ']' | 
|  |  | 
|  | A *group* begins with a lone open-brace or open-bracket, with all | 
|  | subsequent lines indented two spaces, and ending with a lone | 
|  | close-brace or close-bracket.  Groups may be nested, increasing the | 
|  | required indentation of the lines within the nested group to two | 
|  | spaces per nesting level. | 
|  |  | 
|  | Patterns within overlap groups are allowed to overlap.  Conflicts are | 
|  | resolved by selecting the patterns in order.  If all of the fixedbits | 
|  | for a pattern match, its translate function will be called.  If the | 
|  | translate function returns false, then subsequent patterns within the | 
|  | group will be matched. | 
|  |  | 
|  | Patterns within no-overlap groups are not allowed to overlap, just | 
|  | the same as ungrouped patterns.  Thus no-overlap groups are intended | 
|  | to be nested inside overlap groups. | 
|  |  | 
|  | The following example from PA-RISC shows specialization of the *or* | 
|  | instruction:: | 
|  |  | 
|  | { | 
|  | { | 
|  | nop   000010 ----- ----- 0000 001001 0 00000 | 
|  | copy  000010 00000 r1:5  0000 001001 0 rt:5 | 
|  | } | 
|  | or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5 | 
|  | } | 
|  |  | 
|  | When the *cf* field is zero, the instruction has no side effects, | 
|  | and may be specialized.  When the *rt* field is zero, the output | 
|  | is discarded and so the instruction has no effect.  When the *rt2* | 
|  | field is zero, the operation is ``reg[r1] | 0`` and so encodes | 
|  | the canonical register copy operation. | 
|  |  | 
|  | The output from the generator might look like:: | 
|  |  | 
|  | switch (insn & 0xfc000fe0) { | 
|  | case 0x08000240: | 
|  | /* 000010.. ........ ....0010 010..... */ | 
|  | if ((insn & 0x0000f000) == 0x00000000) { | 
|  | /* 000010.. ........ 00000010 010..... */ | 
|  | if ((insn & 0x0000001f) == 0x00000000) { | 
|  | /* 000010.. ........ 00000010 01000000 */ | 
|  | extract_decode_Fmt_0(&u.f_decode0, insn); | 
|  | if (trans_nop(ctx, &u.f_decode0)) return true; | 
|  | } | 
|  | if ((insn & 0x03e00000) == 0x00000000) { | 
|  | /* 00001000 000..... 00000010 010..... */ | 
|  | extract_decode_Fmt_1(&u.f_decode1, insn); | 
|  | if (trans_copy(ctx, &u.f_decode1)) return true; | 
|  | } | 
|  | } | 
|  | extract_decode_Fmt_2(&u.f_decode2, insn); | 
|  | if (trans_or(ctx, &u.f_decode2)) return true; | 
|  | return false; | 
|  | } |