 ========================
 Decodetree Specification
 ========================

 A *decodetree* is built from instruction *patterns*.  A pattern may
 represent a single architectural instruction or a group of them, depending
 on what is convenient for further processing.

 Each pattern has both *fixedbits* and *fixedmask*, the combination of which
 describes the condition under which the pattern is matched::

   (insn & fixedmask) == fixedbits
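
 As a minimal sketch of that test in C (``pattern_matches`` is a
 hypothetical helper name used only for illustration, not something the
 generator emits)::

   #include <stdbool.h>
   #include <stdint.h>

   /* Hypothetical helper: true when every fixed bit of the pattern agrees. */
   static bool pattern_matches(uint32_t insn, uint32_t fixedmask,
                               uint32_t fixedbits)
   {
       /* Bits outside fixedmask are fields or don't-care bits. */
       return (insn & fixedmask) == fixedbits;
   }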

 Each pattern may have *fields*, which are extracted from the insn and
 passed along to the translator.  Examples of such are registers,
 immediates, and sub-opcodes.

 In support of patterns, one may declare *fields*, *argument sets*, and
 *formats*, each of which may be re-used to simplify further definitions.

 Fields
 ======

 Syntax::

   field_def     := '%' identifier ( field )* ( !function=identifier )?
   field         := unnamed_field | named_field
   unnamed_field := number ':' 's'? number
   named_field   := identifier ':' 's'? number

 For *unnamed_field*, the first number is the least-significant bit position
 of the field and the second number is the length of the field.  If the 's' is
 present, the field is considered signed.

 A *named_field* refers to some other field in the instruction pattern
 or format. Regardless of the length of the other field where it is
 defined, it will be inserted into this field with the specified
 signedness and bit width.

 Field definitions that involve loops (i.e. where a field is defined
 directly or indirectly in terms of itself) are errors.

 A format can include fields that refer to named fields that are
 defined in the instruction pattern(s) that use the format.
 Conversely, an instruction pattern can include fields that refer to
 named fields that are defined in the format it uses. However you
 cannot currently do both at once (i.e. pattern P uses format F; F has
 a field A that refers to a named field B that is defined in P, and P
 has a field C that refers to a named field D that is defined in F).

 If multiple ``fields`` are present, they are concatenated.
 In this way one can define disjoint fields.

 If ``!function`` is specified, the concatenated result is passed through the
 named function, taking and returning an integral value.

 One may use ``!function`` with zero ``fields``.  This case is called
 a *parameter*, and the named function is only passed the ``DisasContext``
 and returns an integral value extracted from there.

 A field with no ``fields`` and no ``!function`` is in error.
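
 The two flavours of ``!function`` might be sketched in C as follows.
 This is an illustration under stated assumptions: the ``DisasContext``
 layout, its ``addr_bits`` member, and both function names are
 hypothetical, and passing the context as the first argument of the
 value-transforming function is assumed rather than stated above::

   /* Hypothetical stand-in for a target's DisasContext. */
   typedef struct DisasContext {
       int addr_bits;   /* illustrative state, not a real QEMU member */
   } DisasContext;

   /* !function with fields: receives the concatenated field value. */
   static int times_4(DisasContext *ctx, int x)
   {
       (void)ctx;       /* unused in this sketch */
       return x * 4;
   }

   /* !function with zero fields (a parameter): only the context is passed. */
   static int cur_addr_bits(DisasContext *ctx)
   {
       return ctx->addr_bits;
   }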

 Field examples:

 +---------------------------+---------------------------------------------+
 | Input                     | Generated code                              |
 +===========================+=============================================+
 | %disp   0:s16             | sextract(i, 0, 16)                          |
 +---------------------------+---------------------------------------------+
 | %imm9   16:6 10:3         | extract(i, 16, 6) << 3 | extract(i, 10, 3)  |
 +---------------------------+---------------------------------------------+
 | %disp12 0:s1 1:1 2:10     | sextract(i, 0, 1) << 11 |                   |
 |                           |    extract(i, 1, 1) << 10 |                 |
 |                           |    extract(i, 2, 10)                        |
 +---------------------------+---------------------------------------------+
 | %shimm8 5:s8 13:1         | expand_shimm8(sextract(i, 5, 8) << 1 |      |
 |   !function=expand_shimm8 |               extract(i, 13, 1))            |
 +---------------------------+---------------------------------------------+
 | %sz_imm 10:2 sz:3         | expand_sz_imm(extract(i, 10, 2) << 3 |      |
 |   !function=expand_sz_imm |               extract(a->sz, 0, 3))         |
 +---------------------------+---------------------------------------------+
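
 To make the table concrete, here is a self-contained C sketch of the
 ``%imm9`` and ``%disp12`` rows.  The ``extract`` and ``sextract``
 functions below are simplified stand-ins written for this example (they
 assume ``0 < len < 32``), and ``field_imm9`` and ``field_disp12`` are
 hypothetical names::

   #include <stdint.h>

   /* Stand-in for the table's extract(): unsigned bit-field read. */
   static uint32_t extract(uint32_t insn, int pos, int len)
   {
       return (insn >> pos) & ((1u << len) - 1);
   }

   /* Stand-in for the table's sextract(): sign-extending bit-field read. */
   static int32_t sextract(uint32_t insn, int pos, int len)
   {
       uint32_t v = extract(insn, pos, len);
       uint32_t sign = 1u << (len - 1);

       /* Flip, then subtract, the sign bit to sign-extend. */
       return (int32_t)(v ^ sign) - (int32_t)sign;
   }

   /* %imm9 16:6 10:3 */
   static uint32_t field_imm9(uint32_t insn)
   {
       return extract(insn, 16, 6) << 3 | extract(insn, 10, 3);
   }

   /* %disp12 0:s1 1:1 2:10 */
   static int32_t field_disp12(uint32_t insn)
   {
       /* Multiply rather than shift: the signed part may be negative. */
       return sextract(insn, 0, 1) * 2048
            | (int32_t)(extract(insn, 1, 1) << 10)
            | (int32_t)extract(insn, 2, 10);
   }

 Only the first (most significant) sub-field carries the sign, so the
 concatenated value is negative exactly when that sub-field is.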

 Argument Sets
 =============

 Syntax::

   args_def    := '&' identifier ( args_elt )+ ( !extern )?
   args_elt    := identifier (':' identifier)?

 Each *args_elt* defines an argument within the argument set.
 If the form of the *args_elt* contains a colon, the first
 identifier is the argument name and the second identifier is
 the argument type.  If the colon is missing, the argument
 type will be ``int``.

 Each argument set will be rendered as a C structure ``arg_$name``,
 with one member per argument.

 If ``!extern`` is specified, the backing structure is assumed
 to have been already declared, typically via a second decoder.

 Argument sets are useful when one wants to define helper functions
 for the translator functions that can perform operations on a common
 set of arguments.  This can ensure, for instance, that the ``AND``
 pattern and the ``OR`` pattern put their operands into the same named
 structure, so that a common ``gen_logic_insn`` may be able to handle
 the operations common between the two.

 Argument set examples::

   &reg3       ra rb rc
   &loadstore  reg base offset
   &longldst   reg base offset:int64_t
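
 Following the rules above, ``&reg3`` and ``&longldst`` would be
 rendered roughly as the structures below.  The exact layout the
 generator emits may differ, and ``reg3_uses_reg`` is a hypothetical
 helper added only to show code shared by every pattern that uses
 ``&reg3``::

   #include <stdbool.h>
   #include <stdint.h>

   /* &reg3  ra rb rc  (no type given, so every member is int) */
   typedef struct {
       int ra;
       int rb;
       int rc;
   } arg_reg3;

   /* &longldst  reg base offset:int64_t */
   typedef struct {
       int reg;
       int base;
       int64_t offset;
   } arg_longldst;

   /* Hypothetical shared helper: does any operand name register r? */
   static bool reg3_uses_reg(const arg_reg3 *a, int r)
   {
       return a->ra == r || a->rb == r || a->rc == r;
   }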


 Formats
 =======

 Syntax::

   fmt_def      := '@' identifier ( fmt_elt )+
   fmt_elt      := fixedbit_elt | field_elt | field_ref | args_ref
   fixedbit_elt := [01.-]+
   field_elt    := identifier ':' 's'? number
   field_ref    := '%' identifier | identifier '=' '%' identifier
   args_ref     := '&' identifier

 Defining a format is a handy way to avoid replicating groups of fields
 across many instruction patterns.

 A *fixedbit_elt* describes a contiguous sequence of bits that must
 be 1, 0, or don't care.  The difference between '.' and '-'
 is that '.' means that the bit will be covered with a field or a
 final 0 or 1 from the pattern, and '-' means that the bit is really
 ignored by the cpu and will not be specified.
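
 The way fixed bits feed the match condition can be sketched with a
 small C routine.  ``parse_fixedbits`` is a hypothetical helper written
 for this document; it handles only *fixedbit_elt* characters (no
 *field_elt* such as ``ra:5``) and assumes a 32-bit instruction::

   #include <stdbool.h>
   #include <stdint.h>

   /*
    * Turn a line of fixedbit_elt characters into fixedmask/fixedbits.
    * Returns false unless exactly 32 bits are described.
    */
   static bool parse_fixedbits(const char *s, uint32_t *fixedmask,
                               uint32_t *fixedbits)
   {
       uint32_t mask = 0, bits = 0;
       int n = 0;

       for (; *s != '\0'; s++) {
           if (*s == ' ') {
               continue;               /* separator between elements */
           }
           if (n == 32) {
               return false;           /* more than 32 bits described */
           }
           mask <<= 1;
           bits <<= 1;
           if (*s == '0' || *s == '1') {
               mask |= 1;              /* this bit takes part in the match */
               bits |= (uint32_t)(*s == '1');
           } else if (*s != '.' && *s != '-') {
               return false;           /* not a fixedbit_elt character */
           }
           n++;
       }
       if (n != 32) {
           return false;
       }
       *fixedmask = mask;
       *fixedbits = bits;
       return true;
   }

 Both '.' and '-' leave the corresponding mask bit clear; they differ
 only in whether something else is expected to define the bit later.
 For instance, ``000010 ----- ----- 0000 001001 0 00000`` yields a
 fixedmask of ``0xfc00ffff`` and fixedbits of ``0x08000240``.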

 A *field_elt* describes a simple field only given a width; the position of
 the field is implied by its position with respect to other *fixedbit_elt*
 and *field_elt*.

 If any *fixedbit_elt* or *field_elt* appear, then all bits must be defined.
 Padding with a *fixedbit_elt* of all '.' is an easy way to accomplish that.

 A *field_ref* incorporates a field by reference.  This is the only way to
 add a complex field to a format.  A field may be renamed in the process
 via assignment to another identifier.  This is intended to allow the
 same argument set to be used with disjoint named fields.

 A single *args_ref* may specify an argument set to use for the format.
 The set of fields in the format must be a subset of the arguments in
 the argument set.  If an argument set is not specified, one will be
 inferred from the set of fields.

 It is recommended, but not required, that all *field_ref* and *args_ref*
 appear at the end of the line, not interleaving with *fixedbit_elt* or
 *field_elt*.

 Format examples::

   @opr    ...... ra:5 rb:5 ... 0 ....... rc:5
   @opi    ...... ra:5 lit:8    1 ....... rc:5
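
 Counting widths from the most significant bit, ``@opr`` places ``ra``
 at bits 25..21, ``rb`` at bits 20..16 and ``rc`` at bits 4..0.  A C
 sketch of the corresponding extraction follows; ``field`` and
 ``extract_opr`` are hypothetical names, and the structure stands in
 for the argument set inferred from the format's fields::

   #include <stdint.h>

   /* Argument set inferred for @opr: one int member per field. */
   typedef struct {
       int ra;
       int rb;
       int rc;
   } arg_opr;

   /* Hypothetical stand-in for an unsigned bit-field read. */
   static int field(uint32_t insn, int pos, int len)
   {
       return (int)((insn >> pos) & ((1u << len) - 1));
   }

   /* @opr  ...... ra:5 rb:5 ... 0 ....... rc:5 */
   static void extract_opr(arg_opr *a, uint32_t insn)
   {
       a->ra = field(insn, 21, 5);   /* bits 25..21 */
       a->rb = field(insn, 16, 5);   /* bits 20..16 */
       a->rc = field(insn, 0, 5);    /* bits  4..0  */
   }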

 Patterns
 ========

 Syntax::

   pat_def      := identifier ( pat_elt )+
   pat_elt      := fixedbit_elt | field_elt | field_ref | args_ref | fmt_ref | const_elt
   fmt_ref      := '@' identifier
   const_elt    := identifier '=' number

 The *fixedbit_elt* and *field_elt* specifiers are unchanged from formats.
 A pattern that does not specify a named format will have one inferred
 from a referenced argument set (if present) and the set of fields.

 A *const_elt* allows an argument to be set to a constant value.  This may
 come in handy when fields overlap between patterns and one has to
 include the values in the *fixedbit_elt* instead.

 The decoder will call a translator function for each pattern matched.

 Pattern examples::

   addl_r   010000 ..... ..... .... 0000000 ..... @opr
   addl_i   010000 ..... ..... .... 0000000 ..... @opi

 which will, in part, invoke::

   trans_addl_r(ctx, &arg_opr, insn)

 and::

   trans_addl_i(ctx, &arg_opi, insn)
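
 Combining each pattern with its format, both share the fixedmask
 ``0xfc001fe0`` (opcode, bit 12, and bits 11..5) and differ only in
 bit 12.  The dispatch can be sketched in C as below.  This is a
 hand-written illustration, not generator output: the ``DisasContext``
 layout with its ``which`` member, the stub translators, and the
 function name ``decode`` are all assumptions::

   #include <stdbool.h>
   #include <stdint.h>

   typedef struct DisasContext {
       int which;   /* hypothetical: records which stub ran */
   } DisasContext;

   typedef struct { int ra; int rb; int rc; } arg_opr;
   typedef struct { int ra; int lit; int rc; } arg_opi;

   /* Stub translators; a real target would emit code here. */
   static bool trans_addl_r(DisasContext *ctx, arg_opr *a, uint32_t insn)
   {
       (void)a; (void)insn;
       ctx->which = 'r';
       return true;
   }

   static bool trans_addl_i(DisasContext *ctx, arg_opi *a, uint32_t insn)
   {
       (void)a; (void)insn;
       ctx->which = 'i';
       return true;
   }

   static bool decode(DisasContext *ctx, uint32_t insn)
   {
       int ra = (int)((insn >> 21) & 0x1f);
       int rc = (int)(insn & 0x1f);

       switch (insn & 0xfc001fe0u) {
       case 0x40000000u: {          /* addl_r: bit 12 clear */
           arg_opr a = { ra, (int)((insn >> 16) & 0x1f), rc };
           return trans_addl_r(ctx, &a, insn);
       }
       case 0x40001000u: {          /* addl_i: bit 12 set */
           arg_opi a = { ra, (int)((insn >> 13) & 0xff), rc };
           return trans_addl_i(ctx, &a, insn);
       }
       default:
           return false;            /* no pattern matched */
       }
   }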

 Pattern Groups
 ==============

 Syntax::

   group            := overlap_group | no_overlap_group
   overlap_group    := '{' ( pat_def | group )+ '}'
   no_overlap_group := '[' ( pat_def | group )+ ']'

 A *group* begins with a lone open-brace or open-bracket, with all
 subsequent lines indented two spaces, and ending with a lone
 close-brace or close-bracket.  Groups may be nested, increasing the
 required indentation of the lines within the nested group to two
 spaces per nesting level.

 Patterns within overlap groups are allowed to overlap.  Conflicts are
 resolved by trying the patterns in order.  If all of the fixedbits
 for a pattern match, its translator function will be called.  If the
 translator function returns false, then subsequent patterns within the
 group will be tried.
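
 The in-order, fall-through behaviour can be modelled with a small
 table-driven C sketch.  The real generator emits nested ``switch`` and
 ``if`` statements rather than a table; the ``Pattern`` type, the
 simplified translator signature, and every name below are hypothetical::

   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   typedef struct {
       uint32_t fixedmask;
       uint32_t fixedbits;
       bool (*trans)(uint32_t insn);   /* simplified signature */
   } Pattern;

   static int general_calls;           /* counts fall-throughs, for the demo */

   /* Illustrative specialised translator: declines odd encodings. */
   static bool trans_special(uint32_t insn)
   {
       return (insn & 1) == 0;
   }

   /* Illustrative general translator: accepts everything. */
   static bool trans_general(uint32_t insn)
   {
       (void)insn;
       general_calls++;
       return true;
   }

   /* Try each pattern in order; a false return falls through to the next. */
   static bool decode_overlap_group(const Pattern *pats, size_t n,
                                    uint32_t insn)
   {
       for (size_t i = 0; i < n; i++) {
           if ((insn & pats[i].fixedmask) == pats[i].fixedbits
               && pats[i].trans(insn)) {
               return true;
           }
       }
       return false;
   }

 Listing the specialised pattern first lets it claim the encodings it
 can handle and decline the rest, which then reach the general pattern.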

 Patterns within no-overlap groups are not allowed to overlap, just
 the same as ungrouped patterns.  Thus no-overlap groups are intended
 to be nested inside overlap groups.

 The following example from PA-RISC shows specialization of the *or*
 instruction::

   {
     {
       nop   000010 ----- ----- 0000 001001 0 00000
       copy  000010 00000 r1:5  0000 001001 0 rt:5
     }
     or      000010 rt2:5 r1:5  cf:4 001001 0 rt:5
   }

 When the *cf* field is zero, the instruction has no side effects,
 and may be specialized.  When the *rt* field is zero, the output
 is discarded and so the instruction has no effect.  When the *rt2*
 field is zero, the operation is ``reg[r1] | 0`` and so encodes
 the canonical register copy operation.

 The output from the generator might look like::

   switch (insn & 0xfc000fe0) {
   case 0x08000240:
     /* 000010.. ........ ....0010 010..... */
     if ((insn & 0x0000f000) == 0x00000000) {
         /* 000010.. ........ 00000010 010..... */
         if ((insn & 0x0000001f) == 0x00000000) {
             /* 000010.. ........ 00000010 01000000 */
             extract_decode_Fmt_0(&u.f_decode0, insn);
             if (trans_nop(ctx, &u.f_decode0)) return true;
         }
         if ((insn & 0x03e00000) == 0x00000000) {
             /* 00001000 000..... 00000010 010..... */
             extract_decode_Fmt_1(&u.f_decode1, insn);
             if (trans_copy(ctx, &u.f_decode1)) return true;
         }
     }
     extract_decode_Fmt_2(&u.f_decode2, insn);
     if (trans_or(ctx, &u.f_decode2)) return true;
     return false;
   }