..
  Copyright (c) 2020, Linaro Limited
  Written by Alex Bennée


========================
TCG Instruction Counting
========================

TCG has long supported a feature known as icount, which allows for
instruction counting during execution. This should not be confused
with cycle-accurate emulation: QEMU does not attempt to emulate how
long an instruction would take on real hardware. That is a job for
other, more detailed (and slower) tools that simulate the rest of a
micro-architecture.

This feature is only available for system emulation and is
incompatible with multi-threaded TCG. It can be used to better align
execution time with wall-clock time so a "slow" device doesn't run too
fast on modern hardware. It can also provide a degree of
deterministic execution and is an essential part of the record/replay
support in QEMU.

Core Concepts
=============

At its heart icount is simply a count of executed instructions which
is stored in the TimersState of QEMU's timer sub-system. The number of
executed instructions can then be used to calculate QEMU_CLOCK_VIRTUAL
which represents the amount of elapsed time in the system since
execution started. Depending on the icount mode this may either be a
fixed number of ns per instruction or adjusted as execution continues
to keep wall clock time and virtual time in sync.
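
For the fixed-rate mode, the conversion from instruction count to
virtual time can be sketched as follows. This is an illustrative model
only: the helper name icount_to_virtual_ns and its parameters are
assumptions made for this example, not QEMU's internal API. It assumes
every instruction is charged 2^shift ns, in the style of the shift
parameter of the -icount option.

.. code:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Hypothetical helper (not QEMU code): elapsed virtual time in ns
     * when each executed instruction is charged a fixed 2^shift ns.
     */
    static int64_t icount_to_virtual_ns(int64_t executed_insns, unsigned shift)
    {
        assert(executed_insns >= 0 && shift < 32);
        return executed_insns * ((int64_t)1 << shift);
    }

Under this model, 1000 executed instructions with a shift of 3 advance
the virtual clock by 8000 ns. The adaptive mode differs in that the
per-instruction cost is adjusted while the guest runs.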

To be able to calculate the number of executed instructions the
translator starts by allocating a budget of instructions to be
executed. The budget of instructions is limited by how long it will be
until the next timer expires. We store this budget as part of the
vCPU's icount_decr field, which is shared with the machinery for
handling cpu_exit(). The whole field is checked at the start of every
translated block and, if it indicates that an exit is needed, causes a
return to the outer loop to deal with whatever caused the exit.
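
One way to picture a single field serving both purposes is the
simplified model below. The ModelCPU type, the model_* helpers and the
16/16-bit split are assumptions chosen for illustration; they are not
QEMU's definitions. The point is only that an exit request and an
exhausted budget can both be detected by one sign test on the whole
field.

.. code:: c

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified stand-in for the per-vCPU state (hypothetical). */
    typedef struct {
        /* low 16 bits: instruction budget, high 16 bits: exit request */
        uint32_t icount_decr;
    } ModelCPU;

    /* Install a fresh budget; this also clears any pending exit request. */
    static void model_set_budget(ModelCPU *cpu, uint16_t insns)
    {
        cpu->icount_decr = insns;
    }

    /* Model of an exit request: set every bit of the high half. */
    static void model_request_exit(ModelCPU *cpu)
    {
        cpu->icount_decr |= 0xffff0000u;
    }

    /* The check at the start of a block: is the 32-bit value negative? */
    static bool model_tb_must_exit(const ModelCPU *cpu)
    {
        return (cpu->icount_decr & 0x80000000u) != 0;
    }

In this model the exit request leaves the budget half untouched, so
the remaining budget is still available once the outer loop has dealt
with the request.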

In the case of icount, before the flag is checked we subtract the
number of instructions the translation block would execute. If this
would cause the instruction budget to go negative we exit the main
loop and generate a new translation block with exactly the right
number of instructions to take the budget to 0, meaning that whatever
timer was due to expire will expire exactly when we exit the main run
loop.
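
The budget arithmetic described above can be sketched as a small
standalone function. The name model_run_tb and its signature are
hypothetical; the real check is emitted as generated code at the start
of each translation block rather than written as a C helper like this.

.. code:: c

    #include <assert.h>
    #include <stdint.h>

    /*
     * Hypothetical model: decide how many instructions the next block
     * may execute and charge them to *budget. Returns tb_insns when the
     * whole block fits, otherwise the size of the shortened block that
     * drains the budget to exactly 0.
     */
    static uint32_t model_run_tb(int32_t *budget, uint32_t tb_insns)
    {
        assert(*budget >= 0);

        int32_t remaining = *budget - (int32_t)tb_insns;
        if (remaining >= 0) {
            *budget = remaining;    /* the whole block fits */
            return tb_insns;
        }

        /* Block too long: run only as many instructions as are left. */
        uint32_t shortened = (uint32_t)*budget;
        *budget = 0;
        return shortened;
    }

Starting from a budget of 10, a 4-instruction block runs in full and
leaves 6. A following 8-instruction block does not fit, so a
6-instruction block is run instead and the budget reaches 0 exactly at
the timer deadline.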

Dealing with MMIO
-----------------

While we can adjust the instruction budget for known events like timer
expiry we cannot do the same for MMIO. Every load/store we execute
might potentially trigger an I/O event, at which point we will need an
up-to-date and accurate reading of the icount number.

To deal with this case, when an I/O access is made we:

- restore un-executed instructions to the icount budget
- re-compile a single [1]_ instruction block for the current PC
- exit the cpu loop and execute the re-compiled block

.. [1] sometimes two instructions if dealing with delay slots
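
The first two steps above amount to simple bookkeeping, modelled in
the sketch below. ModelIOState, model_io_recompile and their fields
are hypothetical names introduced for this example and do not
correspond to QEMU's actual data structures or functions.

.. code:: c

    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical bookkeeping for the I/O recompile path. */
    typedef struct {
        int32_t budget;         /* instructions left before the deadline */
        uint32_t next_tb_insns; /* size of the block to compile next */
    } ModelIOState;

    /*
     * tb_insns: instructions charged to the budget when the block began.
     * done:     instructions completed before the one doing the I/O.
     * in_delay_slot: non-zero if the access involves a delay slot.
     */
    static void model_io_recompile(ModelIOState *s, uint32_t tb_insns,
                                   uint32_t done, int in_delay_slot)
    {
        assert(done <= tb_insns);

        /* Refund everything that was charged but has not executed yet. */
        s->budget += (int32_t)(tb_insns - done);

        /* Re-compile just the I/O instruction (two with a delay slot). */
        s->next_tb_insns = in_delay_slot ? 2 : 1;
    }

For example, if a 10-instruction block hits an I/O access after 3
completed instructions, 7 instructions are returned to the budget and
the I/O instruction is then run on its own in a freshly compiled
block, which charges it to the budget again.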

Other I/O operations
--------------------

MMIO isn't the only type of operation for which we might need a
correct and accurate clock. IO port instructions and accesses to
system registers are the common examples here. These instructions have
to be handled by the individual translators which have the knowledge
of which operations are I/O operations.

When the translator is handling an instruction of this kind:

* it must call gen_io_start() if icount is enabled, at some
  point before the generation of the code which actually does
  the I/O, using a code fragment similar to:

.. code:: c

    if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
        gen_io_start();
    }

* it must end the TB immediately after this instruction