| @node Implementation notes |
| @appendix Implementation notes |
| |
| @menu |
| * CPU emulation:: |
| * Translator Internals:: |
| * QEMU compared to other emulators:: |
| * Managed start up options:: |
| * Bibliography:: |
| @end menu |
| |
| @node CPU emulation |
| @section CPU emulation |
| |
| @menu |
| * x86:: x86 and x86-64 emulation |
| * ARM:: ARM emulation |
| * MIPS:: MIPS emulation |
| * PPC:: PowerPC emulation |
| * SPARC:: Sparc32 and Sparc64 emulation |
| * Xtensa:: Xtensa emulation |
| @end menu |
| |
| @node x86 |
| @subsection x86 and x86-64 emulation |
| |
| QEMU x86 target features: |
| |
| @itemize |
| |
| @item The virtual x86 CPU supports 16 bit and 32 bit addressing with segmentation. |
| LDT/GDT and IDT are emulated. VM86 mode is also supported to run |
| DOSEMU. There is some support for MMX/3DNow!, SSE, SSE2, SSE3, SSSE3, |
| and SSE4 as well as x86-64 SVM. |
| |
| @item Support of host page sizes bigger than 4KB in user mode emulation. |
| |
| @item QEMU can emulate itself on x86. |
| |
| @item An extensive Linux x86 CPU test program is included @file{tests/test-i386}. |
| It can be used to test other x86 virtual CPUs. |
| |
| @end itemize |
| |
| Current QEMU limitations: |
| |
| @itemize |
| |
| @item Limited x86-64 support. |
| |
| @item IPC syscalls are missing. |
| |
| @item The x86 segment limits and access rights are not tested at every |
| memory access (yet). Hopefully, very few OSes seem to rely on that for |
| normal use. |
| |
| @end itemize |
| |
| @node ARM |
| @subsection ARM emulation |
| |
| @itemize |
| |
| @item Full ARM 7 user emulation. |
| |
| @item NWFPE FPU support included in user Linux emulation. |
| |
| @item Can run most ARM Linux binaries. |
| |
| @end itemize |
| |
| @node MIPS |
| @subsection MIPS emulation |
| |
| @itemize |
| |
| @item The system emulation allows full MIPS32/MIPS64 Release 2 emulation, |
| including privileged instructions, FPU and MMU, in both little and big |
| endian modes. |
| |
| @item The Linux userland emulation can run many 32 bit MIPS Linux binaries. |
| |
| @end itemize |
| |
| Current QEMU limitations: |
| |
| @itemize |
| |
| @item Self-modifying code is not always handled correctly. |
| |
| @item 64 bit userland emulation is not implemented. |
| |
| @item The system emulation is not complete enough to run real firmware. |
| |
| @item The watchpoint debug facility is not implemented. |
| |
| @end itemize |
| |
| @node PPC |
| @subsection PowerPC emulation |
| |
| @itemize |
| |
| @item Full PowerPC 32 bit emulation, including privileged instructions, |
| FPU and MMU. |
| |
| @item Can run most PowerPC Linux binaries. |
| |
| @end itemize |
| |
| @node SPARC |
| @subsection Sparc32 and Sparc64 emulation |
| |
| @itemize |
| |
| @item Full SPARC V8 emulation, including privileged |
| instructions, FPU and MMU. SPARC V9 emulation includes most privileged |
| and VIS instructions, FPU and I/D MMU. Alignment is fully enforced. |
| |
| @item Can run most 32-bit SPARC Linux binaries, SPARC32PLUS Linux binaries and |
| some 64-bit SPARC Linux binaries. |
| |
| @end itemize |
| |
| Current QEMU limitations: |
| |
| @itemize |
| |
| @item IPC syscalls are missing. |
| |
| @item Floating point exception support is buggy. |
| |
| @item Atomic instructions are not correctly implemented. |
| |
| @item There are still some problems with Sparc64 emulators. |
| |
| @end itemize |
| |
| @node Xtensa |
| @subsection Xtensa emulation |
| |
| @itemize |
| |
| @item Core Xtensa ISA emulation, including most options: code density, |
| loop, extended L32R, 16- and 32-bit multiplication, 32-bit division, |
| MAC16, miscellaneous operations, boolean, FP coprocessor, coprocessor |
| context, debug, multiprocessor synchronization, |
| conditional store, exceptions, relocatable vectors, unaligned exception, |
| interrupts (including high priority and timer), hardware alignment, |
| region protection, region translation, MMU, windowed registers, thread |
| pointer, processor ID. |
| |
| @item Not implemented options: data/instruction cache (including cache |
| prefetch and locking), XLMI, processor interface. Also options not |
| covered by the core ISA (e.g. FLIX, wide branches) are not implemented. |
| |
| @item Can run most Xtensa Linux binaries. |
| |
| @item New core configuration that requires no additional instructions |
| may be created from overlay with minimal amount of hand-written code. |
| |
| @end itemize |
| |
| @node Translator Internals |
| @section Translator Internals |
| |
| QEMU is a dynamic translator. When it first encounters a piece of code, |
| it converts it to the host instruction set. Usually dynamic translators |
| are very complicated and highly CPU dependent. QEMU uses some tricks |
| which make it relatively easily portable and simple while achieving good |
| performances. |
| |
| QEMU's dynamic translation backend is called TCG, for "Tiny Code |
| Generator". For more information, please take a look at @code{tcg/README}. |
| |
| Some notable features of QEMU's dynamic translator are: |
| |
| @table @strong |
| |
| @item CPU state optimisations: |
| The target CPUs have many internal states which change the way it |
| evaluates instructions. In order to achieve a good speed, the |
| translation phase considers that some state information of the virtual |
| CPU cannot change in it. The state is recorded in the Translation |
| Block (TB). If the state changes (e.g. privilege level), a new TB will |
| be generated and the previous TB won't be used anymore until the state |
| matches the state recorded in the previous TB. The same idea can be applied |
| to other aspects of the CPU state. For example, on x86, if the SS, |
| DS and ES segments have a zero base, then the translator does not even |
| generate an addition for the segment base. |
| |
| @item Direct block chaining: |
| After each translated basic block is executed, QEMU uses the simulated |
| Program Counter (PC) and other cpu state information (such as the CS |
| segment base value) to find the next basic block. |
| |
| In order to accelerate the most common cases where the new simulated PC |
| is known, QEMU can patch a basic block so that it jumps directly to the |
| next one. |
| |
| The most portable code uses an indirect jump. An indirect jump makes |
| it easier to make the jump target modification atomic. On some host |
| architectures (such as x86 or PowerPC), the @code{JUMP} opcode is |
| directly patched so that the block chaining has no overhead. |
| |
| @item Self-modifying code and translated code invalidation: |
| Self-modifying code is a special challenge in x86 emulation because no |
| instruction cache invalidation is signaled by the application when code |
| is modified. |
| |
| User-mode emulation marks a host page as write-protected (if it is |
| not already read-only) every time translated code is generated for a |
| basic block. Then, if a write access is done to the page, Linux raises |
| a SEGV signal. QEMU then invalidates all the translated code in the page |
| and enables write accesses to the page. For system emulation, write |
| protection is achieved through the software MMU. |
| |
| Correct translated code invalidation is done efficiently by maintaining |
| a linked list of every translated block contained in a given page. Other |
| linked lists are also maintained to undo direct block chaining. |
| |
| On RISC targets, correctly written software uses memory barriers and |
| cache flushes, so some of the protection above would not be |
| necessary. However, QEMU still requires that the generated code always |
| matches the target instructions in memory in order to handle |
| exceptions correctly. |
| |
| @item Exception support: |
| longjmp() is used when an exception such as division by zero is |
| encountered. |
| |
| The host SIGSEGV and SIGBUS signal handlers are used to get invalid |
| memory accesses. QEMU keeps a map from host program counter to |
| target program counter, and looks up where the exception happened |
| based on the host program counter at the exception point. |
| |
| On some targets, some bits of the virtual CPU's state are not flushed to the |
| memory until the end of the translation block. This is done for internal |
| emulation state that is rarely accessed directly by the program and/or changes |
| very often throughout the execution of a translation block---this includes |
| condition codes on x86, delay slots on SPARC, conditional execution on |
| ARM, and so on. This state is stored for each target instruction, and |
| looked up on exceptions. |
| |
| @item MMU emulation: |
| For system emulation QEMU uses a software MMU. In that mode, the MMU |
| virtual to physical address translation is done at every memory |
| access. |
| |
| QEMU uses an address translation cache (TLB) to speed up the translation. |
| In order to avoid flushing the translated code each time the MMU |
| mappings change, all caches in QEMU are physically indexed. This |
| means that each basic block is indexed with its physical address. |
| |
| In order to avoid invalidating the basic block chain when MMU mappings |
| change, chaining is only performed when the destination of the jump |
| shares a page with the basic block that is performing the jump. |
| |
| The MMU can also distinguish RAM and ROM memory areas from MMIO memory |
| areas. Access is faster for RAM and ROM because the translation cache also |
| hosts the offset between guest address and host memory. Accessing MMIO |
| memory areas instead calls out to C code for device emulation. |
| Finally, the MMU helps tracking dirty pages and pages pointed to by |
| translation blocks. |
| @end table |
| |
| @node QEMU compared to other emulators |
| @section QEMU compared to other emulators |
| |
| Like bochs [1], QEMU emulates an x86 CPU. But QEMU is much faster than |
| bochs as it uses dynamic compilation. Bochs is closely tied to x86 PC |
| emulation while QEMU can emulate several processors. |
| |
| Like Valgrind [2], QEMU does user space emulation and dynamic |
| translation. Valgrind is mainly a memory debugger while QEMU has no |
| support for it (QEMU could be used to detect out of bound memory |
| accesses as Valgrind, but it has no support to track uninitialised data |
| as Valgrind does). The Valgrind dynamic translator generates better code |
| than QEMU (in particular it does register allocation) but it is closely |
| tied to an x86 host and target and has no support for precise exceptions |
| and system emulation. |
| |
| EM86 [3] is the closest project to user space QEMU (and QEMU still uses |
| some of its code, in particular the ELF file loader). EM86 was limited |
| to an alpha host and used a proprietary and slow interpreter (the |
| interpreter part of the FX!32 Digital Win32 code translator [4]). |
| |
| TWIN from Willows Software was a Windows API emulator like Wine. It is less |
| accurate than Wine but includes a protected mode x86 interpreter to launch |
| x86 Windows executables. Such an approach has greater potential because most |
| of the Windows API is executed natively but it is far more difficult to |
| develop because all the data structures and function parameters exchanged |
| between the API and the x86 code must be converted. |
| |
| User mode Linux [5] was the only solution before QEMU to launch a |
| Linux kernel as a process while not needing any host kernel |
| patches. However, user mode Linux requires heavy kernel patches while |
| QEMU accepts unpatched Linux kernels. The price to pay is that QEMU is |
| slower. |
| |
| The Plex86 [6] PC virtualizer is done in the same spirit as the now |
| obsolete qemu-fast system emulator. It requires a patched Linux kernel |
| to work (you cannot launch the same kernel on your PC), but the |
| patches are really small. As it is a PC virtualizer (no emulation is |
| done except for some privileged instructions), it has the potential of |
| being faster than QEMU. The downside is that a complicated (and |
| potentially unsafe) host kernel patch is needed. |
| |
| The commercial PC Virtualizers (VMWare [7], VirtualPC [8]) are faster |
| than QEMU (without virtualization), but they all need specific, proprietary |
| and potentially unsafe host drivers. Moreover, they are unable to |
| provide cycle exact simulation as an emulator can. |
| |
| VirtualBox [9], Xen [10] and KVM [11] are based on QEMU. QEMU-SystemC |
| [12] uses QEMU to simulate a system where some hardware devices are |
| developed in SystemC. |
| |
| @node Managed start up options |
| @section Managed start up options |
| |
| In system mode emulation, it's possible to create a VM in a paused state using |
| the -S command line option. In this state the machine is completely initialized |
| according to command line options and ready to execute VM code but VCPU threads |
| are not executing any code. The VM state in this paused state depends on the way |
| QEMU was started. It could be in: |
| @table @asis |
| @item initial state (after reset/power on state) |
| @item with direct kernel loading, the initial state could be amended to execute |
| code loaded by QEMU in the VM's RAM and with incoming migration |
| @item with incoming migration, initial state will by amended with the migrated |
| machine state after migration completes. |
| @end table |
| |
| This paused state is typically used by users to query machine state and/or |
| additionally configure the machine (by hotplugging devices) in runtime before |
| allowing VM code to run. |
| |
| However, at the -S pause point, it's impossible to configure options that affect |
| initial VM creation (like: -smp/-m/-numa ...) or cold plug devices. That's |
| when the --preconfig command line option should be used. It allows pausing QEMU |
| before the initial VM creation, in a new preconfig state, where additional |
| queries and configuration can be performed via QMP before moving on to |
| the resulting configuration startup. In the preconfig state, QEMU only allows |
| a limited set of commands over the QMP monitor, where the commands do not |
| depend on an initialized machine, including but not limited to: |
| @table @asis |
| @item qmp_capabilities |
| @item query-qmp-schema |
| @item query-commands |
| @item query-status |
| @item exit-preconfig |
| @end table |
| The full list of commands is in QMP schema which could be queried with |
| query-qmp-schema, where commands supported at preconfig state have option |
| 'allow-preconfig' set to true. |
| |
| @node Bibliography |
| @section Bibliography |
| |
| @table @asis |
| |
| @item [1] |
| @url{http://bochs.sourceforge.net/}, the Bochs IA-32 Emulator Project, |
| by Kevin Lawton et al. |
| |
| @item [2] |
| @url{http://www.valgrind.org/}, Valgrind, an open-source memory debugger |
| for GNU/Linux. |
| |
| @item [3] |
| @url{http://ftp.dreamtime.org/pub/linux/Linux-Alpha/em86/v0.2/docs/em86.html}, |
| the EM86 x86 emulator on Alpha-Linux. |
| |
| @item [4] |
| @url{http://www.usenix.org/publications/library/proceedings/usenix-nt97/@/full_papers/chernoff/chernoff.pdf}, |
| DIGITAL FX!32: Running 32-Bit x86 Applications on Alpha NT, by Anton |
| Chernoff and Ray Hookway. |
| |
| @item [5] |
| @url{http://user-mode-linux.sourceforge.net/}, |
| The User-mode Linux Kernel. |
| |
| @item [6] |
| @url{http://www.plex86.org/}, |
| The new Plex86 project. |
| |
| @item [7] |
| @url{http://www.vmware.com/}, |
| The VMWare PC virtualizer. |
| |
| @item [8] |
| @url{https://www.microsoft.com/download/details.aspx?id=3702}, |
| The VirtualPC PC virtualizer. |
| |
| @item [9] |
| @url{http://virtualbox.org/}, |
| The VirtualBox PC virtualizer. |
| |
| @item [10] |
| @url{http://www.xen.org/}, |
| The Xen hypervisor. |
| |
| @item [11] |
| @url{http://www.linux-kvm.org/}, |
| Kernel Based Virtual Machine (KVM). |
| |
| @item [12] |
| @url{http://www.greensocs.com/projects/QEMUSystemC}, |
| QEMU-SystemC, a hardware co-simulator. |
| |
| @end table |