@c Stefan Hajnoczi | e841257 | 2019-05-09 13:18:20 +0100
@node Security
@chapter Security

@section Overview

This chapter explains the security requirements that QEMU is designed to meet
and principles for securely deploying QEMU.

@section Security Requirements

QEMU supports many different use cases, some of which have stricter security
requirements than others. The community has agreed on the overall security
requirements that users may depend on. These requirements define what is
considered supported from a security perspective.

@subsection Virtualization Use Case

The virtualization use case covers cloud and virtual private server (VPS)
hosting, as well as traditional data center and desktop virtualization. These
use cases rely on hardware virtualization extensions to execute guest code
safely on the physical CPU at close-to-native speed.

The following entities are untrusted, meaning that they may be buggy or
malicious:

@itemize
@item Guest
@item User-facing interfaces (e.g. VNC, SPICE, WebSocket)
@item Network protocols (e.g. NBD, live migration)
@item User-supplied files (e.g. disk images, kernels, device trees)
@item Passthrough devices (e.g. PCI, USB)
@end itemize

Bugs affecting these entities are evaluated on whether they can cause damage in
real-world use cases, and are treated as security bugs if they can.

@subsection Non-virtualization Use Case

The non-virtualization use case covers emulation using the Tiny Code Generator
(TCG). In principle, the TCG and device emulation code used in conjunction with
the non-virtualization use case should meet the same security requirements as
the virtualization use case. However, for historical reasons, much of the
non-virtualization use case code was not written with these security
requirements in mind.

Bugs affecting the non-virtualization use case are not considered security
bugs at this time. Users with non-virtualization use cases must not rely on
QEMU to provide guest isolation or any security guarantees.

@section Architecture

This section describes the design principles that ensure the security
requirements are met.

@subsection Guest Isolation

Guest isolation is the confinement of guest code to the virtual machine. When
guest code gains control of execution on the host, this is called escaping the
virtual machine. Isolation also includes resource limits such as throttling of
CPU, memory, disk, or network. Guests must be unable to exceed their resource
limits.

QEMU presents an attack surface to the guest in the form of emulated devices.
The guest must not be able to gain control of QEMU. Bugs in emulated devices
could allow malicious guests to gain code execution in QEMU. At that point, the
guest has escaped the virtual machine and is able to act in the context of the
QEMU process on the host.

Guests often interact with other guests and share resources with them. A
malicious guest must not gain control of other guests or access their data.
Disk image files and network traffic must be protected from other guests unless
explicitly shared between them by the user.

@subsection Principle of Least Privilege

The principle of least privilege states that each component has access only to
the privileges necessary for its function. In the case of QEMU, this means that
each QEMU process has access only to resources belonging to its guest.

The QEMU process should not have access to any resources that are inaccessible
to the guest. This way the guest does not gain anything by escaping into the
QEMU process, since it already has access to those same resources from within
the guest.

Following the principle of least privilege immediately fulfills guest isolation
requirements. For example, guest A only has access to its own disk image file
@code{a.img} and not guest B's disk image file @code{b.img}.

In reality, certain resources are inaccessible to the guest but must be
available to QEMU to perform its function. For example, host system calls are
necessary for QEMU but are not exposed to guests. A guest that escapes into
the QEMU process can then begin invoking host system calls.

New features must be designed to follow the principle of least privilege.
Should this not be possible for technical reasons, the security risk must be
clearly documented so users are aware of the trade-off of enabling the feature.

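The @code{a.img} and @code{b.img} example in this subsection can be made
concrete with a small check. The following Python sketch is only an
illustration and not part of QEMU. It assumes a deployment in which each guest's
QEMU process runs under its own UNIX user account and each disk image is meant
to be accessible to that account only. The helper name
@code{image_is_private} is hypothetical, and the check looks only at
traditional permission bits. It ignores ACLs, SELinux labels, and the root
user.

@verbatim
```python
import os
import stat


def image_is_private(path, owner_uid):
    """Return True if only owner_uid has permission bits on the file.

    Hypothetical helper: checks classic UNIX ownership and mode bits only.
    """
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    # 0o077 covers every "group" and "other" permission bit.
    return st.st_uid == owner_uid and (mode & 0o077) == 0
```
@end verbatim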
@subsection Isolation mechanisms

Several isolation mechanisms are available to realize this architecture of
guest isolation and the principle of least privilege. With the exception of
Linux seccomp, these mechanisms are all deployed by management tools that
launch QEMU, such as libvirt. They are also platform-specific, so they are only
described briefly for Linux here.

The fundamental isolation mechanism is that QEMU processes must run as
unprivileged users. Sometimes it seems more convenient to launch QEMU as
root to give it access to host devices (e.g. @code{/dev/net/tun}), but this
poses a serious security risk. File descriptor passing can be used to give an
otherwise unprivileged QEMU process access to host devices without running QEMU
as root. It is also possible to launch QEMU as a non-root user and configure
UNIX groups for access to @code{/dev/kvm}, @code{/dev/net/tun}, and other
device nodes. Some Linux distributions already ship with UNIX groups for these
devices by default.

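File descriptor passing as described above can be sketched in Python. This is
a minimal illustration and not how any particular management tool does it. The
launcher opens the resource itself and lets the child inherit only that one
descriptor. The helper names @code{spawn_with_fd} and @code{qemu_tap_argv} and
the identifier @code{net0} are hypothetical. The sketch also leaves out two
things a real launcher would need: configuring the tap device with an ioctl
before handing it over, and dropping the launcher's own privileges before
starting the child.

@verbatim
```python
import os
import subprocess


def qemu_tap_argv(fd, qemu="qemu-system-x86_64"):
    """Build an argv that tells QEMU to use an already-open tap descriptor."""
    return [qemu,
            "-netdev", "tap,id=net0,fd=%d" % fd,
            "-device", "virtio-net-pci,netdev=net0"]


def spawn_with_fd(path, build_argv, flags=os.O_RDWR, **popen_kwargs):
    """Open path in the launcher and pass only the descriptor to the child.

    build_argv receives the descriptor number and returns the child's argv.
    """
    fd = os.open(path, flags)
    try:
        # pass_fds keeps this descriptor open, under the same number, in the
        # child; other descriptors above stderr are closed there.
        return subprocess.Popen(build_argv(fd), pass_fds=(fd,),
                                **popen_kwargs)
    finally:
        # The launcher does not need its own copy once the child has started.
        os.close(fd)
```
@end verbatim

With these helpers, a launcher that is allowed to open the device would call
@code{spawn_with_fd("/dev/net/tun", qemu_tap_argv)}, so the QEMU process itself
never needs permission to open the device node.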
@itemize
@item SELinux and AppArmor make it possible to confine processes beyond the
traditional UNIX process and file permissions model. They restrict the QEMU
process from accessing processes and files on the host system that are not
needed by QEMU.

@item Resource limits and cgroup controllers provide throughput and utilization
limits on key resources such as CPU time, memory, and I/O bandwidth.

@item Linux namespaces can be used to make process, file system, and other system
resources unavailable to QEMU. A namespaced QEMU process is restricted to only
those resources that were granted to it.

@item Linux seccomp is available via the QEMU @option{--sandbox} option. It disables
system calls that are not needed by QEMU, thereby reducing the host kernel
attack surface.
@end itemize
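
As an illustration of the seccomp item above, the following Python sketch
builds a @option{--sandbox} argument with every sub-option set to its most
restrictive value. The helper name @code{sandbox_args} is hypothetical. The
sub-option names and values are those documented for the QEMU sandbox option,
but whether a given QEMU build and feature set works with all of them denied
is an assumption that must be checked per deployment. For example, denying
@code{spawn} is expected to conflict with features that start helper
processes.

@verbatim
```python
# Values accepted by each sandbox sub-option.
_ALLOWED = {
    "obsolete": ("allow", "deny"),
    "elevateprivileges": ("allow", "deny", "children"),
    "spawn": ("allow", "deny"),
    "resourcecontrol": ("allow", "deny"),
}


def sandbox_args(**overrides):
    """Return QEMU arguments enabling seccomp, denying by default."""
    settings = dict.fromkeys(_ALLOWED, "deny")
    for key, value in overrides.items():
        if key not in _ALLOWED or value not in _ALLOWED[key]:
            raise ValueError("unsupported sandbox setting: %s=%s"
                             % (key, value))
        settings[key] = value
    # Keep a fixed sub-option order so the result is reproducible.
    parts = ["%s=%s" % (key, settings[key]) for key in _ALLOWED]
    return ["--sandbox", "on," + ",".join(parts)]
```
@end verbatim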
@c Daniel P. Berrangé | 4f24430 | 2019-07-03 14:41:35 +0100

@section Sensitive configurations

There are aspects of QEMU that can have security implications which users and
management applications must be aware of.

@subsection Monitor console (QMP and HMP)

The monitor console (whether used with QMP or HMP) provides an interface
to dynamically control many aspects of QEMU's runtime operation. Many of the
commands exposed will instruct QEMU to access content on the host file system
and/or trigger spawning of external processes.

For example, the @code{migrate} command can spawn arbitrary processes to
tunnel the migration data stream. The @code{blockdev-add} command instructs
QEMU to open arbitrary files, exposing their content to the guest as a virtual
disk.

Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
or Linux namespaces, the monitor console should be considered to have privileges
equivalent to those of the user account QEMU is running under.

It is also important to consider the security of the character device backend
over which the monitor console is exposed. It must be protected against
malicious third parties that might try to make unauthorized connections or
perform man-in-the-middle attacks. Many of the character device backends do not
satisfy this requirement and so must not be used for the monitor console.

The general recommendation is that the monitor console should be exposed over
a UNIX domain socket backend to the local host only. Use of the TCP-based
character device backend is inappropriate unless configured to use both TLS
encryption and authorization control policy on client connections.

In summary, the monitor console is considered a privileged control interface to
QEMU and as such should only be made accessible to a trusted management
application or user.
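
The UNIX domain socket recommendation above can be sketched from the point of
view of a trusted management application. The following Python code is a
minimal illustration, not a complete QMP client. The helper names
@code{monitor_args} and @code{qmp_handshake} are hypothetical, and the
@code{server=on,wait=off} spelling of the socket options is assumed to be
accepted by the QEMU version in use. The sketch reads the QMP greeting, sends
@code{qmp_capabilities} to leave negotiation mode, and returns the greeting.

@verbatim
```python
import json
import socket


def monitor_args(socket_path):
    """Return QEMU arguments exposing QMP on a local UNIX socket only."""
    return ["-qmp", "unix:%s,server=on,wait=off" % socket_path]


def qmp_handshake(socket_path, timeout=5.0):
    """Connect to a QMP UNIX socket and complete capability negotiation."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        with sock.makefile("r", encoding="utf-8") as reader:
            # The server speaks first: one JSON object per line.
            greeting = json.loads(reader.readline())
            if "QMP" not in greeting:
                raise RuntimeError("unexpected greeting: %r" % (greeting,))
            request = json.dumps({"execute": "qmp_capabilities"})
            sock.sendall(request.encode("utf-8") + b"\r\n")
            # Skip anything that is not the reply to our command.
            while True:
                line = reader.readline()
                if not line:
                    raise RuntimeError("monitor closed the connection")
                reply = json.loads(line)
                if "return" in reply:
                    return greeting
                if "error" in reply:
                    raise RuntimeError("negotiation failed: %r" % (reply,))
```
@end verbatim

Access control in this arrangement comes from the file system rather than from
the protocol. Placing the socket in a directory that only the management
application's user can enter is one way to keep untrusted local users away
from it.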