The QEMU throttling infrastructure
==================================
Copyright (C) 2016,2020 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
QEMU includes a throttling module that can be used to set limits to
I/O operations. The code itself is generic and independent of the I/O
units, but it is currently used to limit the number of bytes per second
and operations per second (IOPS) when performing disk I/O.

This document explains how to use the throttling code in QEMU, and how
it works internally. The implementation is in throttle.c.


Using throttling to limit disk I/O
----------------------------------
Two aspects of the disk I/O can be limited: the number of bytes per
second and the number of operations per second (IOPS). For each one of
them the user can set a global limit or separate limits for read and
write operations. This gives us a total of six different parameters.

I/O limits can be set using the throttling.* parameters of -drive, or
using the QMP 'block_set_io_throttle' command. These are the names of
the parameters for both cases:

   |-----------------------+-----------------------|
   | -drive                | block_set_io_throttle |
   |-----------------------+-----------------------|
   | throttling.iops-total | iops                  |
   | throttling.iops-read  | iops_rd               |
   | throttling.iops-write | iops_wr               |
   | throttling.bps-total  | bps                   |
   | throttling.bps-read   | bps_rd                |
   | throttling.bps-write  | bps_wr                |
   |-----------------------+-----------------------|

It is possible to set limits for both IOPS and bps at the same time,
and for each case we can decide whether to have separate read and
write limits or not, but note that if iops-total is set then neither
iops-read nor iops-write can be set. The same applies to bps-total and
bps-read/write.

The default value of these parameters is 0, and it means 'unlimited'.

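The restriction on combining total and read/write limits can be
written down as a small check. This is only an illustrative Python
sketch: 'check_limits' is a hypothetical helper that is not part of
QEMU, and it takes the -drive parameter names without the
'throttling.' prefix.

```python
def check_limits(limits):
    """Raise ValueError if a total limit is combined with a read/write limit.

    'limits' maps names such as 'iops-total' to integers. A missing
    entry or a value of 0 means 'unlimited'.
    """
    for unit in ("iops", "bps"):
        total = limits.get(unit + "-total", 0)
        split = limits.get(unit + "-read", 0) or limits.get(unit + "-write", 0)
        if total and split:
            raise ValueError(
                "%s-total cannot be combined with %s-read/%s-write"
                % (unit, unit, unit))


# IOPS and bps limits can be mixed freely, so this call succeeds.
check_limits({"iops-total": 100, "bps-read": 1048576, "bps-write": 524288})
```

A call such as check_limits({"iops-total": 100, "iops-read": 50})
raises ValueError in this sketch.
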
In its most basic usage, the user can add a drive to QEMU with a limit
of 100 IOPS with the following -drive line:

   -drive file=hd0.qcow2,throttling.iops-total=100

We can do the same using QMP. In this case all these parameters are
mandatory, so we must set to 0 the ones that we don't want to limit:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0
     }
   }


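Since the six basic limits are mandatory in the QMP command, a client
may want to fill in the zeros automatically. The following Python
sketch shows one way to do it. 'make_throttle_command' and
'LIMIT_NAMES' are hypothetical names used for illustration, and the
sketch only builds the JSON message: sending it over a QMP socket is
left out.

```python
import json

# The six basic limits accepted by block_set_io_throttle.
LIMIT_NAMES = ("iops", "iops_rd", "iops_wr", "bps", "bps_rd", "bps_wr")


def make_throttle_command(device, **limits):
    """Build a block_set_io_throttle command with every basic limit present."""
    unknown = set(limits) - set(LIMIT_NAMES)
    if unknown:
        raise ValueError("unknown limits: %s" % ", ".join(sorted(unknown)))
    arguments = {"device": device}
    for name in LIMIT_NAMES:
        # Limits that were not requested are set to 0 ('unlimited').
        arguments[name] = limits.get(name, 0)
    return {"execute": "block_set_io_throttle", "arguments": arguments}


command = make_throttle_command("virtio0", iops=100)
print(json.dumps(command, indent=2))
```

The printed message has the same contents as the QMP example above.

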
I/O bursts
----------
In addition to the basic limits we have just seen, QEMU allows the
user to do bursts of I/O for a configurable amount of time. A burst is
an amount of I/O that can exceed the basic limit. Bursts are useful to
allow better performance when there are peaks of activity (the OS
boots, a service needs to be restarted) while keeping the average
limits lower the rest of the time.

Two parameters control bursts: their length and the maximum amount of
I/O they allow. These two can be configured separately for each one of
the six basic parameters described in the previous section, but in
this section we'll use 'iops-total' as an example.

The I/O limit during bursts is set using 'iops-total-max', and the
maximum length (in seconds) is set with 'iops-total-max-length'. So if
we want to configure a drive with a basic limit of 100 IOPS and allow
bursts of 2000 IOPS for 60 seconds, we would do it like this (the line
is split for clarity):

   -drive file=hd0.qcow2,
          throttling.iops-total=100,
          throttling.iops-total-max=2000,
          throttling.iops-total-max-length=60

Or, with QMP:

   { "execute": "block_set_io_throttle",
     "arguments": {
        "device": "virtio0",
        "iops": 100,
        "iops_rd": 0,
        "iops_wr": 0,
        "bps": 0,
        "bps_rd": 0,
        "bps_wr": 0,
        "iops_max": 2000,
        "iops_max_length": 60
     }
   }

With this, the user can perform I/O on hd0.qcow2 at a rate of 2000
IOPS for 1 minute before it's throttled down to 100 IOPS.

The user will be able to do bursts again if there's a sufficiently
long period of time with unused I/O (see below for details).

The default value for 'iops-total-max' is 0 and it means that bursts
are not allowed. 'iops-total-max-length' can only be set if
'iops-total-max' is set as well, and its default value is 1 second.

Here's the complete list of parameters for configuring bursts:

   |----------------------------------+-----------------------|
   | -drive                           | block_set_io_throttle |
   |----------------------------------+-----------------------|
   | throttling.iops-total-max        | iops_max              |
   | throttling.iops-total-max-length | iops_max_length       |
   | throttling.iops-read-max         | iops_rd_max           |
   | throttling.iops-read-max-length  | iops_rd_max_length    |
   | throttling.iops-write-max        | iops_wr_max           |
   | throttling.iops-write-max-length | iops_wr_max_length    |
   | throttling.bps-total-max         | bps_max               |
   | throttling.bps-total-max-length  | bps_max_length        |
   | throttling.bps-read-max          | bps_rd_max            |
   | throttling.bps-read-max-length   | bps_rd_max_length     |
   | throttling.bps-write-max         | bps_wr_max            |
   | throttling.bps-write-max-length  | bps_wr_max_length     |
   |----------------------------------+-----------------------|


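The two naming schemes in the tables of this document follow a regular
pattern: the QMP name drops the 'throttling.' prefix and the '-total'
component, shortens 'read' and 'write' to 'rd' and 'wr', and uses
underscores instead of dashes. The Python sketch below encodes that
pattern. 'qmp_name' is a hypothetical helper derived only from the
tables above, not a function provided by QEMU.

```python
def qmp_name(drive_option):
    """Translate a -drive throttling option into its QMP argument name."""
    prefix = "throttling."
    if not drive_option.startswith(prefix):
        raise ValueError("not a throttling option: " + drive_option)
    name = drive_option[len(prefix):]
    name = name.replace("-total", "")                         # iops-total -> iops
    name = name.replace("read", "rd").replace("write", "wr")  # read -> rd
    return name.replace("-", "_")                             # dashes -> underscores


for option in ("throttling.iops-total",
               "throttling.bps-read-max",
               "throttling.iops-write-max-length"):
    print(option, "->", qmp_name(option))
```

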
Controlling the size of I/O operations
--------------------------------------
When applying IOPS limits all I/O operations are treated equally
regardless of their size. This means that a user could circumvent the
limits by submitting one huge I/O request instead of several smaller
ones.

QEMU provides a setting called throttling.iops-size to prevent this
from happening. This setting specifies the size (in bytes) of an I/O
request for accounting purposes. Larger requests will be counted
proportionally to this size.

For example, if iops-size is set to 4096 then an 8KB request will be
counted as two, and a 6KB request will be counted as one and a
half. This only applies to requests larger than iops-size: smaller
requests will always be counted as one, no matter their size.

The default value of iops-size is 0 and it means that the size of the
requests is never taken into account when applying IOPS limits.


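The accounting rule described in this section fits in a few lines. The
following Python sketch is a simplified model of that rule, not the
code in throttle.c, and 'accounted_ops' is a name made up for this
example.

```python
def accounted_ops(request_bytes, iops_size):
    """Return how many operations a request counts as for IOPS limits."""
    # iops_size == 0 disables size-based accounting, and requests that
    # are not larger than iops_size always count as one operation.
    if iops_size == 0 or request_bytes <= iops_size:
        return 1.0
    return request_bytes / iops_size


for size in (512, 4096, 6144, 8192):
    print(size, "bytes ->", accounted_ops(size, 4096), "operations")
```

With iops-size set to 4096 the 6144-byte and 8192-byte requests count
as 1.5 and 2 operations, matching the example in the text.

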
Applying I/O limits to groups of disks
--------------------------------------
In all the examples so far we have seen how to apply limits to the I/O
performed on individual drives, but QEMU allows grouping drives so
they all share the same limits.

The way it works is that each drive with I/O limits is assigned to a
group named using the throttling.group parameter. If this parameter is
not specified, then the device name (e.g. 'virtio0', 'ide0-hd0') will
be used as the group name.

Limits set using the throttling.* parameters discussed earlier in this
document apply to the combined I/O of all members of a group.

Consider this example:

   -drive file=hd1.qcow2,throttling.iops-total=6000,throttling.group=foo
   -drive file=hd2.qcow2,throttling.iops-total=6000,throttling.group=foo
   -drive file=hd3.qcow2,throttling.iops-total=3000,throttling.group=bar
   -drive file=hd4.qcow2,throttling.iops-total=6000,throttling.group=foo
   -drive file=hd5.qcow2,throttling.iops-total=3000,throttling.group=bar
   -drive file=hd6.qcow2,throttling.iops-total=5000

Here hd1, hd2 and hd4 are all members of a group named 'foo' with a
combined IOPS limit of 6000, and hd3 and hd5 are members of 'bar'. hd6
is left alone (technically it is part of a 1-member group).

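The grouping rule can be illustrated with a short Python sketch that
computes the membership of each group for the example above. The
device names used here ('hd1' to 'hd6') are stand-ins chosen for the
sketch, since the example does not show the real device names, and
'group_members' is a hypothetical helper.

```python
from collections import defaultdict

# (device name, throttling.group or None) for each drive in the example.
# The device names are assumptions made for this sketch.
drives = [
    ("hd1", "foo"),
    ("hd2", "foo"),
    ("hd3", "bar"),
    ("hd4", "foo"),
    ("hd5", "bar"),
    ("hd6", None),
]


def group_members(drives):
    """Map each group name to the list of devices that belong to it."""
    groups = defaultdict(list)
    for device, group in drives:
        # Without throttling.group the device name is used as the group name.
        groups[group or device].append(device)
    return dict(groups)


for name, members in group_members(drives).items():
    print(name, members)
```
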
Limits are applied in a round-robin fashion so if there are concurrent
I/O requests on several drives of the same group they will be
distributed evenly.

When I/O limits are applied to an existing drive using the QMP command
'block_set_io_throttle', the following things need to be taken into
account:

   - I/O limits are shared within the same group, so new values will
     affect all members and overwrite the previous settings. In other
     words: if different limits are applied to members of the same
     group, the last one wins.

   - If 'group' is unset it is assumed to be the current group of that
     drive. If the drive is not in a group yet, it will be added to a
     group named after the device name.

   - If 'group' is set then the drive will be moved to that group if
     it was a member of a different one. In this case the limits
     specified in the parameters will be applied to the new group
     only.

   - I/O limits can be disabled by setting all of them to 0. In this
     case the device will be removed from its group and the rest of
     its members will not be affected. The 'group' parameter is
     ignored.


The Leaky Bucket algorithm
--------------------------
I/O limits in QEMU are implemented using the leaky bucket algorithm
(specifically the "Leaky bucket as a meter" variant).

This algorithm uses the analogy of a bucket that leaks water
constantly. The water that gets into the bucket represents the I/O
that has been performed, and no more I/O is allowed once the bucket is
full.

To see the way this corresponds to the throttling parameters in QEMU,
consider the following values:

   iops-total=100
   iops-total-max=2000
   iops-total-max-length=60

   - Water leaks from the bucket at a rate of 100 IOPS.
   - Water can be added to the bucket at a rate of 2000 IOPS.
   - The size of the bucket is 2000 x 60 = 120000.
   - If 'iops-total-max-length' is unset then it defaults to 1 and the
     size of the bucket is 2000.
   - If 'iops-total-max' is unset then 'iops-total-max-length' must be
     unset as well. In this case the bucket size is 100.

The bucket is initially empty, therefore water can be added until it's
full at a rate of 2000 IOPS (the burst rate). Once the bucket is full
we can only add as much water as it leaks, therefore the I/O rate is
reduced to 100 IOPS. If we add less water than it leaks then the
bucket will start to empty, allowing for bursts again.

Note that since water is leaking from the bucket even during bursts,
it will take a bit more than 60 seconds at 2000 IOPS to fill it
up. After those 60 seconds the bucket will have leaked 60 x 100 =
6000, allowing for 3 more seconds of I/O at 2000 IOPS.

Also, due to the way the algorithm works, longer bursts can be done at
a lower I/O rate, e.g. 1000 IOPS for 120 seconds.


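The burst durations mentioned above can be checked with a short
calculation. The Python sketch below uses the simplified bucket model
of this section (one bucket of size max x length that leaks at the
basic rate). It is not the code in throttle.c, and 'seconds_to_fill'
is a name made up for this example.

```python
def seconds_to_fill(avg, burst_max, burst_length, rate):
    """Seconds needed to fill an initially empty bucket at a constant rate.

    avg          -- basic limit, i.e. the leak rate (e.g. iops-total)
    burst_max    -- burst limit (e.g. iops-total-max)
    burst_length -- burst length in seconds (e.g. iops-total-max-length)
    rate         -- I/O rate sustained by the guest, capped at burst_max

    Returns None if the bucket never fills at that rate.
    """
    bucket_size = burst_max * burst_length
    net_rate = min(rate, burst_max) - avg   # water added minus water leaked
    if net_rate <= 0:
        return None
    return bucket_size / net_rate


print(seconds_to_fill(100, 2000, 60, 2000))   # a bit more than 63 seconds
print(seconds_to_fill(100, 2000, 60, 1000))   # a bit more than 133 seconds
```

The nominal figures of 60 and 120 seconds ignore the leak. Once the
leak is taken into account the bursts last slightly longer, in line
with the "3 more seconds" noted in the text.

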
The 'throttle' block filter
---------------------------
Since QEMU 2.11 it is possible to configure the I/O limits using a
'throttle' block filter. This filter uses the exact same throttling
infrastructure described above but can be used anywhere in the node
graph, allowing for more flexibility.

The user can create an arbitrary number of filters and each one of
them must be assigned to a group that contains the actual I/O limits.
Different filters can use the same group so the limits are shared as
described earlier in "Applying I/O limits to groups of disks".

A group can be created using the object-add QMP command:

   { "execute": "object-add",
     "arguments": {
        "qom-type": "throttle-group",
        "id": "group0",
        "limits" : {
           "iops-total": 1000,
           "bps-write": 2097152
        }
     }
   }

throttle-group has a 'limits' property (of type ThrottleLimits as
defined in qapi/block-core.json) which can be set on creation or later
with 'qom-set'.

A throttle-group can also be created with the -object command line
option but at the moment there is no way to pass a 'limits' parameter
that contains a ThrottleLimits structure. The solution is to set the
individual values directly, like in this example:

   -object throttle-group,id=group0,x-iops-total=1000,x-bps-write=2097152

Note however that this is not a stable API (hence the 'x-' prefixes) and
will disappear when -object gains support for structured options and
enables use of 'limits'.

Once we have a throttle-group we can use the throttle block filter,
where the 'file' property must be set to the block device that we want
to filter:

   { "execute": "blockdev-add",
     "arguments": {
        "driver": "qcow2",
        "node-name": "disk0",
        "file": {
           "driver": "file",
           "filename": "/path/to/disk.qcow2"
        }
     }
   }

   { "execute": "blockdev-add",
     "arguments": {
        "driver": "throttle",
        "node-name": "throttle0",
        "throttle-group": "group0",
        "file": "disk0"
     }
   }

A similar setup can also be done with the command line, for example:

   -drive driver=throttle,throttle-group=group0,
          file.driver=qcow2,file.file.filename=/path/to/disk.qcow2

The scenario described so far is very simple but the throttle block
filter allows for more complex configurations. For example, let's say
that we have three different drives and we want to set I/O limits for
each one of them and an additional set of limits for the combined I/O
of all three drives.

First we would define all throttle groups, one for each one of the
drives and one that would apply to all of them:

   -object throttle-group,id=limits0,x-iops-total=2000
   -object throttle-group,id=limits1,x-iops-total=2500
   -object throttle-group,id=limits2,x-iops-total=3000
   -object throttle-group,id=limits012,x-iops-total=4000

Now we can define the drives, and for each one of them we use two
chained throttle filters: the drive's own filter and the combined
filter (the lines are split for clarity).

   -drive driver=throttle,throttle-group=limits012,
          file.driver=throttle,file.throttle-group=limits0,
          file.file.driver=qcow2,file.file.file.filename=/path/to/disk0.qcow2
   -drive driver=throttle,throttle-group=limits012,
          file.driver=throttle,file.throttle-group=limits1,
          file.file.driver=qcow2,file.file.file.filename=/path/to/disk1.qcow2
   -drive driver=throttle,throttle-group=limits012,
          file.driver=throttle,file.throttle-group=limits2,
          file.file.driver=qcow2,file.file.file.filename=/path/to/disk2.qcow2

In this example the individual drives have IOPS limits of 2000, 2500
and 3000 respectively but the total combined I/O can never exceed 4000
IOPS.
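
Writing chained -drive options by hand is error-prone, so a launcher
script may prefer to generate them. The Python sketch below builds the
three -drive arguments of the example above as single
comma-separated strings. 'throttled_drive' is a hypothetical helper
written for this document. It only formats strings and does not check
that the throttle groups exist.

```python
def throttled_drive(filename, own_group, shared_group):
    """Return a -drive value with two chained throttle filters.

    The outer filter uses the shared group, the inner filter uses the
    drive's own group, and the qcow2 image sits below both.
    """
    options = [
        "driver=throttle",
        "throttle-group=" + shared_group,
        "file.driver=throttle",
        "file.throttle-group=" + own_group,
        "file.file.driver=qcow2",
        "file.file.file.filename=" + filename,
    ]
    return ",".join(options)


for i in range(3):
    value = throttled_drive("/path/to/disk%d.qcow2" % i,
                            "limits%d" % i, "limits012")
    print("-drive", value)
```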