| The QEMU throttling infrastructure | 
 | ================================== | 
 | Copyright (C) 2016,2020 Igalia, S.L. | 
 | Author: Alberto Garcia <berto@igalia.com> | 
 |  | 
 | This work is licensed under the terms of the GNU GPL, version 2 or | 
 | later. See the COPYING file in the top-level directory. | 
 |  | 
 | Introduction | 
 | ------------ | 
 | QEMU includes a throttling module that can be used to set limits to | 
 | I/O operations. The code itself is generic and independent of the I/O | 
 | units, but it is currently used to limit the number of bytes per second | 
 | and operations per second (IOPS) when performing disk I/O. | 
 |  | 
 | This document explains how to use the throttling code in QEMU, and how | 
 | it works internally. The implementation is in throttle.c. | 
 |  | 
 |  | 
 | Using throttling to limit disk I/O | 
 | ---------------------------------- | 
 | Two aspects of the disk I/O can be limited: the number of bytes per | 
 | second and the number of operations per second (IOPS). For each one of | 
 | them the user can set a global limit or separate limits for read and | 
 | write operations. This gives us a total of six different parameters. | 
 |  | 
 | I/O limits can be set using the throttling.* parameters of -drive, or | 
 | using the QMP 'block_set_io_throttle' command. These are the names of | 
 | the parameters for both cases: | 
 |  | 
 | |-----------------------+-----------------------| | 
 | | -drive                | block_set_io_throttle | | 
 | |-----------------------+-----------------------| | 
 | | throttling.iops-total | iops                  | | 
 | | throttling.iops-read  | iops_rd               | | 
 | | throttling.iops-write | iops_wr               | | 
 | | throttling.bps-total  | bps                   | | 
 | | throttling.bps-read   | bps_rd                | | 
 | | throttling.bps-write  | bps_wr                | | 
 | |-----------------------+-----------------------| | 
 |  | 
 | It is possible to set limits for both IOPS and bps at the same time, | 
 | and for each case we can decide whether to have separate read and | 
 | write limits or not, but note that if iops-total is set then neither | 
 | iops-read nor iops-write can be set. The same applies to bps-total and | 
 | bps-read/write. | 
 |  | 
 | The default value of these parameters is 0, and it means 'unlimited'. | 
 |  | 
 | In its most basic usage, the user can add a drive to QEMU with a limit | 
 | of 100 IOPS with the following -drive line: | 
 |  | 
 |    -drive file=hd0.qcow2,throttling.iops-total=100 | 
 |  | 
| We can do the same using QMP. In this case all these parameters are |
| mandatory, so we must set the ones that we don't want to limit to 0: |
 |  | 
 |    { "execute": "block_set_io_throttle", | 
 |      "arguments": { | 
 |         "device": "virtio0", | 
 |         "iops": 100, | 
 |         "iops_rd": 0, | 
 |         "iops_wr": 0, | 
 |         "bps": 0, | 
 |         "bps_rd": 0, | 
 |         "bps_wr": 0 | 
 |      } | 
 |    } | 
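|  |
| For example, to set separate limits for reads and writes instead of |
| a single total limit, the read and write parameters from the table |
| above can be combined (the file name and the values here are only |
| illustrative): |
|  |
|    -drive file=hd0.qcow2,throttling.iops-read=100,throttling.iops-write=50 |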
 |  | 
 |  | 
 | I/O bursts | 
 | ---------- | 
 | In addition to the basic limits we have just seen, QEMU allows the | 
 | user to do bursts of I/O for a configurable amount of time. A burst is | 
 | an amount of I/O that can exceed the basic limit. Bursts are useful to | 
 | allow better performance when there are peaks of activity (the OS | 
 | boots, a service needs to be restarted) while keeping the average | 
 | limits lower the rest of the time. | 
 |  | 
 | Two parameters control bursts: their length and the maximum amount of | 
 | I/O they allow. These two can be configured separately for each one of | 
 | the six basic parameters described in the previous section, but in | 
 | this section we'll use 'iops-total' as an example. | 
 |  | 
 | The I/O limit during bursts is set using 'iops-total-max', and the | 
 | maximum length (in seconds) is set with 'iops-total-max-length'. So if | 
 | we want to configure a drive with a basic limit of 100 IOPS and allow | 
 | bursts of 2000 IOPS for 60 seconds, we would do it like this (the line | 
 | is split for clarity): | 
 |  | 
 |    -drive file=hd0.qcow2, | 
 |           throttling.iops-total=100, | 
 |           throttling.iops-total-max=2000, | 
 |           throttling.iops-total-max-length=60 | 
 |  | 
 | Or, with QMP: | 
 |  | 
 |    { "execute": "block_set_io_throttle", | 
 |      "arguments": { | 
 |         "device": "virtio0", | 
 |         "iops": 100, | 
 |         "iops_rd": 0, | 
 |         "iops_wr": 0, | 
 |         "bps": 0, | 
 |         "bps_rd": 0, | 
 |         "bps_wr": 0, | 
 |         "iops_max": 2000, | 
|         "iops_max_length": 60 |
 |      } | 
 |    } | 
 |  | 
 | With this, the user can perform I/O on hd0.qcow2 at a rate of 2000 | 
 | IOPS for 1 minute before it's throttled down to 100 IOPS. | 
 |  | 
 | The user will be able to do bursts again if there's a sufficiently | 
 | long period of time with unused I/O (see below for details). | 
 |  | 
 | The default value for 'iops-total-max' is 0 and it means that bursts | 
 | are not allowed. 'iops-total-max-length' can only be set if | 
 | 'iops-total-max' is set as well, and its default value is 1 second. | 
 |  | 
 | Here's the complete list of parameters for configuring bursts: | 
 |  | 
 | |----------------------------------+-----------------------| | 
 | | -drive                           | block_set_io_throttle | | 
 | |----------------------------------+-----------------------| | 
 | | throttling.iops-total-max        | iops_max              | | 
 | | throttling.iops-total-max-length | iops_max_length       | | 
 | | throttling.iops-read-max         | iops_rd_max           | | 
 | | throttling.iops-read-max-length  | iops_rd_max_length    | | 
 | | throttling.iops-write-max        | iops_wr_max           | | 
 | | throttling.iops-write-max-length | iops_wr_max_length    | | 
 | | throttling.bps-total-max         | bps_max               | | 
 | | throttling.bps-total-max-length  | bps_max_length        | | 
 | | throttling.bps-read-max          | bps_rd_max            | | 
 | | throttling.bps-read-max-length   | bps_rd_max_length     | | 
 | | throttling.bps-write-max         | bps_wr_max            | | 
 | | throttling.bps-write-max-length  | bps_wr_max_length     | | 
 | |----------------------------------+-----------------------| | 
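|  |
| As an illustration of the per-direction variants, this -drive line |
| (with purely illustrative values) limits reads to 100 IOPS but |
| allows read bursts of 500 IOPS for up to 10 seconds, leaving writes |
| unlimited (the line is split for clarity): |
|  |
|    -drive file=hd0.qcow2, |
|           throttling.iops-read=100, |
|           throttling.iops-read-max=500, |
|           throttling.iops-read-max-length=10 |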
 |  | 
 |  | 
 | Controlling the size of I/O operations | 
 | -------------------------------------- | 
| When applying IOPS limits all I/O operations are treated equally |
| regardless of their size. This means that a user could circumvent |
| the limits by submitting one huge I/O request instead of several |
| smaller ones. |
 |  | 
 | QEMU provides a setting called throttling.iops-size to prevent this | 
 | from happening. This setting specifies the size (in bytes) of an I/O | 
 | request for accounting purposes. Larger requests will be counted | 
 | proportionally to this size. | 
 |  | 
 | For example, if iops-size is set to 4096 then an 8KB request will be | 
 | counted as two, and a 6KB request will be counted as one and a | 
| half. This only applies to requests larger than iops-size: smaller |
| requests will always be counted as one, no matter their size. |
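|  |
| Using the -drive syntax, the scenario just described could be set up |
| like this (the file name and the 100 IOPS base limit are only |
| illustrative): |
|  |
|    -drive file=hd0.qcow2,throttling.iops-total=100,throttling.iops-size=4096 |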
 |  | 
 | The default value of iops-size is 0 and it means that the size of the | 
 | requests is never taken into account when applying IOPS limits. | 
 |  | 
 |  | 
 | Applying I/O limits to groups of disks | 
 | -------------------------------------- | 
 | In all the examples so far we have seen how to apply limits to the I/O | 
 | performed on individual drives, but QEMU allows grouping drives so | 
 | they all share the same limits. | 
 |  | 
 | The way it works is that each drive with I/O limits is assigned to a | 
 | group named using the throttling.group parameter. If this parameter is | 
| not specified, then the device name (e.g. 'virtio0', 'ide0-hd0') will |
 | be used as the group name. | 
 |  | 
 | Limits set using the throttling.* parameters discussed earlier in this | 
 | document apply to the combined I/O of all members of a group. | 
 |  | 
 | Consider this example: | 
 |  | 
 |    -drive file=hd1.qcow2,throttling.iops-total=6000,throttling.group=foo | 
 |    -drive file=hd2.qcow2,throttling.iops-total=6000,throttling.group=foo | 
 |    -drive file=hd3.qcow2,throttling.iops-total=3000,throttling.group=bar | 
 |    -drive file=hd4.qcow2,throttling.iops-total=6000,throttling.group=foo | 
 |    -drive file=hd5.qcow2,throttling.iops-total=3000,throttling.group=bar | 
 |    -drive file=hd6.qcow2,throttling.iops-total=5000 | 
 |  | 
 | Here hd1, hd2 and hd4 are all members of a group named 'foo' with a | 
 | combined IOPS limit of 6000, and hd3 and hd5 are members of 'bar'. hd6 | 
 | is left alone (technically it is part of a 1-member group). | 
 |  | 
 | Limits are applied in a round-robin fashion so if there are concurrent | 
 | I/O requests on several drives of the same group they will be | 
 | distributed evenly. | 
 |  | 
 | When I/O limits are applied to an existing drive using the QMP command | 
 | 'block_set_io_throttle', the following things need to be taken into | 
 | account: | 
 |  | 
 |    - I/O limits are shared within the same group, so new values will | 
 |      affect all members and overwrite the previous settings. In other | 
 |      words: if different limits are applied to members of the same | 
 |      group, the last one wins. | 
 |  | 
 |    - If 'group' is unset it is assumed to be the current group of that | 
 |      drive. If the drive is not in a group yet, it will be added to a | 
 |      group named after the device name. | 
 |  | 
|    - If 'group' is set then the drive will be moved to that group if |
|      it was a member of a different one. In this case the limits |
|      specified in the parameters will be applied to the new group |
|      only (see the example after this list). |
 |  | 
 |    - I/O limits can be disabled by setting all of them to 0. In this | 
 |      case the device will be removed from its group and the rest of | 
 |      its members will not be affected. The 'group' parameter is | 
 |      ignored. | 
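|  |
| For example, this command applies a combined limit of 6000 IOPS and |
| moves the drive to the group named 'foo' (the device name and the |
| values are only illustrative): |
|  |
|    { "execute": "block_set_io_throttle", |
|      "arguments": { |
|         "device": "virtio0", |
|         "group": "foo", |
|         "iops": 6000, |
|         "iops_rd": 0, |
|         "iops_wr": 0, |
|         "bps": 0, |
|         "bps_rd": 0, |
|         "bps_wr": 0 |
|      } |
|    } |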
 |  | 
 |  | 
 | The Leaky Bucket algorithm | 
 | -------------------------- | 
 | I/O limits in QEMU are implemented using the leaky bucket algorithm | 
 | (specifically the "Leaky bucket as a meter" variant). | 
 |  | 
 | This algorithm uses the analogy of a bucket that leaks water | 
 | constantly. The water that gets into the bucket represents the I/O | 
 | that has been performed, and no more I/O is allowed once the bucket is | 
 | full. | 
 |  | 
 | To see the way this corresponds to the throttling parameters in QEMU, | 
 | consider the following values: | 
 |  | 
 |   iops-total=100 | 
 |   iops-total-max=2000 | 
 |   iops-total-max-length=60 | 
 |  | 
 |   - Water leaks from the bucket at a rate of 100 IOPS. | 
 |   - Water can be added to the bucket at a rate of 2000 IOPS. | 
 |   - The size of the bucket is 2000 x 60 = 120000 | 
 |   - If 'iops-total-max-length' is unset then it defaults to 1 and the | 
 |     size of the bucket is 2000. | 
 |   - If 'iops-total-max' is unset then 'iops-total-max-length' must be | 
 |     unset as well. In this case the bucket size is 100. | 
 |  | 
 | The bucket is initially empty, therefore water can be added until it's | 
 | full at a rate of 2000 IOPS (the burst rate). Once the bucket is full | 
 | we can only add as much water as it leaks, therefore the I/O rate is | 
 | reduced to 100 IOPS. If we add less water than it leaks then the | 
 | bucket will start to empty, allowing for bursts again. | 
 |  | 
 | Note that since water is leaking from the bucket even during bursts, | 
 | it will take a bit more than 60 seconds at 2000 IOPS to fill it | 
 | up. After those 60 seconds the bucket will have leaked 60 x 100 = | 
 | 6000, allowing for 3 more seconds of I/O at 2000 IOPS. | 
 |  | 
| Also, due to the way the algorithm works, longer bursts can be done |
| at a lower I/O rate, e.g. 1000 IOPS for a bit more than 120 seconds |
| (about 133, since the bucket keeps leaking 100 IOPS during the |
| burst). |
 |  | 
 |  | 
 | The 'throttle' block filter | 
 | --------------------------- | 
 | Since QEMU 2.11 it is possible to configure the I/O limits using a | 
 | 'throttle' block filter. This filter uses the exact same throttling | 
 | infrastructure described above but can be used anywhere in the node | 
 | graph, allowing for more flexibility. | 
 |  | 
 | The user can create an arbitrary number of filters and each one of | 
 | them must be assigned to a group that contains the actual I/O limits. | 
 | Different filters can use the same group so the limits are shared as | 
 | described earlier in "Applying I/O limits to groups of disks". | 
 |  | 
 | A group can be created using the object-add QMP function: | 
 |  | 
 |    { "execute": "object-add", | 
 |      "arguments": { | 
 |        "qom-type": "throttle-group", | 
 |        "id": "group0", | 
 |        "limits" : { | 
 |          "iops-total": 1000, | 
 |          "bps-write": 2097152 | 
 |        } | 
 |      } | 
 |    } | 
 |  | 
 | throttle-group has a 'limits' property (of type ThrottleLimits as | 
 | defined in qapi/block-core.json) which can be set on creation or later | 
 | with 'qom-set'. | 
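|  |
| For instance, assuming the group created above is reachable at the |
| QOM path /objects/group0 (the usual location for objects created |
| with object-add), its limits could later be changed with something |
| like this: |
|  |
|    { "execute": "qom-set", |
|      "arguments": { |
|         "path": "/objects/group0", |
|         "property": "limits", |
|         "value": { |
|            "iops-total": 1000, |
|            "bps-write": 2097152 |
|         } |
|      } |
|    } |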
 |  | 
 | A throttle-group can also be created with the -object command line | 
 | option but at the moment there is no way to pass a 'limits' parameter | 
 | that contains a ThrottleLimits structure. The solution is to set the | 
 | individual values directly, like in this example: | 
 |  | 
 |    -object throttle-group,id=group0,x-iops-total=1000,x-bps-write=2097152 | 
 |  | 
 | Note however that this is not a stable API (hence the 'x-' prefixes) and | 
 | will disappear when -object gains support for structured options and | 
 | enables use of 'limits'. | 
 |  | 
 | Once we have a throttle-group we can use the throttle block filter, | 
 | where the 'file' property must be set to the block device that we want | 
 | to filter: | 
 |  | 
|    { "execute": "blockdev-add", |
|      "arguments": { |
|         "driver": "qcow2", |
|         "node-name": "disk0", |
|         "file": { |
|            "driver": "file", |
|            "filename": "/path/to/disk.qcow2" |
|         } |
|      } |
|    } |
 |  | 
 |    { "execute": "blockdev-add", | 
 |      "arguments": { | 
 |         "driver": "throttle", | 
 |         "node-name": "throttle0", | 
 |         "throttle-group": "group0", | 
 |         "file": "disk0" | 
 |      } | 
 |    } | 
 |  | 
 | A similar setup can also be done with the command line, for example: | 
 |  | 
 |    -drive driver=throttle,throttle-group=group0, | 
 |           file.driver=qcow2,file.file.filename=/path/to/disk.qcow2 | 
 |  | 
 | The scenario described so far is very simple but the throttle block | 
 | filter allows for more complex configurations. For example, let's say | 
 | that we have three different drives and we want to set I/O limits for | 
 | each one of them and an additional set of limits for the combined I/O | 
 | of all three drives. | 
 |  | 
 | First we would define all throttle groups, one for each one of the | 
 | drives and one that would apply to all of them: | 
 |  | 
 |    -object throttle-group,id=limits0,x-iops-total=2000 | 
 |    -object throttle-group,id=limits1,x-iops-total=2500 | 
 |    -object throttle-group,id=limits2,x-iops-total=3000 | 
 |    -object throttle-group,id=limits012,x-iops-total=4000 | 
 |  | 
 | Now we can define the drives, and for each one of them we use two | 
 | chained throttle filters: the drive's own filter and the combined | 
 | filter. | 
 |  | 
|    -drive driver=throttle,throttle-group=limits012, |
|           file.driver=throttle,file.throttle-group=limits0, |
|           file.file.driver=qcow2,file.file.file.filename=/path/to/disk0.qcow2 |
|    -drive driver=throttle,throttle-group=limits012, |
|           file.driver=throttle,file.throttle-group=limits1, |
|           file.file.driver=qcow2,file.file.file.filename=/path/to/disk1.qcow2 |
|    -drive driver=throttle,throttle-group=limits012, |
|           file.driver=throttle,file.throttle-group=limits2, |
|           file.file.driver=qcow2,file.file.file.filename=/path/to/disk2.qcow2 |
 |  | 
 | In this example the individual drives have IOPS limits of 2000, 2500 | 
 | and 3000 respectively but the total combined I/O can never exceed 4000 | 
 | IOPS. |