qcow2 L2/refcount cache configuration
=====================================
Copyright (C) 2015, 2018-2020 Igalia, S.L.
Author: Alberto Garcia <berto@igalia.com>

This work is licensed under the terms of the GNU GPL, version 2 or
later. See the COPYING file in the top-level directory.

Introduction
------------
The QEMU qcow2 driver has two caches that can improve the I/O
performance significantly. However, setting the right cache sizes is
not a straightforward operation.

This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.

Please refer to the docs/interop/qcow2.txt file for an in-depth
technical description of the qcow2 file format.


Clusters
--------
A qcow2 file is organized in units of constant size called clusters.

The cluster size is configurable, but it must be a power of two and
its value 512 bytes or higher. QEMU currently defaults to 64 KB
clusters, and it does not support sizes larger than 2 MB.

The 'qemu-img create' command supports specifying the size using the
cluster_size option:

   qemu-img create -f qcow2 -o cluster_size=128K hd.qcow2 4G
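
The cluster size constraints described in this section can be checked
before calling 'qemu-img create'. The following is only a minimal
sketch in Python; the function name is hypothetical and is not part of
QEMU, and the limits are simply the ones stated above:

```python
MIN_CLUSTER_SIZE = 512              # bytes, lower limit stated above
MAX_CLUSTER_SIZE = 2 * 1024 * 1024  # bytes, 2 MB upper limit stated above


def is_valid_cluster_size(size_bytes: int) -> bool:
    """Return True if size_bytes is a power of two within the limits."""
    is_power_of_two = size_bytes > 0 and (size_bytes & (size_bytes - 1)) == 0
    return is_power_of_two and MIN_CLUSTER_SIZE <= size_bytes <= MAX_CLUSTER_SIZE


if __name__ == "__main__":
    for candidate in (256, 96 * 1024, 128 * 1024, 4 * 1024 * 1024):
        print(candidate, is_valid_cluster_size(candidate))
```

Of the four candidates only 128 KB passes: 256 bytes is too small,
96 KB is not a power of two, and 4 MB is above the limit.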


The L2 tables
-------------
The qcow2 format uses a two-level structure to map the virtual disk as
seen by the guest to the disk image in the host. These structures are
called the L1 and L2 tables.

There is one single L1 table per disk image. The table is small and is
always kept in memory.

There can be many L2 tables, depending on how much space has been
allocated in the image. Each table is one cluster in size. In order to
read or write data from the virtual disk, QEMU needs to read its
corresponding L2 table to find out where that data is located. Since
reading the table for each I/O operation can be expensive, QEMU keeps
an L2 cache in memory to speed up disk access.

The size of the L2 cache can be configured, and setting the right
value can improve the I/O performance significantly.


The refcount blocks
-------------------
The qcow2 format also maintains a reference count for each cluster.
Reference counts are used for cluster allocation and internal
snapshots. The data is stored in a two-level structure similar to the
L1/L2 tables described above.

The second-level structures are called refcount blocks. Like L2
tables, each one is one cluster in size, and their number depends on
the amount of allocated space.

Each block contains a number of refcount entries. Their size (in bits)
is a power of two and must not be higher than 64. It defaults to 16
bits, but a different value can be set using the refcount_bits option:

   qemu-img create -f qcow2 -o refcount_bits=8 hd.qcow2 4G

QEMU keeps a refcount cache to speed up I/O much like the
aforementioned L2 cache, and its size can also be configured.


Choosing the right cache sizes
------------------------------
In order to choose the cache sizes we need to know how they relate to
the amount of allocated space.

The part of the virtual disk that can be mapped by the L2 and refcount
caches (in bytes) is:

   disk_size = l2_cache_size * cluster_size / 8
   disk_size = refcount_cache_size * cluster_size * 8 / refcount_bits

With the default values for cluster_size (64 KB) and refcount_bits
(16), this becomes:

   disk_size = l2_cache_size * 8192
   disk_size = refcount_cache_size * 32768

So in order to cover disk_size_GB gigabytes of disk space with the
default values we need (in bytes):

   l2_cache_size = disk_size_GB * 131072
   refcount_cache_size = disk_size_GB * 32768

For example, 1 MB of L2 cache is needed to cover every 8 GB of the virtual
image size (given that the default cluster size is used):

   8 GB / 8192 = 1 MB

The refcount cache is 4 times the cluster size by default. With the default
cluster size of 64 KB, it is 256 KB (262144 bytes). This is sufficient for
8 GB of image size:

   262144 * 32768 = 8 GB
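
The formulas in this section can be turned into a small calculator.
The following Python sketch only restates the arithmetic above; the
function names are hypothetical, and the results are rounded up to
whole bytes but not to a multiple of the cluster size:

```python
GIB = 1024 ** 3


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


def l2_cache_size(disk_size: int, cluster_size: int = 65536) -> int:
    """Bytes of L2 cache needed to map disk_size bytes (8-byte L2 entries)."""
    return ceil_div(disk_size * 8, cluster_size)


def refcount_cache_size(disk_size: int, cluster_size: int = 65536,
                        refcount_bits: int = 16) -> int:
    """Bytes of refcount cache needed to cover disk_size bytes."""
    return ceil_div(disk_size * refcount_bits, cluster_size * 8)


if __name__ == "__main__":
    print(l2_cache_size(8 * GIB))        # 1048576 (1 MB)
    print(refcount_cache_size(8 * GIB))  # 262144 (256 KB)
```

With the defaults this reproduces the figures given above: 1 MB of L2
cache and 256 KB of refcount cache for an 8 GB image.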


How to configure the cache sizes
--------------------------------
Cache sizes can be configured using the -drive option in the
command line, or the 'blockdev-add' QMP command.

There are three options available, and all of them take bytes:

"l2-cache-size":         maximum size of the L2 table cache
"refcount-cache-size":   maximum size of the refcount block cache
"cache-size":            maximum size of both caches combined

There are a few things that need to be taken into account:

 - Both caches must have a size that is a multiple of the cluster size
   (or the cache entry size: see "Using smaller cache entries" below).

 - The maximum L2 cache size is 32 MB by default on Linux platforms (enough
   for full coverage of 256 GB images, with the default cluster size). This
   value can be modified using the "l2-cache-size" option. QEMU will not use
   more memory than needed to hold all of the image's L2 tables, regardless
   of this maximum value.
   On non-Linux platforms the default maximum is smaller (8 MB). The
   difference exists because on Linux the cache can be cleared
   periodically if needed, using the "cache-clean-interval" option (see
   below).
   The minimum L2 cache size is 2 clusters (or 2 cache entries, see below).

 - The default (and minimum) refcount cache size is 4 clusters.

 - If only "cache-size" is specified then QEMU will assign as much
   memory as possible to the L2 cache before increasing the refcount
   cache size.

 - At most two of "l2-cache-size", "refcount-cache-size", and "cache-size"
   can be set simultaneously.
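
As an illustration of the rules in the list above, the following
Python sketch builds a -drive argument from requested cache sizes. The
helper names are hypothetical and not part of QEMU. The sketch assumes
that sizes are rounded up to a multiple of the cluster size (the
conservative choice, since a multiple of the cache entry size would
also be accepted) and applies the minimum sizes stated above:

```python
def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of 'multiple'."""
    return -(-value // multiple) * multiple


def drive_option(filename: str, l2_bytes: int, refcount_bytes: int,
                 cluster_size: int = 65536) -> str:
    """Return a -drive argument with both cache sizes set explicitly."""
    # L2 cache: multiple of the cluster size, at least 2 clusters.
    l2 = max(round_up(l2_bytes, cluster_size), 2 * cluster_size)
    # Refcount cache: multiple of the cluster size, at least 4 clusters.
    refcount = max(round_up(refcount_bytes, cluster_size), 4 * cluster_size)
    return (f"-drive file={filename},"
            f"l2-cache-size={l2},refcount-cache-size={refcount}")


if __name__ == "__main__":
    print(drive_option("hd.qcow2", 2_000_000, 0))
```

Only two of the three options are set here, in line with the last item
of the list.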

Unlike L2 tables, refcount blocks are not used during normal I/O but
only during allocations and internal snapshots. In most cases they are
accessed sequentially (even during random guest I/O) so increasing the
refcount cache size won't have any measurable effect on performance
(this can change if you are using internal snapshots, so you may want
to think about increasing the cache size if you use them heavily).

Before QEMU 2.12 the refcount cache had a default size of 1/4 of the
L2 cache size. This resulted in unnecessarily large caches, so now the
refcount cache is as small as possible unless overridden by the user.


Using smaller cache entries
---------------------------
The qcow2 L2 cache can store complete tables. This means that if QEMU
needs an entry from an L2 table then the whole table is read from disk
and is kept in the cache. If the cache is full then a complete table
needs to be evicted first.

This can be inefficient with large cluster sizes since it results in
more disk I/O and wastes more cache memory.

Since QEMU 2.12 you can change the size of the L2 cache entry and make
it smaller than the cluster size. This can be configured using the
"l2-cache-entry-size" parameter:

   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096

Since QEMU 4.0 the value of l2-cache-entry-size defaults to 4 KB (or
the cluster size if it's smaller).

Some things to take into account:

 - The L2 cache entry size has the same restrictions as the cluster
   size (power of two, at least 512 bytes).

 - Smaller entry sizes generally improve the cache efficiency and make
   disk I/O faster. This is particularly true with solid state drives
   so it's a good idea to reduce the entry size in those cases. With
   rotating hard drives the situation is a bit more complicated so you
   should test it first and stay with the default size if unsure.

 - Try different entry sizes to see which one gives faster performance
   in your case. The block size of the host filesystem is generally a
   good default (usually 4096 bytes in the case of ext4, hence the
   default).

 - Only the L2 cache can be configured this way. The refcount cache
   always uses the cluster size as the entry size.

 - If the L2 cache is big enough to hold all of the image's L2 tables
   (as explained in the "Choosing the right cache sizes" and "How to
   configure the cache sizes" sections in this document) then none of
   this is necessary and you can omit the "l2-cache-entry-size"
   parameter altogether. In this case QEMU makes the entry size
   equal to the cluster size by default.
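
To get a feeling for what a given entry size means, the following
Python sketch computes how many entries fit in the L2 cache and how
much of the virtual disk each entry maps. The function names are
hypothetical. The upper bound on the entry size (the cluster size) is
an assumption inferred from the text above, and 8-byte L2 entries are
assumed:

```python
def is_valid_entry_size(entry_size: int, cluster_size: int) -> bool:
    """Power of two, at least 512 bytes, and (assumed) at most one cluster."""
    is_power_of_two = entry_size > 0 and (entry_size & (entry_size - 1)) == 0
    return is_power_of_two and 512 <= entry_size <= cluster_size


def cache_entries(l2_cache_size: int, entry_size: int) -> int:
    """Number of entries of entry_size bytes that fit in the L2 cache."""
    return l2_cache_size // entry_size


def bytes_mapped_per_entry(entry_size: int, cluster_size: int) -> int:
    """Virtual disk bytes mapped by one cache entry (8-byte L2 entries)."""
    return (entry_size // 8) * cluster_size


if __name__ == "__main__":
    # Values from the -drive example above, with 64 KB clusters.
    print(cache_entries(2097152, 4096))         # 512 entries
    print(bytes_mapped_per_entry(4096, 65536))  # 33554432 (32 MB)
```

With 64 KB clusters, a 4 KB entry maps 32 MB of the virtual disk,
whereas a full-cluster entry maps 512 MB. A smaller entry therefore
loads and evicts much less metadata per cache miss.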


Reducing the memory usage
-------------------------
It is possible to clean unused cache entries in order to reduce the
memory usage during periods of low I/O activity.

The parameter "cache-clean-interval" defines an interval (in seconds),
after which all the cache entries that haven't been accessed during the
interval are removed from memory. Setting this parameter to 0 disables this
feature.

The following example removes all unused cache entries every 15 minutes:

   -drive file=hd.qcow2,cache-clean-interval=900

If unset, the default value for this parameter is 600 seconds on platforms
which support this functionality, and is 0 (disabled) on other platforms.

This functionality currently relies on the MADV_DONTNEED argument for
madvise() to actually free the memory. This is a Linux-specific feature,
so cache-clean-interval is not supported on other systems.


Extended L2 Entries
-------------------
All numbers shown in this document are valid for qcow2 images with normal
64-bit L2 entries.

Images with extended L2 entries need twice as much L2 metadata, so the L2
cache size must be twice as large for the same disk space:

   disk_size = l2_cache_size * cluster_size / 16

i.e.

   l2_cache_size = disk_size * 16 / cluster_size

Refcount blocks are not affected by this.
241Refcount blocks are not affected by this.