| .. |
| Copyright (C) 2017 Red Hat Inc. |
| |
| This work is licensed under the terms of the GNU GPL, version 2 or |
| later. See the COPYING file in the top-level directory. |
| |
| .. _Live Block Operations: |
| |
| ============================ |
| Live Block Device Operations |
| ============================ |
| |
| QEMU Block Layer currently (as of QEMU 2.9) supports four major kinds of |
| live block device jobs -- stream, commit, mirror, and backup. These can |
| be used to manipulate disk image chains to accomplish certain tasks, |
| namely: live copy data from backing files into overlays; shorten long |
| disk image chains by merging data from overlays into backing files; live |
| synchronize data from a disk image chain (including current active disk) |
| to another target image; and point-in-time (and incremental) backups of |
| a block device. Below is a description of the said block (QMP) |
| primitives, and some (non-exhaustive list of) examples to illustrate |
| their use. |
| |
| .. note:: |
| The file ``qapi/block-core.json`` in the QEMU source tree has the |
| canonical QEMU API (QAPI) schema documentation for the QMP |
| primitives discussed here. |
| |
| .. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is |
| integrated. |
| |
| .. contents:: |
| |
| Disk image backing chain notation |
| --------------------------------- |
| |
| A simple disk image chain. (This can be created live using QMP |
| ``blockdev-snapshot-sync``, or offline via ``qemu-img``):: |
| |
| (Live QEMU) |
| | |
| . |
| V |
| |
| [A] <----- [B] |
| |
| (backing file) (overlay) |
| |
| The arrow can be read as: Image [A] is the backing file of disk image |
| [B]. And live QEMU is currently writing to image [B], consequently, it |
| is also referred to as the "active layer". |
| |
| There are two kinds of terminology that are common when referring to |
| files in a disk image backing chain: |
| |
| (1) Directional: 'base' and 'top'. Given the simple disk image chain |
| above, image [A] can be referred to as 'base', and image [B] as |
| 'top'. (This terminology can be seen in the QAPI schema file, |
| block-core.json.) |
| |
| (2) Relational: 'backing file' and 'overlay'. Again, taking the same |
| simple disk image chain from the above, disk image [A] is referred |
| to as the backing file, and image [B] as overlay. |
| |
| Throughout this document, we will use the relational terminology. |
| |
| .. important:: |
| The overlay files can generally be any format that supports a |
| backing file, although QCOW2 is the preferred format and the one |
| used in this document. |
| |
| |
| Brief overview of live block QMP primitives |
| ------------------------------------------- |
| |
| The following are the four different kinds of live block operations that |
| QEMU block layer supports. |
| |
| (1) ``block-stream``: Live copy of data from backing files into overlay |
| files. |
| |
| .. note:: Once the 'stream' operation has finished, three things to |
| note: |
| |
| (a) QEMU rewrites the backing chain to remove |
| reference to the now-streamed and redundant backing |
| file; |
| |
| (b) the streamed file *itself* won't be removed by QEMU, |
| and must be explicitly discarded by the user; |
| |
| (c) the streamed file remains valid -- i.e. further |
| overlays can be created based on it. Refer the |
| ``block-stream`` section further below for more |
| details. |
| |
| (2) ``block-commit``: Live merge of data from overlay files into backing |
| files (with the optional goal of removing the overlay file from the |
| chain). Since QEMU 2.0, this includes "active ``block-commit``" |
| (i.e. merge the current active layer into the base image). |
| |
| .. note:: Once the 'commit' operation has finished, there are three |
| things to note here as well: |
| |
| (a) QEMU rewrites the backing chain to remove reference |
| to now-redundant overlay images that have been |
| committed into a backing file; |
| |
| (b) the committed file *itself* won't be removed by QEMU |
| -- it ought to be manually removed; |
| |
| (c) however, unlike in the case of ``block-stream``, the |
| intermediate images will be rendered invalid -- i.e. |
| no more further overlays can be created based on |
| them. Refer the ``block-commit`` section further |
| below for more details. |
| |
| (3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running |
| disk to another image. |
| |
| (4) ``blockdev-backup`` (and the deprecated ``drive-backup``): |
| Point-in-time (live) copy of a block device to a destination. |
| |
| |
| .. _`Interacting with a QEMU instance`: |
| |
| Interacting with a QEMU instance |
| -------------------------------- |
| |
| To show some example invocations of command-line, we will use the |
| following invocation of QEMU, with a QMP server running over UNIX |
| socket: |
| |
| .. parsed-literal:: |
| |
| $ |qemu_system| -display none -no-user-config -nodefaults \\ |
| -m 512 -blockdev \\ |
| node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \\ |
| -device virtio-blk,drive=node-A,id=virtio0 \\ |
| -monitor stdio -qmp unix:/tmp/qmp-sock,server=on,wait=off |
| |
| The ``-blockdev`` command-line option, used above, is available from |
| QEMU 2.9 onwards. In the above invocation, notice the ``node-name`` |
| parameter that is used to refer to the disk image a.qcow2 ('node-A') -- |
| this is a cleaner way to refer to a disk image (as opposed to referring |
| to it by spelling out file paths). So, we will continue to designate a |
| ``node-name`` to each further disk image created (either via |
| ``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk |
| image chain, and continue to refer to the disks using their |
| ``node-name`` (where possible, because ``block-commit`` does not yet, as |
| of QEMU 2.9, accept ``node-name`` parameter) when performing various |
| block operations. |
| |
| To interact with the QEMU instance launched above, we will use the |
| ``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the |
| QEMU source directory), which takes key-value pairs for QMP commands. |
| Invoke it as below (which will also print out the complete raw JSON |
| syntax for reference -- examples in the following sections):: |
| |
| $ ./qmp-shell -v -p /tmp/qmp-sock |
| (QEMU) |
| |
| .. note:: |
| In the event we have to repeat a certain QMP command, we will: for |
| the first occurrence of it, show the ``qmp-shell`` invocation, *and* |
| the corresponding raw JSON QMP syntax; but for subsequent |
| invocations, present just the ``qmp-shell`` syntax, and omit the |
| equivalent JSON output. |
| |
| |
| Example disk image chain |
| ------------------------ |
| |
| We will use the below disk image chain (and occasionally spelling it |
| out where appropriate) when discussing various primitives:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| Where [A] is the original base image; [B] and [C] are intermediate |
| overlay images; image [D] is the active layer -- i.e. live QEMU is |
| writing to it. (The rule of thumb is: live QEMU will always be pointing |
| to the rightmost image in a disk image chain.) |
| |
| The above image chain can be created by invoking |
| ``blockdev-snapshot-sync`` commands as following (which shows the |
| creation of overlay image [B]) using the ``qmp-shell`` (our invocation |
| also prints the raw JSON invocation of it):: |
| |
| (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 |
| { |
| "execute": "blockdev-snapshot-sync", |
| "arguments": { |
| "node-name": "node-A", |
| "snapshot-file": "b.qcow2", |
| "format": "qcow2", |
| "snapshot-node-name": "node-B" |
| } |
| } |
| |
| Here, "node-A" is the name QEMU internally uses to refer to the base |
| image [A] -- it is the backing file, based on which the overlay image, |
| [B], is created. |
| |
| To create the rest of the overlay images, [C], and [D] (omitting the raw |
| JSON output for brevity):: |
| |
| (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 |
| (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 |
| |
| |
| A note on points-in-time vs file names |
| -------------------------------------- |
| |
| In our disk image chain:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| We have *three* points in time and an active layer: |
| |
| - Point 1: Guest state when [B] was created is contained in file [A] |
| - Point 2: Guest state when [C] was created is contained in [A] + [B] |
| - Point 3: Guest state when [D] was created is contained in |
| [A] + [B] + [C] |
| - Active layer: Current guest state is contained in [A] + [B] + [C] + |
| [D] |
| |
| Therefore, be aware with naming choices: |
| |
| - Naming a file after the time it is created is misleading -- the |
| guest data for that point in time is *not* contained in that file |
| (as explained earlier) |
| - Rather, think of files as a *delta* from the backing file |
| |
| |
| Live block streaming --- ``block-stream`` |
| ----------------------------------------- |
| |
| The ``block-stream`` command allows you to do live copy data from backing |
| files into overlay images. |
| |
| Given our original example disk image chain from earlier:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| The disk image chain can be shortened in one of the following different |
| ways (not an exhaustive list). |
| |
| .. _`Case-1`: |
| |
| (1) Merge everything into the active layer: I.e. copy all contents from |
| the base image, [A], and overlay images, [B] and [C], into [D], |
| *while* the guest is running. The resulting chain will be a |
| standalone image, [D] -- with contents from [A], [B] and [C] merged |
| into it (where live QEMU writes go to):: |
| |
| [D] |
| |
| .. _`Case-2`: |
| |
| (2) Taking the same example disk image chain mentioned earlier, merge |
| only images [B] and [C] into [D], the active layer. The result will |
| be contents of images [B] and [C] will be copied into [D], and the |
| backing file pointer of image [D] will be adjusted to point to image |
| [A]. The resulting chain will be:: |
| |
| [A] <-- [D] |
| |
| .. _`Case-3`: |
| |
| (3) Intermediate streaming (available since QEMU 2.8): Starting afresh |
| with the original example disk image chain, with a total of four |
| images, it is possible to copy contents from image [B] into image |
| [C]. Once the copy is finished, image [B] can now be (optionally) |
| discarded; and the backing file pointer of image [C] will be |
| adjusted to point to [A]. I.e. after performing "intermediate |
| streaming" of [B] into [C], the resulting image chain will be (where |
| live QEMU is writing to [D]):: |
| |
| [A] <-- [C] <-- [D] |
| |
| |
| QMP invocation for ``block-stream`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| For `Case-1`_, to merge contents of all the backing files into the |
| active layer, where 'node-D' is the current active image (by default |
| ``block-stream`` will flatten the entire chain); ``qmp-shell`` (and its |
| corresponding JSON output):: |
| |
| (QEMU) block-stream device=node-D job-id=job0 |
| { |
| "execute": "block-stream", |
| "arguments": { |
| "device": "node-D", |
| "job-id": "job0" |
| } |
| } |
| |
| For `Case-2`_, merge contents of the images [B] and [C] into [D], where |
| image [D] ends up referring to image [A] as its backing file:: |
| |
| (QEMU) block-stream device=node-D base-node=node-A job-id=job0 |
| |
| And for `Case-3`_, of "intermediate" streaming", merge contents of |
| images [B] into [C], where [C] ends up referring to [A] as its backing |
| image:: |
| |
| (QEMU) block-stream device=node-C base-node=node-A job-id=job0 |
| |
| Progress of a ``block-stream`` operation can be monitored via the QMP |
| command:: |
| |
| (QEMU) query-block-jobs |
| { |
| "execute": "query-block-jobs", |
| "arguments": {} |
| } |
| |
| |
| Once the ``block-stream`` operation has completed, QEMU will emit an |
| event, ``BLOCK_JOB_COMPLETED``. The intermediate overlays remain valid, |
| and can now be (optionally) discarded, or retained to create further |
| overlays based on them. Finally, the ``block-stream`` jobs can be |
| restarted at anytime. |
| |
| |
| Live block commit --- ``block-commit`` |
| -------------------------------------- |
| |
| The ``block-commit`` command lets you merge live data from overlay |
| images into backing file(s). Since QEMU 2.0, this includes "live active |
| commit" (i.e. it is possible to merge the "active layer", the right-most |
| image in a disk image chain where live QEMU will be writing to, into the |
| base image). This is analogous to ``block-stream``, but in the opposite |
| direction. |
| |
| Again, starting afresh with our example disk image chain, where live |
| QEMU is writing to the right-most image in the chain, [D]:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| The disk image chain can be shortened in one of the following ways: |
| |
| .. _`block-commit_Case-1`: |
| |
| (1) Commit content from only image [B] into image [A]. The resulting |
| chain is the following, where image [C] is adjusted to point at [A] |
| as its new backing file:: |
| |
| [A] <-- [C] <-- [D] |
| |
| (2) Commit content from images [B] and [C] into image [A]. The |
| resulting chain, where image [D] is adjusted to point to image [A] |
| as its new backing file:: |
| |
| [A] <-- [D] |
| |
| .. _`block-commit_Case-3`: |
| |
| (3) Commit content from images [B], [C], and the active layer [D] into |
| image [A]. The resulting chain (in this case, a consolidated single |
| image):: |
| |
| [A] |
| |
| (4) Commit content from image only image [C] into image [B]. The |
| resulting chain:: |
| |
| [A] <-- [B] <-- [D] |
| |
| (5) Commit content from image [C] and the active layer [D] into image |
| [B]. The resulting chain:: |
| |
| [A] <-- [B] |
| |
| |
| QMP invocation for ``block-commit`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from |
| image [B] into image [A], the invocation is as follows:: |
| |
| (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0 |
| { |
| "execute": "block-commit", |
| "arguments": { |
| "device": "node-D", |
| "job-id": "job0", |
| "top": "b.qcow2", |
| "base": "a.qcow2" |
| } |
| } |
| |
| Once the above ``block-commit`` operation has completed, a |
| ``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is |
| required. As the end result, the backing file of image [C] is adjusted |
| to point to image [A], and the original 4-image chain will end up being |
| transformed to:: |
| |
| [A] <-- [C] <-- [D] |
| |
| .. note:: |
| The intermediate image [B] is invalid (as in: no more further |
| overlays based on it can be created). |
| |
| Reasoning: An intermediate image after a 'stream' operation still |
| represents that old point-in-time, and may be valid in that context. |
| However, an intermediate image after a 'commit' operation no longer |
| represents any point-in-time, and is invalid in any context. |
| |
| |
| However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active |
| ``block-commit``") is a *two-phase* operation: In the first phase, the |
| content from the active overlay, along with the intermediate overlays, |
| is copied into the backing file (also called the base image). In the |
| second phase, adjust the said backing file as the current active image |
| -- possible via issuing the command ``block-job-complete``. Optionally, |
| the ``block-commit`` operation can be cancelled by issuing the command |
| ``block-job-cancel``, but be careful when doing this. |
| |
| Once the ``block-commit`` operation has completed, the event |
| ``BLOCK_JOB_READY`` will be emitted, signalling that the synchronization |
| has finished. Now the job can be gracefully completed by issuing the |
| command ``block-job-complete`` -- until such a command is issued, the |
| 'commit' operation remains active. |
| |
| The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to |
| convert a disk image chain such as this:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| Into:: |
| |
| [A] |
| |
| Where content from all the subsequent overlays, [B], and [C], including |
| the active layer, [D], is committed back to [A] -- which is where live |
| QEMU is performing all its current writes). |
| |
| Start the "active ``block-commit``" operation:: |
| |
| (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0 |
| { |
| "execute": "block-commit", |
| "arguments": { |
| "device": "node-D", |
| "job-id": "job0", |
| "top": "d.qcow2", |
| "base": "a.qcow2" |
| } |
| } |
| |
| |
| Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will |
| be emitted. |
| |
| Then, optionally query for the status of the active block operations. |
| We can see the 'commit' job is now ready to be completed, as indicated |
| by the line *"ready": true*:: |
| |
| (QEMU) query-block-jobs |
| { |
| "execute": "query-block-jobs", |
| "arguments": {} |
| } |
| { |
| "return": [ |
| { |
| "busy": false, |
| "type": "commit", |
| "len": 1376256, |
| "paused": false, |
| "ready": true, |
| "io-status": "ok", |
| "offset": 1376256, |
| "device": "job0", |
| "speed": 0 |
| } |
| ] |
| } |
| |
| Gracefully complete the 'commit' block device job:: |
| |
| (QEMU) block-job-complete device=job0 |
| { |
| "execute": "block-job-complete", |
| "arguments": { |
| "device": "job0" |
| } |
| } |
| { |
| "return": {} |
| } |
| |
| Finally, once the above job is completed, an event |
| ``BLOCK_JOB_COMPLETED`` will be emitted. |
| |
| .. note:: |
| The invocation for rest of the cases (2, 4, and 5), discussed in the |
| previous section, is omitted for brevity. |
| |
| |
| Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror`` |
| ---------------------------------------------------------------------- |
| |
| Synchronize a running disk image chain (all or part of it) to a target |
| image. |
| |
| Again, given our familiar disk image chain:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) |
| allows you to copy data from the entire chain into a single target image |
| (which can be located on a different host), [E]. |
| |
| .. note:: |
| |
| When you cancel an in-progress 'mirror' job *before* the source and |
| target are synchronized, ``block-job-cancel`` will emit the event |
| ``BLOCK_JOB_CANCELLED``. However, note that if you cancel a |
| 'mirror' job *after* it has indicated (via the event |
| ``BLOCK_JOB_READY``) that the source and target have reached |
| synchronization, then the event emitted by ``block-job-cancel`` |
| changes to ``BLOCK_JOB_COMPLETED``. |
| |
| Besides the 'mirror' job, the "active ``block-commit``" is the only |
| other block device job that emits the event ``BLOCK_JOB_READY``. |
| The rest of the block device jobs ('stream', "non-active |
| ``block-commit``", and 'backup') end automatically. |
| |
| So there are two possible actions to take, after a 'mirror' job has |
| emitted the event ``BLOCK_JOB_READY``, indicating that the source and |
| target have reached synchronization: |
| |
| (1) Issuing the command ``block-job-cancel`` (after it emits the event |
| ``BLOCK_JOB_COMPLETED``) will create a point-in-time (which is at |
| the time of *triggering* the cancel command) copy of the entire disk |
| image chain (or only the top-most image, depending on the ``sync`` |
| mode), contained in the target image [E]. One use case for this is |
| live VM migration with non-shared storage. |
| |
| (2) Issuing the command ``block-job-complete`` (after it emits the event |
| ``BLOCK_JOB_COMPLETED``) will adjust the guest device (i.e. live |
| QEMU) to point to the target image, [E], causing all the new writes |
| from this point on to happen there. |
| |
| About synchronization modes: The synchronization mode determines |
| *which* part of the disk image chain will be copied to the target. |
| Currently, there are four different kinds: |
| |
| (1) ``full`` -- Synchronize the content of entire disk image chain to |
| the target |
| |
| (2) ``top`` -- Synchronize only the contents of the top-most disk image |
| in the chain to the target |
| |
| (3) ``none`` -- Synchronize only the new writes from this point on. |
| |
| .. note:: In the case of ``blockdev-backup`` (or deprecated |
| ``drive-backup``), the behavior of ``none`` |
| synchronization mode is different. Normally, a |
| ``backup`` job consists of two parts: Anything that is |
| overwritten by the guest is first copied out to the |
| backup, and in the background the whole image is copied |
| from start to end. With ``sync=none``, it's only the |
| first part. |
| |
| (4) ``incremental`` -- Synchronize content that is described by the |
| dirty bitmap |
| |
| .. note:: |
| Refer to the :doc:`bitmaps` document in the QEMU source |
| tree to learn about the detailed workings of the ``incremental`` |
| synchronization mode. |
| |
| |
| QMP invocation for ``drive-mirror`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| To copy the contents of the entire disk image chain, from [A] all the |
| way to [D], to a new target (``drive-mirror`` will create the destination |
| file, if it doesn't already exist), call it [E]:: |
| |
| (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0 |
| { |
| "execute": "drive-mirror", |
| "arguments": { |
| "device": "node-D", |
| "job-id": "job0", |
| "target": "e.qcow2", |
| "sync": "full" |
| } |
| } |
| |
| The ``"sync": "full"``, from the above, means: copy the *entire* chain |
| to the destination. |
| |
| Following the above, querying for active block jobs will show that a |
| 'mirror' job is "ready" to be completed (and QEMU will also emit an |
| event, ``BLOCK_JOB_READY``):: |
| |
| (QEMU) query-block-jobs |
| { |
| "execute": "query-block-jobs", |
| "arguments": {} |
| } |
| { |
| "return": [ |
| { |
| "busy": false, |
| "type": "mirror", |
| "len": 21757952, |
| "paused": false, |
| "ready": true, |
| "io-status": "ok", |
| "offset": 21757952, |
| "device": "job0", |
| "speed": 0 |
| } |
| ] |
| } |
| |
| And, as noted in the previous section, there are two possible actions |
| at this point: |
| |
| (a) Create a point-in-time snapshot by ending the synchronization. The |
| point-in-time is at the time of *ending* the sync. (The result of |
| the following being: the target image, [E], will be populated with |
| content from the entire chain, [A] to [D]):: |
| |
| (QEMU) block-job-cancel device=job0 |
| { |
| "execute": "block-job-cancel", |
| "arguments": { |
| "device": "job0" |
| } |
| } |
| |
| (b) Or, complete the operation and pivot the live QEMU to the target |
| copy:: |
| |
| (QEMU) block-job-complete device=job0 |
| |
| In either of the above cases, if you once again run the |
| ``query-block-jobs`` command, there should not be any active block |
| operation. |
| |
| Comparing 'commit' and 'mirror': In both then cases, the overlay images |
| can be discarded. However, with 'commit', the *existing* base image |
| will be modified (by updating it with contents from overlays); while in |
| the case of 'mirror', a *new* target image is populated with the data |
| from the disk image chain. |
| |
| |
| QMP invocation for live storage migration with ``drive-mirror`` + NBD |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Live storage migration (without shared storage setup) is one of the most |
| common use-cases that takes advantage of the ``drive-mirror`` primitive |
| and QEMU's built-in Network Block Device (NBD) server. Here's a quick |
| walk-through of this setup. |
| |
| Given the disk image chain:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| Instead of copying content from the entire chain, synchronize *only* the |
| contents of the *top*-most disk image (i.e. the active layer), [D], to a |
| target, say, [TargetDisk]. |
| |
| .. important:: |
| The destination host must already have the contents of the backing |
| chain, involving images [A], [B], and [C], visible via other means |
| -- whether by ``cp``, ``rsync``, or by some storage array-specific |
| command.) |
| |
| Sometimes, this is also referred to as "shallow copy" -- because only |
| the "active layer", and not the rest of the image chain, is copied to |
| the destination. |
| |
| .. note:: |
| In this example, for the sake of simplicity, we'll be using the same |
| ``localhost`` as both source and destination. |
| |
| As noted earlier, on the destination host the contents of the backing |
| chain -- from images [A] to [C] -- are already expected to exist in some |
| form (e.g. in a file called, ``Contents-of-A-B-C.qcow2``). Now, on the |
| destination host, let's create a target overlay image (with the image |
| ``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents |
| of image [D] (from the source QEMU) will be mirrored to:: |
| |
| $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \ |
| -F qcow2 ./target-disk.qcow2 |
| |
| And start the destination QEMU (we already have the source QEMU running |
| -- discussed in the section: `Interacting with a QEMU instance`_) |
| instance, with the following invocation. (As noted earlier, for |
| simplicity's sake, the destination QEMU is started on the same host, but |
| it could be located elsewhere): |
| |
| .. parsed-literal:: |
| |
| $ |qemu_system| -display none -no-user-config -nodefaults \\ |
| -m 512 -blockdev \\ |
| node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \\ |
| -device virtio-blk,drive=node-TargetDisk,id=virtio0 \\ |
| -S -monitor stdio -qmp unix:./qmp-sock2,server=on,wait=off \\ |
| -incoming tcp:localhost:6666 |
| |
| Given the disk image chain on source QEMU:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| On the destination host, it is expected that the contents of the chain |
| ``[A] <-- [B] <-- [C]`` are *already* present, and therefore copy *only* |
| the content of image [D]. |
| |
| (1) [On *destination* QEMU] As part of the first step, start the |
| built-in NBD server on a given host (local host, represented by |
| ``::``)and port:: |
| |
| (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}} |
| { |
| "execute": "nbd-server-start", |
| "arguments": { |
| "addr": { |
| "data": { |
| "host": "::", |
| "port": "49153" |
| }, |
| "type": "inet" |
| } |
| } |
| } |
| |
| (2) [On *destination* QEMU] And export the destination disk image using |
| QEMU's built-in NBD server:: |
| |
| (QEMU) nbd-server-add device=node-TargetDisk writable=true |
| { |
| "execute": "nbd-server-add", |
| "arguments": { |
| "device": "node-TargetDisk" |
| } |
| } |
| |
| (3) [On *source* QEMU] Then, invoke ``drive-mirror`` (NB: since we're |
| running ``drive-mirror`` with ``mode=existing`` (meaning: |
| synchronize to a pre-created file, therefore 'existing', file on the |
| target host), with the synchronization mode as 'top' (``"sync: |
| "top"``):: |
| |
| (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0 |
| { |
| "execute": "drive-mirror", |
| "arguments": { |
| "device": "node-D", |
| "mode": "existing", |
| "job-id": "job0", |
| "target": "nbd:localhost:49153:exportname=node-TargetDisk", |
| "sync": "top" |
| } |
| } |
| |
| (4) [On *source* QEMU] Once ``drive-mirror`` copies the entire data, and the |
| event ``BLOCK_JOB_READY`` is emitted, issue ``block-job-cancel`` to |
| gracefully end the synchronization, from source QEMU:: |
| |
| (QEMU) block-job-cancel device=job0 |
| { |
| "execute": "block-job-cancel", |
| "arguments": { |
| "device": "job0" |
| } |
| } |
| |
| (5) [On *destination* QEMU] Then, stop the NBD server:: |
| |
| (QEMU) nbd-server-stop |
| { |
| "execute": "nbd-server-stop", |
| "arguments": {} |
| } |
| |
| (6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the |
| QMP command ``cont``:: |
| |
| (QEMU) cont |
| { |
| "execute": "cont", |
| "arguments": {} |
| } |
| |
| .. note:: |
| Higher-level libraries (e.g. libvirt) automate the entire above |
| process (although note that libvirt does not allow same-host |
| migrations to localhost for other reasons). |
| |
| |
| Notes on ``blockdev-mirror`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The ``blockdev-mirror`` command is equivalent in core functionality to |
| ``drive-mirror``, except that it operates at node-level in a BDS graph. |
| |
| Also: for ``blockdev-mirror``, the 'target' image needs to be explicitly |
| created (using ``qemu-img``) and attach it to live QEMU via |
| ``blockdev-add``, which assigns a name to the to-be created target node. |
| |
| E.g. the sequence of actions to create a point-in-time backup of an |
| entire disk image chain, to a target, using ``blockdev-mirror`` would be: |
| |
| (0) Create the QCOW2 overlays, to arrive at a backing chain of desired |
| depth |
| |
| (1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` |
| |
| (2) Attach the above created file (``e.qcow2``), run-time, using |
| ``blockdev-add`` to QEMU |
| |
| (3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the |
| entire chain to the target). And notice the event |
| ``BLOCK_JOB_READY`` |
| |
| (4) Optionally, query for active block jobs, there should be a 'mirror' |
| job ready to be completed |
| |
| (5) Gracefully complete the 'mirror' block device job, and notice the |
| event ``BLOCK_JOB_COMPLETED`` |
| |
| (6) Shutdown the guest by issuing the QMP ``quit`` command so that |
| caches are flushed |
| |
| (7) Then, finally, compare the contents of the disk image chain, and |
| the target copy with ``qemu-img compare``. You should notice: |
| "Images are identical" |
| |
| |
| QMP invocation for ``blockdev-mirror`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Given the disk image chain:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| To copy the contents of the entire disk image chain, from [A] all the |
| way to [D], to a new target, call it [E]. The following is the flow. |
| |
| Create the overlay images, [B], [C], and [D]:: |
| |
| (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 |
| (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 |
| (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 |
| |
| Create the target image, [E]:: |
| |
| $ qemu-img create -f qcow2 e.qcow2 39M |
| |
| Add the above created target image to QEMU, via ``blockdev-add``:: |
| |
| (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} |
| { |
| "execute": "blockdev-add", |
| "arguments": { |
| "node-name": "node-E", |
| "driver": "qcow2", |
| "file": { |
| "driver": "file", |
| "filename": "e.qcow2" |
| } |
| } |
| } |
| |
| Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``:: |
| |
| (QEMU) blockdev-mirror device=node-B target=node-E sync=full job-id=job0 |
| { |
| "execute": "blockdev-mirror", |
| "arguments": { |
| "device": "node-D", |
| "job-id": "job0", |
| "target": "node-E", |
| "sync": "full" |
| } |
| } |
| |
| Query for active block jobs, there should be a 'mirror' job ready:: |
| |
| (QEMU) query-block-jobs |
| { |
| "execute": "query-block-jobs", |
| "arguments": {} |
| } |
| { |
| "return": [ |
| { |
| "busy": false, |
| "type": "mirror", |
| "len": 21561344, |
| "paused": false, |
| "ready": true, |
| "io-status": "ok", |
| "offset": 21561344, |
| "device": "job0", |
| "speed": 0 |
| } |
| ] |
| } |
| |
| Gracefully complete the block device job operation, and notice the |
| event ``BLOCK_JOB_COMPLETED``:: |
| |
| (QEMU) block-job-complete device=job0 |
| { |
| "execute": "block-job-complete", |
| "arguments": { |
| "device": "job0" |
| } |
| } |
| { |
| "return": {} |
| } |
| |
| Shutdown the guest, by issuing the ``quit`` QMP command:: |
| |
| (QEMU) quit |
| { |
| "execute": "quit", |
| "arguments": {} |
| } |
| |
| |
| Live disk backup --- ``blockdev-backup`` and the deprecated``drive-backup`` |
| --------------------------------------------------------------------------- |
| |
| The ``blockdev-backup`` (and the deprecated ``drive-backup``) allows |
| you to create a point-in-time snapshot. |
| |
| In this case, the point-in-time is when you *start* the |
| ``blockdev-backup`` (or deprecated ``drive-backup``) command. |
| |
| |
| QMP invocation for ``drive-backup`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Note that ``drive-backup`` command is deprecated since QEMU 6.2 and |
| will be removed in future. |
| |
| Yet again, starting afresh with our example disk image chain:: |
| |
| [A] <-- [B] <-- [C] <-- [D] |
| |
| To create a target image [E], with content populated from image [A] to |
| [D], from the above chain, the following is the syntax. (If the target |
| image does not exist, ``drive-backup`` will create it):: |
| |
| (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0 |
| { |
| "execute": "drive-backup", |
| "arguments": { |
| "device": "node-D", |
| "job-id": "job0", |
| "sync": "full", |
| "target": "e.qcow2" |
| } |
| } |
| |
| Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event |
| will be issued, indicating the live block device job operation has |
| completed, and no further action is required. |
| |
| |
| Moving from the deprecated ``drive-backup`` to newer ``blockdev-backup`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| ``blockdev-backup`` differs from ``drive-backup`` in how you specify |
| the backup target. With ``blockdev-backup`` you can't specify filename |
| as a target. Instead you use ``node-name`` of existing block node, |
| which you may add by ``blockdev-add`` or ``blockdev-create`` commands. |
| Correspondingly, ``blockdev-backup`` doesn't have ``mode`` and |
| ``format`` arguments which don't apply to an existing block node. See |
| following sections for details and examples. |
| |
| |
| Notes on ``blockdev-backup`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| The ``blockdev-backup`` command operates at node-level in a Block Driver |
| State (BDS) graph. |
| |
| E.g. the sequence of actions to create a point-in-time backup |
| of an entire disk image chain, to a target, using ``blockdev-backup`` |
| would be: |
| |
| (0) Create the QCOW2 overlays, to arrive at a backing chain of desired |
| depth |
| |
| (1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` |
| |
| (2) Attach the above created file (``e.qcow2``), run-time, using |
| ``blockdev-add`` to QEMU |
| |
| (3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the |
| entire chain to the target). And notice the event |
| ``BLOCK_JOB_COMPLETED`` |
| |
| (4) Shutdown the guest, by issuing the QMP ``quit`` command, so that |
| caches are flushed |
| |
| (5) Then, finally, compare the contents of the disk image chain, and |
| the target copy with ``qemu-img compare``. You should notice: |
| "Images are identical" |
| |
| The following section shows an example QMP invocation for |
| ``blockdev-backup``. |
| |
| QMP invocation for ``blockdev-backup`` |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| |
| Given a disk image chain of depth 1 where image [B] is the active |
| overlay (live QEMU is writing to it):: |
| |
| [A] <-- [B] |
| |
| The following is the procedure to copy the content from the entire chain |
| to a target image (say, [E]), which has the full content from [A] and |
| [B]. |
| |
| Create the overlay [B]:: |
| |
| (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 |
| { |
| "execute": "blockdev-snapshot-sync", |
| "arguments": { |
| "node-name": "node-A", |
| "snapshot-file": "b.qcow2", |
| "format": "qcow2", |
| "snapshot-node-name": "node-B" |
| } |
| } |
| |
| |
| Create a target image that will contain the copy:: |
| |
| $ qemu-img create -f qcow2 e.qcow2 39M |
| |
| Then add it to QEMU via ``blockdev-add``:: |
| |
| (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} |
| { |
| "execute": "blockdev-add", |
| "arguments": { |
| "node-name": "node-E", |
| "driver": "qcow2", |
| "file": { |
| "driver": "file", |
| "filename": "e.qcow2" |
| } |
| } |
| } |
| |
| Then invoke ``blockdev-backup`` to copy the contents from the entire |
| image chain, consisting of images [A] and [B] to the target image |
| 'e.qcow2':: |
| |
| (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0 |
| { |
| "execute": "blockdev-backup", |
| "arguments": { |
| "device": "node-B", |
| "job-id": "job0", |
| "target": "node-E", |
| "sync": "full" |
| } |
| } |
| |
| Once the above 'backup' operation has completed, the event, |
| ``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful |
| completion. |
| |
| Next, query for any active block device jobs (there should be none):: |
| |
| (QEMU) query-block-jobs |
| { |
| "execute": "query-block-jobs", |
| "arguments": {} |
| } |
| |
| Shutdown the guest:: |
| |
| (QEMU) quit |
| { |
| "execute": "quit", |
| "arguments": {} |
| } |
| "return": {} |
| } |
| |
| .. note:: |
| The above step is really important; if forgotten, an error, "Failed |
| to get shared "write" lock on e.qcow2", will be thrown when you do |
| ``qemu-img compare`` to verify the integrity of the disk image |
| with the backup content. |
| |
| |
| The end result will be the image 'e.qcow2' containing a |
| point-in-time backup of the disk image chain -- i.e. contents from |
| images [A] and [B] at the time the ``blockdev-backup`` command was |
| initiated. |
| |
| One way to confirm the backup disk image contains the identical content |
| with the disk image chain is to compare the backup and the contents of |
| the chain, you should see "Images are identical". (NB: this is assuming |
| QEMU was launched with ``-S`` option, which will not start the CPUs at |
| guest boot up):: |
| |
| $ qemu-img compare b.qcow2 e.qcow2 |
| Warning: Image size mismatch! |
| Images are identical. |
| |
| NOTE: The "Warning: Image size mismatch!" is expected, as we created the |
| target image (e.qcow2) with 39M size. |