|  | .. | 
|  | Copyright (C) 2017 Red Hat Inc. | 
|  |  | 
|  | This work is licensed under the terms of the GNU GPL, version 2 or | 
|  | later.  See the COPYING file in the top-level directory. | 
|  |  | 
|  | .. _Live Block Operations: | 
|  |  | 
|  | ============================ | 
|  | Live Block Device Operations | 
|  | ============================ | 
|  |  | 
|  | QEMU Block Layer currently (as of QEMU 2.9) supports four major kinds of | 
|  | live block device jobs -- stream, commit, mirror, and backup.  These can | 
|  | be used to manipulate disk image chains to accomplish certain tasks, | 
|  | namely: live copy data from backing files into overlays; shorten long | 
|  | disk image chains by merging data from overlays into backing files; live | 
|  | synchronize data from a disk image chain (including current active disk) | 
|  | to another target image; and point-in-time (and incremental) backups of | 
|  | a block device.  Below is a description of these block (QMP) | 
|  | primitives, along with a non-exhaustive list of examples that illustrate | 
|  | their use. | 
|  |  | 
|  | .. note:: | 
|  | The file ``qapi/block-core.json`` in the QEMU source tree has the | 
|  | canonical QEMU API (QAPI) schema documentation for the QMP | 
|  | primitives discussed here. | 
|  |  | 
|  | .. todo (kashyapc):: Remove the ".. contents::" directive when Sphinx is | 
|  | integrated. | 
|  |  | 
|  | .. contents:: | 
|  |  | 
|  | Disk image backing chain notation | 
|  | --------------------------------- | 
|  |  | 
|  | A simple disk image chain.  (This can be created live using QMP | 
|  | ``blockdev-snapshot-sync``, or offline via ``qemu-img``):: | 
|  |  | 
|  | (Live QEMU) | 
|  | | | 
|  | . | 
|  | V | 
|  |  | 
|  | [A] <----- [B] | 
|  |  | 
|  | (backing file)    (overlay) | 
|  |  | 
|  | The arrow can be read as: Image [A] is the backing file of disk image | 
|  | [B].  Live QEMU is currently writing to image [B]; consequently, [B] is | 
|  | also referred to as the "active layer". | 
|  |  | 
|  | There are two kinds of terminology that are common when referring to | 
|  | files in a disk image backing chain: | 
|  |  | 
|  | (1) Directional: 'base' and 'top'.  Given the simple disk image chain | 
|  | above, image [A] can be referred to as 'base', and image [B] as | 
|  | 'top'.  (This terminology can be seen in the QAPI schema file, | 
|  | block-core.json.) | 
|  |  | 
|  | (2) Relational: 'backing file' and 'overlay'.  Again, taking the same | 
|  | simple disk image chain from the above, disk image [A] is referred | 
|  | to as the backing file, and image [B] as overlay. | 
|  |  | 
|  | Throughout this document, we will use the relational terminology. | 
|  |  | 
|  | .. important:: | 
|  | The overlay files can generally be any format that supports a | 
|  | backing file, although QCOW2 is the preferred format and the one | 
|  | used in this document. | 
|  |  | 
|  |  | 
|  | Brief overview of live block QMP primitives | 
|  | ------------------------------------------- | 
|  |  | 
|  | The following are the four different kinds of live block operations that | 
|  | QEMU block layer supports. | 
|  |  | 
|  | (1) ``block-stream``: Live copy of data from backing files into overlay | 
|  | files. | 
|  |  | 
|  | .. note:: Once the 'stream' operation has finished, there are three | 
|  | things to note: | 
|  |  | 
|  | (a) QEMU rewrites the backing chain to remove | 
|  | reference to the now-streamed and redundant backing | 
|  | file; | 
|  |  | 
|  | (b) the streamed file *itself* won't be removed by QEMU, | 
|  | and must be explicitly discarded by the user; | 
|  |  | 
|  | (c) the streamed file remains valid -- i.e. further | 
|  | overlays can be created based on it.  Refer to the | 
|  | ``block-stream`` section further below for more | 
|  | details. | 
|  |  | 
|  | (2) ``block-commit``: Live merge of data from overlay files into backing | 
|  | files (with the optional goal of removing the overlay file from the | 
|  | chain).  Since QEMU 2.0, this includes "active ``block-commit``" | 
|  | (i.e. merge the current active layer into the base image). | 
|  |  | 
|  | .. note:: Once the 'commit' operation has finished, there are three | 
|  | things to note here as well: | 
|  |  | 
|  | (a) QEMU rewrites the backing chain to remove reference | 
|  | to now-redundant overlay images that have been | 
|  | committed into a backing file; | 
|  |  | 
|  | (b) the committed file *itself* won't be removed by QEMU | 
|  | -- it ought to be manually removed; | 
|  |  | 
|  | (c) however, unlike in the case of ``block-stream``, the | 
|  | intermediate images will be rendered invalid -- i.e. | 
|  | no further overlays can be created based on | 
|  | them.  Refer to the ``block-commit`` section further | 
|  | below for more details. | 
|  |  | 
|  | (3) ``drive-mirror`` (and ``blockdev-mirror``): Synchronize a running | 
|  | disk to another image. | 
|  |  | 
|  | (4) ``blockdev-backup`` (and the deprecated ``drive-backup``): | 
|  | Point-in-time (live) copy of a block device to a destination. | 
|  |  | 
|  |  | 
|  | .. _`Interacting with a QEMU instance`: | 
|  |  | 
|  | Interacting with a QEMU instance | 
|  | -------------------------------- | 
|  |  | 
|  | To show some example command-line invocations, we will use the | 
|  | following invocation of QEMU, with a QMP server running over a UNIX | 
|  | socket: | 
|  |  | 
|  | .. parsed-literal:: | 
|  |  | 
|  | $ |qemu_system| -display none -no-user-config -nodefaults \\ | 
|  | -m 512 -blockdev \\ | 
|  | node-name=node-A,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./a.qcow2 \\ | 
|  | -device virtio-blk,drive=node-A,id=virtio0 \\ | 
|  | -monitor stdio -qmp unix:/tmp/qmp-sock,server=on,wait=off | 
|  |  | 
|  | The ``-blockdev`` command-line option, used above, is available from | 
|  | QEMU 2.9 onwards.  In the above invocation, notice the ``node-name`` | 
|  | parameter that is used to refer to the disk image a.qcow2 ('node-A') -- | 
|  | this is a cleaner way to refer to a disk image (as opposed to referring | 
|  | to it by spelling out file paths).  So, we will continue to designate a | 
|  | ``node-name`` to each further disk image created (either via | 
|  | ``blockdev-snapshot-sync``, or ``blockdev-add``) as part of the disk | 
|  | image chain, and continue to refer to the disks using their | 
|  | ``node-name`` (where possible, because ``block-commit`` does not yet, as | 
|  | of QEMU 2.9, accept a ``node-name`` parameter) when performing various | 
|  | block operations. | 
|  |  | 
|  | To interact with the QEMU instance launched above, we will use the | 
|  | ``qmp-shell`` utility (located at: ``qemu/scripts/qmp``, as part of the | 
|  | QEMU source directory), which takes key-value pairs for QMP commands. | 
|  | Invoke it as below (which will also print out the complete raw JSON | 
|  | syntax for reference -- examples in the following sections):: | 
|  |  | 
|  | $ ./qmp-shell -v -p /tmp/qmp-sock | 
|  | (QEMU) | 
|  |  | 
|  | .. note:: | 
|  | In the event we have to repeat a certain QMP command, we will: for | 
|  | the first occurrence of it, show the ``qmp-shell`` invocation, *and* | 
|  | the corresponding raw JSON QMP syntax; but for subsequent | 
|  | invocations, present just the ``qmp-shell`` syntax, and omit the | 
|  | equivalent JSON output. | 
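
To make the relationship between the ``qmp-shell`` key-value syntax and the raw JSON concrete, here is a minimal Python sketch that converts one into the other.  It is a simplified approximation for illustration only, not the actual ``qmp-shell`` implementation: the helper names ``parse_value`` and ``shell_line_to_qmp`` are made up for this example, and values containing spaces are not handled.

```python
import json


def parse_value(text):
    """Interpret a value as JSON when possible, else keep it as a string."""
    try:
        return json.loads(text)
    except ValueError:
        # Plain words such as node-D or job0 are not valid JSON.
        return text


def shell_line_to_qmp(line):
    """Turn 'command key=value ...' into a QMP command dictionary."""
    name, *pairs = line.split()
    arguments = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        arguments[key] = parse_value(value)
    return {"execute": name, "arguments": arguments}


command = shell_line_to_qmp("block-stream device=node-D job-id=job0")
print(json.dumps(command, indent=4))
```

The printed dictionary has the same ``execute``/``arguments`` shape as the raw JSON shown next to the ``qmp-shell`` examples in this document.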
|  |  | 
|  |  | 
|  | Example disk image chain | 
|  | ------------------------ | 
|  |  | 
|  | We will use the disk image chain below (occasionally spelling it | 
|  | out where appropriate) when discussing various primitives:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | Where [A] is the original base image; [B] and [C] are intermediate | 
|  | overlay images; image [D] is the active layer -- i.e. live QEMU is | 
|  | writing to it.  (The rule of thumb is: live QEMU will always be pointing | 
|  | to the rightmost image in a disk image chain.) | 
|  |  | 
|  | The above image chain can be created by invoking | 
|  | ``blockdev-snapshot-sync`` commands as follows (this shows the | 
|  | creation of overlay image [B]) using the ``qmp-shell`` (our invocation | 
|  | also prints the raw JSON invocation of it):: | 
|  |  | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | 
|  | { | 
|  | "execute": "blockdev-snapshot-sync", | 
|  | "arguments": { | 
|  | "node-name": "node-A", | 
|  | "snapshot-file": "b.qcow2", | 
|  | "format": "qcow2", | 
|  | "snapshot-node-name": "node-B" | 
|  | } | 
|  | } | 
|  |  | 
|  | Here, "node-A" is the name QEMU internally uses to refer to the base | 
|  | image [A] -- it is the backing file, based on which the overlay image, | 
|  | [B], is created. | 
|  |  | 
|  | To create the rest of the overlay images, [C], and [D] (omitting the raw | 
|  | JSON output for brevity):: | 
|  |  | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 | 
|  |  | 
|  |  | 
|  | A note on points-in-time vs file names | 
|  | -------------------------------------- | 
|  |  | 
|  | In our disk image chain:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | We have *three* points in time and an active layer: | 
|  |  | 
|  | - Point 1: Guest state when [B] was created is contained in file [A] | 
|  | - Point 2: Guest state when [C] was created is contained in [A] + [B] | 
|  | - Point 3: Guest state when [D] was created is contained in | 
|  | [A] + [B] + [C] | 
|  | - Active layer: Current guest state is contained in [A] + [B] + [C] + | 
|  | [D] | 
|  |  | 
|  | Therefore, be careful with naming choices: | 
|  |  | 
|  | - Naming a file after the time it is created is misleading -- the | 
|  | guest data for that point in time is *not* contained in that file | 
|  | (as explained earlier) | 
|  | - Rather, think of files as a *delta* from the backing file | 
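
The point-in-time bookkeeping above can be sketched in a few lines of Python.  This is only a model of the reasoning, with the chain represented as a plain list; the function name ``points_in_time`` is hypothetical and nothing here talks to QEMU.

```python
def points_in_time(chain):
    """Map each overlay to the files holding the guest state at its creation."""
    points = {}
    for index in range(1, len(chain)):
        # The state at the moment an overlay is created lives entirely in
        # the images below it, never in the new overlay itself.
        points[chain[index]] = chain[:index]
    return points


chain = ["A", "B", "C", "D"]
for overlay, files in points_in_time(chain).items():
    joined = " + ".join(f"[{name}]" for name in files)
    print(f"Guest state when [{overlay}] was created: {joined}")
print("Current guest state: " + " + ".join(f"[{name}]" for name in chain))
```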
|  |  | 
|  |  | 
|  | Live block streaming --- ``block-stream`` | 
|  | ----------------------------------------- | 
|  |  | 
|  | The ``block-stream`` command allows you to live-copy data from backing | 
|  | files into overlay images. | 
|  |  | 
|  | Given our original example disk image chain from earlier:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | The disk image chain can be shortened in one of the following different | 
|  | ways (not an exhaustive list). | 
|  |  | 
|  | .. _`Case-1`: | 
|  |  | 
|  | (1) Merge everything into the active layer: I.e. copy all contents from | 
|  | the base image, [A], and overlay images, [B] and [C], into [D], | 
|  | *while* the guest is running.  The resulting chain will be a | 
|  | standalone image, [D] -- with contents from [A], [B] and [C] merged | 
|  | into it (where live QEMU writes go to):: | 
|  |  | 
|  | [D] | 
|  |  | 
|  | .. _`Case-2`: | 
|  |  | 
|  | (2) Taking the same example disk image chain mentioned earlier, merge | 
|  | only images [B] and [C] into [D], the active layer.  As a result, | 
|  | the contents of images [B] and [C] will be copied into [D], and the | 
|  | backing file pointer of image [D] will be adjusted to point to image | 
|  | [A].  The resulting chain will be:: | 
|  |  | 
|  | [A] <-- [D] | 
|  |  | 
|  | .. _`Case-3`: | 
|  |  | 
|  | (3) Intermediate streaming (available since QEMU 2.8): Starting afresh | 
|  | with the original example disk image chain, with a total of four | 
|  | images, it is possible to copy contents from image [B] into image | 
|  | [C].  Once the copy is finished, image [B] can now be (optionally) | 
|  | discarded; and the backing file pointer of image [C] will be | 
|  | adjusted to point to [A].  I.e. after performing "intermediate | 
|  | streaming" of [B] into [C], the resulting image chain will be (where | 
|  | live QEMU is writing to [D]):: | 
|  |  | 
|  | [A] <-- [C] <-- [D] | 
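
The three streaming cases can be checked with a small Python model of the chain.  This is a sketch under the assumption that a chain is just an ordered list of image names; the function ``stream`` is hypothetical and only mirrors the ``device`` and base arguments conceptually, without issuing any QMP command.

```python
def stream(chain, device, base=None):
    """Return the chain after streaming into `device`.

    Images above `base` and below `device` are merged into `device`;
    with base=None, everything below `device` is merged.
    """
    top = chain.index(device)
    start = 0 if base is None else chain.index(base) + 1
    if start > top:
        raise ValueError("base must be below device in the chain")
    return chain[:start] + chain[top:]


chain = ["A", "B", "C", "D"]
print(stream(chain, "D"))            # Case 1
print(stream(chain, "D", base="A"))  # Case 2
print(stream(chain, "C", base="A"))  # Case 3
```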
|  |  | 
|  |  | 
|  | QMP invocation for ``block-stream`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | For `Case-1`_, to merge contents of all the backing files into the | 
|  | active layer, where 'node-D' is the current active image (by default | 
|  | ``block-stream`` will flatten the entire chain); ``qmp-shell`` (and its | 
|  | corresponding JSON output):: | 
|  |  | 
|  | (QEMU) block-stream device=node-D job-id=job0 | 
|  | { | 
|  | "execute": "block-stream", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "job-id": "job0" | 
|  | } | 
|  | } | 
|  |  | 
|  | For `Case-2`_, merge contents of the images [B] and [C] into [D], where | 
|  | image [D] ends up referring to image [A] as its backing file:: | 
|  |  | 
|  | (QEMU) block-stream device=node-D base-node=node-A job-id=job0 | 
|  |  | 
|  | And for `Case-3`_, of "intermediate streaming", merge contents of | 
|  | image [B] into [C], where [C] ends up referring to [A] as its backing | 
|  | image:: | 
|  |  | 
|  | (QEMU) block-stream device=node-C base-node=node-A job-id=job0 | 
|  |  | 
|  | Progress of a ``block-stream`` operation can be monitored via the QMP | 
|  | command:: | 
|  |  | 
|  | (QEMU) query-block-jobs | 
|  | { | 
|  | "execute": "query-block-jobs", | 
|  | "arguments": {} | 
|  | } | 
|  |  | 
|  |  | 
|  | Once the ``block-stream`` operation has completed, QEMU will emit an | 
|  | event, ``BLOCK_JOB_COMPLETED``.  The intermediate overlays remain valid, | 
|  | and can now be (optionally) discarded, or retained to create further | 
|  | overlays based on them.  Finally, the ``block-stream`` jobs can be | 
|  | restarted at any time. | 
|  |  | 
|  |  | 
|  | Live block commit --- ``block-commit`` | 
|  | -------------------------------------- | 
|  |  | 
|  | The ``block-commit`` command lets you merge live data from overlay | 
|  | images into backing file(s).  Since QEMU 2.0, this includes "live active | 
|  | commit" (i.e. it is possible to merge the "active layer", the right-most | 
|  | image in a disk image chain where live QEMU will be writing to, into the | 
|  | base image).  This is analogous to ``block-stream``, but in the opposite | 
|  | direction. | 
|  |  | 
|  | Again, starting afresh with our example disk image chain, where live | 
|  | QEMU is writing to the right-most image in the chain, [D]:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | The disk image chain can be shortened in one of the following ways: | 
|  |  | 
|  | .. _`block-commit_Case-1`: | 
|  |  | 
|  | (1) Commit content from only image [B] into image [A].  The resulting | 
|  | chain is the following, where image [C] is adjusted to point at [A] | 
|  | as its new backing file:: | 
|  |  | 
|  | [A] <-- [C] <-- [D] | 
|  |  | 
|  | (2) Commit content from images [B] and [C] into image [A].  The | 
|  | resulting chain, where image [D] is adjusted to point to image [A] | 
|  | as its new backing file:: | 
|  |  | 
|  | [A] <-- [D] | 
|  |  | 
|  | .. _`block-commit_Case-3`: | 
|  |  | 
|  | (3) Commit content from images [B], [C], and the active layer [D] into | 
|  | image [A].  The resulting chain (in this case, a consolidated single | 
|  | image):: | 
|  |  | 
|  | [A] | 
|  |  | 
|  | (4) Commit content from only image [C] into image [B].  The | 
|  | resulting chain:: | 
|  |  | 
|  | [A] <-- [B] <-- [D] | 
|  |  | 
|  | (5) Commit content from image [C] and the active layer [D] into image | 
|  | [B].  The resulting chain:: | 
|  |  | 
|  | [A] <-- [B] | 
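
The five commit cases above follow one rule: every image above the base, up to and including the top, disappears from the chain.  A minimal Python model of that rule, assuming the chain is an ordered list of image names (the function ``commit`` is hypothetical and does not talk to QEMU):

```python
def commit(chain, top, base):
    """Return the chain after committing images above `base`, up to `top`, into `base`."""
    base_index = chain.index(base)
    top_index = chain.index(top)
    if base_index >= top_index:
        raise ValueError("base must be below top in the chain")
    # Keep everything up to the base, drop the committed overlays,
    # and keep whatever sits above the top.
    return chain[:base_index + 1] + chain[top_index + 1:]


chain = ["A", "B", "C", "D"]
cases = {1: ("B", "A"), 2: ("C", "A"), 3: ("D", "A"), 4: ("C", "B"), 5: ("D", "B")}
for number, (top, base) in cases.items():
    print(number, commit(chain, top, base))
```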
|  |  | 
|  |  | 
|  | QMP invocation for ``block-commit`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | For :ref:`Case-1 <block-commit_Case-1>`, to merge contents only from | 
|  | image [B] into image [A], the invocation is as follows:: | 
|  |  | 
|  | (QEMU) block-commit device=node-D base=a.qcow2 top=b.qcow2 job-id=job0 | 
|  | { | 
|  | "execute": "block-commit", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "job-id": "job0", | 
|  | "top": "b.qcow2", | 
|  | "base": "a.qcow2" | 
|  | } | 
|  | } | 
|  |  | 
|  | Once the above ``block-commit`` operation has completed, a | 
|  | ``BLOCK_JOB_COMPLETED`` event will be issued, and no further action is | 
|  | required.  As the end result, the backing file of image [C] is adjusted | 
|  | to point to image [A], and the original 4-image chain will end up being | 
|  | transformed to:: | 
|  |  | 
|  | [A] <-- [C] <-- [D] | 
|  |  | 
|  | .. note:: | 
|  | The intermediate image [B] is invalid (as in: no further | 
|  | overlays based on it can be created). | 
|  |  | 
|  | Reasoning: An intermediate image after a 'stream' operation still | 
|  | represents that old point-in-time, and may be valid in that context. | 
|  | However, an intermediate image after a 'commit' operation no longer | 
|  | represents any point-in-time, and is invalid in any context. | 
|  |  | 
|  |  | 
|  | However, :ref:`Case-3 <block-commit_Case-3>` (also called: "active | 
|  | ``block-commit``") is a *two-phase* operation: In the first phase, the | 
|  | content from the active overlay, along with the intermediate overlays, | 
|  | is copied into the backing file (also called the base image).  In the | 
|  | second phase, the said backing file is made the current active image | 
|  | by issuing the command ``block-job-complete``.  Optionally, | 
|  | the ``block-commit`` operation can be cancelled by issuing the command | 
|  | ``block-job-cancel``, but be careful when doing this. | 
|  |  | 
|  | Once the first phase of the active ``block-commit`` operation has | 
|  | completed, the event ``BLOCK_JOB_READY`` will be emitted, signalling that | 
|  | the synchronization has finished.  Now the job can be gracefully completed | 
|  | by issuing the command ``block-job-complete`` -- until such a command is | 
|  | issued, the 'commit' operation remains active. | 
|  |  | 
|  | The following is the flow for :ref:`Case-3 <block-commit_Case-3>` to | 
|  | convert a disk image chain such as this:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | Into:: | 
|  |  | 
|  | [A] | 
|  |  | 
|  | Where content from all the subsequent overlays, [B] and [C], including | 
|  | the active layer, [D], is committed back to [A] (which is then where live | 
|  | QEMU performs all its writes). | 
|  |  | 
|  | Start the "active ``block-commit``" operation:: | 
|  |  | 
|  | (QEMU) block-commit device=node-D base=a.qcow2 top=d.qcow2 job-id=job0 | 
|  | { | 
|  | "execute": "block-commit", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "job-id": "job0", | 
|  | "top": "d.qcow2", | 
|  | "base": "a.qcow2" | 
|  | } | 
|  | } | 
|  |  | 
|  |  | 
|  | Once the synchronization has completed, the event ``BLOCK_JOB_READY`` will | 
|  | be emitted. | 
|  |  | 
|  | Then, optionally query for the status of the active block operations. | 
|  | We can see the 'commit' job is now ready to be completed, as indicated | 
|  | by the line *"ready": true*:: | 
|  |  | 
|  | (QEMU) query-block-jobs | 
|  | { | 
|  | "execute": "query-block-jobs", | 
|  | "arguments": {} | 
|  | } | 
|  | { | 
|  | "return": [ | 
|  | { | 
|  | "busy": false, | 
|  | "type": "commit", | 
|  | "len": 1376256, | 
|  | "paused": false, | 
|  | "ready": true, | 
|  | "io-status": "ok", | 
|  | "offset": 1376256, | 
|  | "device": "job0", | 
|  | "speed": 0 | 
|  | } | 
|  | ] | 
|  | } | 
|  |  | 
|  | Gracefully complete the 'commit' block device job:: | 
|  |  | 
|  | (QEMU) block-job-complete device=job0 | 
|  | { | 
|  | "execute": "block-job-complete", | 
|  | "arguments": { | 
|  | "device": "job0" | 
|  | } | 
|  | } | 
|  | { | 
|  | "return": {} | 
|  | } | 
|  |  | 
|  | Finally, once the above job is completed, an event | 
|  | ``BLOCK_JOB_COMPLETED`` will be emitted. | 
|  |  | 
|  | .. note:: | 
|  | The invocation for the rest of the cases (2, 4, and 5), discussed in the | 
|  | previous section, is omitted for brevity. | 
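
For reference, the arguments for cases 2, 4, and 5 can be derived from the two invocations shown above by changing only ``top`` and ``base``.  The Python sketch below builds the corresponding JSON; the helper names and the ``FILES`` table are assumptions for this example, and the file names follow the ones used when the chain was created.  Case 5 has the active layer as ``top``, so, like Case 3, it would presumably need ``block-job-complete`` after ``BLOCK_JOB_READY``.

```python
import json

# File names used when the example chain was created.
FILES = {"A": "a.qcow2", "B": "b.qcow2", "C": "c.qcow2", "D": "d.qcow2"}
ACTIVE = "D"  # image that live QEMU is writing to


def block_commit(top, base, job_id="job0"):
    """Build a block-commit command for the example chain."""
    return {
        "execute": "block-commit",
        "arguments": {
            "device": "node-" + ACTIVE,
            "job-id": job_id,
            "top": FILES[top],
            "base": FILES[base],
        },
    }


def is_active_commit(top):
    """An active commit is one whose top image is the active layer."""
    return top == ACTIVE


for case, (top, base) in {2: ("C", "A"), 4: ("C", "B"), 5: ("D", "B")}.items():
    print(f"Case {case} (two-phase: {is_active_commit(top)})")
    print(json.dumps(block_commit(top, base), indent=4))
```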
|  |  | 
|  |  | 
|  | Live disk synchronization --- ``drive-mirror`` and ``blockdev-mirror`` | 
|  | ---------------------------------------------------------------------- | 
|  |  | 
|  | Synchronize a running disk image chain (all or part of it) to a target | 
|  | image. | 
|  |  | 
|  | Again, given our familiar disk image chain:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | The ``drive-mirror`` (and its newer equivalent ``blockdev-mirror``) | 
|  | allows you to copy data from the entire chain into a single target image | 
|  | (which can be located on a different host), [E]. | 
|  |  | 
|  | .. note:: | 
|  |  | 
|  | When you cancel an in-progress 'mirror' job *before* the source and | 
|  | target are synchronized, ``block-job-cancel`` will emit the event | 
|  | ``BLOCK_JOB_CANCELLED``.  However, note that if you cancel a | 
|  | 'mirror' job *after* it has indicated (via the event | 
|  | ``BLOCK_JOB_READY``) that the source and target have reached | 
|  | synchronization, then the event emitted by ``block-job-cancel`` | 
|  | changes to ``BLOCK_JOB_COMPLETED``. | 
|  |  | 
|  | Besides the 'mirror' job, the "active ``block-commit``" is the only | 
|  | other block device job that emits the event ``BLOCK_JOB_READY``. | 
|  | The rest of the block device jobs ('stream', "non-active | 
|  | ``block-commit``", and 'backup') end automatically. | 
|  |  | 
|  | So there are two possible actions to take, after a 'mirror' job has | 
|  | emitted the event ``BLOCK_JOB_READY``, indicating that the source and | 
|  | target have reached synchronization: | 
|  |  | 
|  | (1) Issuing the command ``block-job-cancel`` (after it emits the event | 
|  | ``BLOCK_JOB_COMPLETED``) will create a point-in-time (which is at | 
|  | the time of *triggering* the cancel command) copy of the entire disk | 
|  | image chain (or only the top-most image, depending on the ``sync`` | 
|  | mode), contained in the target image [E]. One use case for this is | 
|  | live VM migration with non-shared storage. | 
|  |  | 
|  | (2) Issuing the command ``block-job-complete`` (after it emits the event | 
|  | ``BLOCK_JOB_COMPLETED``) will adjust the guest device (i.e. live | 
|  | QEMU) to point to the target image, [E], causing all the new writes | 
|  | from this point on to happen there. | 
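
The choice between the two actions can be sketched as follows.  The code reads QMP messages from a file-like object, waits for ``BLOCK_JOB_READY``, and then builds either ``block-job-cancel`` or ``block-job-complete``.  It assumes one JSON message per line, uses made-up sample data instead of a real QMP socket, and the helper names ``wait_for_event`` and ``finish_mirror`` are hypothetical.

```python
import io
import json


def wait_for_event(stream, name):
    """Return the first QMP event called `name` read from `stream`, else None."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        if message.get("event") == name:
            return message
    return None


def finish_mirror(job_id, pivot):
    """Build the command that ends a ready 'mirror' job.

    pivot=True switches the guest to the target (block-job-complete);
    pivot=False leaves a point-in-time copy (block-job-cancel).
    """
    execute = "block-job-complete" if pivot else "block-job-cancel"
    return {"execute": execute, "arguments": {"device": job_id}}


# Made-up monitor output, standing in for a real QMP socket.
sample = io.StringIO(
    '{"return": {}}\n'
    '{"event": "BLOCK_JOB_READY", "data": {"device": "job0", "type": "mirror"}}\n'
)
ready = wait_for_event(sample, "BLOCK_JOB_READY")
if ready is not None:
    print(json.dumps(finish_mirror(ready["data"]["device"], pivot=True)))
```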
|  |  | 
|  | About synchronization modes: The synchronization mode determines | 
|  | *which* part of the disk image chain will be copied to the target. | 
|  | Currently, there are four different kinds: | 
|  |  | 
|  | (1) ``full`` -- Synchronize the content of entire disk image chain to | 
|  | the target | 
|  |  | 
|  | (2) ``top`` -- Synchronize only the contents of the top-most disk image | 
|  | in the chain to the target | 
|  |  | 
|  | (3) ``none`` -- Synchronize only the new writes from this point on. | 
|  |  | 
|  | .. note:: In the case of ``blockdev-backup`` (or deprecated | 
|  | ``drive-backup``), the behavior of ``none`` | 
|  | synchronization mode is different.  Normally, a | 
|  | ``backup`` job consists of two parts: Anything that is | 
|  | overwritten by the guest is first copied out to the | 
|  | backup, and in the background the whole image is copied | 
|  | from start to end. With ``sync=none``, it's only the | 
|  | first part. | 
|  |  | 
|  | (4) ``incremental`` -- Synchronize content that is described by the | 
|  | dirty bitmap | 
|  |  | 
|  | .. note:: | 
|  | Refer to the :doc:`bitmaps` document in the QEMU source | 
|  | tree to learn about the detailed workings of the ``incremental`` | 
|  | synchronization mode. | 
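
As a rough illustration of the modes for a 'mirror' job, the sketch below lists which existing images of a chain would have their contents copied to the target.  It is a conceptual model only: the function name ``images_to_copy`` is made up, new guest writes (which are copied in every mode) are not modelled, and ``incremental`` is rejected because its result depends on a dirty bitmap rather than on the chain.

```python
def images_to_copy(chain, sync):
    """Return the existing images whose contents are copied for a sync mode."""
    if sync == "full":
        return list(chain)   # the entire chain
    if sync == "top":
        return chain[-1:]    # only the top-most image
    if sync == "none":
        return []            # only new writes from this point on
    if sync == "incremental":
        raise ValueError("depends on the dirty bitmap, not on the chain")
    raise ValueError(f"unknown sync mode: {sync}")


chain = ["A", "B", "C", "D"]
for mode in ("full", "top", "none"):
    print(mode, images_to_copy(chain, mode))
```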
|  |  | 
|  |  | 
|  | QMP invocation for ``drive-mirror`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | To copy the contents of the entire disk image chain, from [A] all the | 
|  | way to [D], to a new target (``drive-mirror`` will create the destination | 
|  | file, if it doesn't already exist), call it [E]:: | 
|  |  | 
|  | (QEMU) drive-mirror device=node-D target=e.qcow2 sync=full job-id=job0 | 
|  | { | 
|  | "execute": "drive-mirror", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "job-id": "job0", | 
|  | "target": "e.qcow2", | 
|  | "sync": "full" | 
|  | } | 
|  | } | 
|  |  | 
|  | The ``"sync": "full"``, from the above, means: copy the *entire* chain | 
|  | to the destination. | 
|  |  | 
|  | Following the above, querying for active block jobs will show that a | 
|  | 'mirror' job is "ready" to be completed (and QEMU will also emit an | 
|  | event, ``BLOCK_JOB_READY``):: | 
|  |  | 
|  | (QEMU) query-block-jobs | 
|  | { | 
|  | "execute": "query-block-jobs", | 
|  | "arguments": {} | 
|  | } | 
|  | { | 
|  | "return": [ | 
|  | { | 
|  | "busy": false, | 
|  | "type": "mirror", | 
|  | "len": 21757952, | 
|  | "paused": false, | 
|  | "ready": true, | 
|  | "io-status": "ok", | 
|  | "offset": 21757952, | 
|  | "device": "job0", | 
|  | "speed": 0 | 
|  | } | 
|  | ] | 
|  | } | 
|  |  | 
|  | And, as noted in the previous section, there are two possible actions | 
|  | at this point: | 
|  |  | 
|  | (a) Create a point-in-time snapshot by ending the synchronization.  The | 
|  | point-in-time is at the time of *ending* the sync.  (The result of | 
|  | the following being: the target image, [E], will be populated with | 
|  | content from the entire chain, [A] to [D]):: | 
|  |  | 
|  | (QEMU) block-job-cancel device=job0 | 
|  | { | 
|  | "execute": "block-job-cancel", | 
|  | "arguments": { | 
|  | "device": "job0" | 
|  | } | 
|  | } | 
|  |  | 
|  | (b) Or, complete the operation and pivot the live QEMU to the target | 
|  | copy:: | 
|  |  | 
|  | (QEMU) block-job-complete device=job0 | 
|  |  | 
|  | In either of the above cases, if you once again run the | 
|  | ``query-block-jobs`` command, there should not be any active block | 
|  | operation. | 
|  |  | 
|  | Comparing 'commit' and 'mirror': In both cases, the overlay images | 
|  | can be discarded.  However, with 'commit', the *existing* base image | 
|  | will be modified (by updating it with contents from overlays); while in | 
|  | the case of 'mirror', a *new* target image is populated with the data | 
|  | from the disk image chain. | 
|  |  | 
|  |  | 
|  | QMP invocation for live storage migration with ``drive-mirror`` + NBD | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Live storage migration (without shared storage setup) is one of the most | 
|  | common use-cases that takes advantage of the ``drive-mirror`` primitive | 
|  | and QEMU's built-in Network Block Device (NBD) server.  Here's a quick | 
|  | walk-through of this setup. | 
|  |  | 
|  | Given the disk image chain:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | Instead of copying content from the entire chain, synchronize *only* the | 
|  | contents of the *top*-most disk image (i.e. the active layer), [D], to a | 
|  | target, say, [TargetDisk]. | 
|  |  | 
|  | .. important:: | 
|  | The destination host must already have the contents of the backing | 
|  | chain, involving images [A], [B], and [C], visible via other means | 
|  | -- whether by ``cp``, ``rsync``, or by some storage array-specific | 
|  | command. | 
|  |  | 
|  | Sometimes, this is also referred to as "shallow copy" -- because only | 
|  | the "active layer", and not the rest of the image chain, is copied to | 
|  | the destination. | 
|  |  | 
|  | .. note:: | 
|  | In this example, for the sake of simplicity, we'll be using the same | 
|  | ``localhost`` as both source and destination. | 
|  |  | 
|  | As noted earlier, on the destination host the contents of the backing | 
|  | chain -- from images [A] to [C] -- are already expected to exist in some | 
|  | form (e.g. in a file called, ``Contents-of-A-B-C.qcow2``).  Now, on the | 
|  | destination host, let's create a target overlay image (with the image | 
|  | ``Contents-of-A-B-C.qcow2`` as its backing file), to which the contents | 
|  | of image [D] (from the source QEMU) will be mirrored:: | 
|  |  | 
|  | $ qemu-img create -f qcow2 -b ./Contents-of-A-B-C.qcow2 \ | 
|  | -F qcow2 ./target-disk.qcow2 | 
|  |  | 
|  | And start the destination QEMU instance with the following invocation | 
|  | (we already have the source QEMU running, as discussed in the section | 
|  | `Interacting with a QEMU instance`_).  As noted earlier, for | 
|  | simplicity's sake, the destination QEMU is started on the same host, but | 
|  | it could be located elsewhere: | 
|  |  | 
|  | .. parsed-literal:: | 
|  |  | 
|  | $ |qemu_system| -display none -no-user-config -nodefaults \\ | 
|  | -m 512 -blockdev \\ | 
|  | node-name=node-TargetDisk,driver=qcow2,file.driver=file,file.node-name=file,file.filename=./target-disk.qcow2 \\ | 
|  | -device virtio-blk,drive=node-TargetDisk,id=virtio0 \\ | 
|  | -S -monitor stdio -qmp unix:./qmp-sock2,server=on,wait=off \\ | 
|  | -incoming tcp:localhost:6666 | 
|  |  | 
|  | Given the disk image chain on source QEMU:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | On the destination host, it is expected that the contents of the chain | 
|  | ``[A] <-- [B] <-- [C]`` are *already* present, and therefore copy *only* | 
|  | the content of image [D]. | 
|  |  | 
|  | (1) [On *destination* QEMU] As part of the first step, start the | 
|  | built-in NBD server on a given host (here ``::``, the IPv6 wildcard | 
|  | address, which also accepts connections from the local host) and port:: | 
|  |  | 
|  | (QEMU) nbd-server-start addr={"type":"inet","data":{"host":"::","port":"49153"}} | 
|  | { | 
|  | "execute": "nbd-server-start", | 
|  | "arguments": { | 
|  | "addr": { | 
|  | "data": { | 
|  | "host": "::", | 
|  | "port": "49153" | 
|  | }, | 
|  | "type": "inet" | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | (2) [On *destination* QEMU] And export the destination disk image using | 
|  | QEMU's built-in NBD server:: | 
|  |  | 
|  | (QEMU) nbd-server-add device=node-TargetDisk writable=true | 
|  | { | 
|  | "execute": "nbd-server-add", | 
|  | "arguments": { | 
|  | "device": "node-TargetDisk", | 
|  | "writable": true | 
|  | } | 
|  | } | 
|  |  | 
|  | (3) [On *source* QEMU] Then, invoke ``drive-mirror`` with | 
|  | ``mode=existing`` (meaning: synchronize to a pre-created, and | 
|  | therefore 'existing', file on the target host) and with the | 
|  | synchronization mode set to 'top' (``"sync": "top"``):: | 
|  |  | 
|  | (QEMU) drive-mirror device=node-D target=nbd:localhost:49153:exportname=node-TargetDisk sync=top mode=existing job-id=job0 | 
|  | { | 
|  | "execute": "drive-mirror", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "mode": "existing", | 
|  | "job-id": "job0", | 
|  | "target": "nbd:localhost:49153:exportname=node-TargetDisk", | 
|  | "sync": "top" | 
|  | } | 
|  | } | 
|  |  | 
|  | (4) [On *source* QEMU] Once ``drive-mirror`` has copied all the data, and the | 
|  | event ``BLOCK_JOB_READY`` is emitted, issue ``block-job-cancel`` to | 
|  | gracefully end the synchronization, from source QEMU:: | 
|  |  | 
|  | (QEMU) block-job-cancel device=job0 | 
|  | { | 
|  | "execute": "block-job-cancel", | 
|  | "arguments": { | 
|  | "device": "job0" | 
|  | } | 
|  | } | 
|  |  | 
|  | (5) [On *destination* QEMU] Then, stop the NBD server:: | 
|  |  | 
|  | (QEMU) nbd-server-stop | 
|  | { | 
|  | "execute": "nbd-server-stop", | 
|  | "arguments": {} | 
|  | } | 
|  |  | 
|  | (6) [On *destination* QEMU] Finally, resume the guest vCPUs by issuing the | 
|  | QMP command ``cont``:: | 
|  |  | 
|  | (QEMU) cont | 
|  | { | 
|  | "execute": "cont", | 
|  | "arguments": {} | 
|  | } | 
|  |  | 
|  | .. note:: | 
|  | Higher-level libraries (e.g. libvirt) automate the entire above | 
|  | process (although note that libvirt does not allow same-host | 
|  | migrations to localhost for other reasons). | 
|  |  | 
|  |  | 
|  | Notes on ``blockdev-mirror`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | The ``blockdev-mirror`` command is equivalent in core functionality to | 
|  | ``drive-mirror``, except that it operates at node-level in a BDS graph. | 
|  |  | 
|  | Also: for ``blockdev-mirror``, the 'target' image needs to be explicitly | 
|  | created (using ``qemu-img``) and attached to live QEMU via | 
|  | ``blockdev-add``, which assigns a node name to the target. | 
|  |  | 
|  | E.g. the sequence of actions to create a point-in-time backup of an | 
|  | entire disk image chain, to a target, using ``blockdev-mirror`` would be: | 
|  |  | 
|  | (0) Create the QCOW2 overlays, to arrive at a backing chain of desired | 
|  | depth | 
|  |  | 
|  | (1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` | 
|  |  | 
|  | (2) Attach the above created file (``e.qcow2``), run-time, using | 
|  | ``blockdev-add`` to QEMU | 
|  |  | 
|  | (3) Perform ``blockdev-mirror`` (use ``"sync": "full"`` to copy the | 
|  | entire chain to the target).  And notice the event | 
|  | ``BLOCK_JOB_READY`` | 
|  |  | 
|  | (4) Optionally, query for active block jobs; there should be a 'mirror' | 
|  | job ready to be completed | 
|  |  | 
|  | (5) Gracefully complete the 'mirror' block device job, and notice the | 
|  | event ``BLOCK_JOB_COMPLETED`` | 
|  |  | 
|  | (6) Shut down the guest by issuing the QMP ``quit`` command so that | 
|  | caches are flushed | 
|  |  | 
|  | (7) Then, finally, compare the contents of the disk image chain, and | 
|  | the target copy with ``qemu-img compare``.  You should notice: | 
|  | "Images are identical" | 
|  |  | 
|  |  | 
|  | QMP invocation for ``blockdev-mirror`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Given the disk image chain:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | To copy the contents of the entire disk image chain, from [A] all the | 
|  | way to [D], to a new target, call it [E], follow the flow below. | 
|  |  | 
|  | Create the overlay images, [B], [C], and [D]:: | 
|  |  | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-B snapshot-file=c.qcow2 snapshot-node-name=node-C format=qcow2 | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-C snapshot-file=d.qcow2 snapshot-node-name=node-D format=qcow2 | 
|  |  | 
|  | Create the target image, [E]:: | 
|  |  | 
|  | $ qemu-img create -f qcow2 e.qcow2 39M | 
|  |  | 
|  | Add the above created target image to QEMU, via ``blockdev-add``:: | 
|  |  | 
|  | (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} | 
|  | { | 
|  | "execute": "blockdev-add", | 
|  | "arguments": { | 
|  | "node-name": "node-E", | 
|  | "driver": "qcow2", | 
|  | "file": { | 
|  | "driver": "file", | 
|  | "filename": "e.qcow2" | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | Perform ``blockdev-mirror``, and notice the event ``BLOCK_JOB_READY``:: | 
|  |  | 
|  | (QEMU) blockdev-mirror device=node-D target=node-E sync=full job-id=job0 | 
|  | { | 
|  | "execute": "blockdev-mirror", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "job-id": "job0", | 
|  | "target": "node-E", | 
|  | "sync": "full" | 
|  | } | 
|  | } | 
|  |  | 
|  | Query for active block jobs; there should be a 'mirror' job ready:: | 
|  |  | 
|  | (QEMU) query-block-jobs | 
|  | { | 
|  | "execute": "query-block-jobs", | 
|  | "arguments": {} | 
|  | } | 
|  | { | 
|  | "return": [ | 
|  | { | 
|  | "busy": false, | 
|  | "type": "mirror", | 
|  | "len": 21561344, | 
|  | "paused": false, | 
|  | "ready": true, | 
|  | "io-status": "ok", | 
|  | "offset": 21561344, | 
|  | "device": "job0", | 
|  | "speed": 0 | 
|  | } | 
|  | ] | 
|  | } | 
|  |  | 
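The readiness check on the reply above can also be done programmatically.
The following is a minimal sketch, assuming the reply has been captured as
a JSON string; the helper name ``ready_mirror_jobs`` is hypothetical:

```python
import json


def ready_mirror_jobs(reply):
    """Return the IDs of 'mirror' jobs that report ready=true."""
    return [
        job["device"]
        for job in reply.get("return", [])
        if job.get("type") == "mirror" and job.get("ready")
    ]


# The query-block-jobs reply shown above, captured as text.
reply = json.loads("""
{
  "return": [
    {
      "busy": false, "type": "mirror", "len": 21561344, "paused": false,
      "ready": true, "io-status": "ok", "offset": 21561344,
      "device": "job0", "speed": 0
    }
  ]
}
""")
print(ready_mirror_jobs(reply))
```
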
|  | Gracefully complete the block device job operation, and notice the | 
|  | event ``BLOCK_JOB_COMPLETED``:: | 
|  |  | 
|  | (QEMU) block-job-complete device=job0 | 
|  | { | 
|  | "execute": "block-job-complete", | 
|  | "arguments": { | 
|  | "device": "job0" | 
|  | } | 
|  | } | 
|  | { | 
|  | "return": {} | 
|  | } | 
|  |  | 
|  | Shut down the guest by issuing the ``quit`` QMP command:: | 
|  |  | 
|  | (QEMU) quit | 
|  | { | 
|  | "execute": "quit", | 
|  | "arguments": {} | 
|  | } | 
|  |  | 
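The final step from the notes, comparing the chain with the target copy,
is not shown in the flow above.  Here is a minimal sketch of it, assuming
``qemu-img`` is on ``PATH`` and that the guest has already been shut
down.  The helper names are hypothetical; the exit-code meanings (0 for
identical, 1 for different, anything higher for an error) follow the
``qemu-img`` manual page:

```python
import subprocess


def compare_argv(image_a, image_b):
    """Build the qemu-img compare command line for two image files."""
    return ["qemu-img", "compare", image_a, image_b]


def describe_exit_code(code):
    """Map a qemu-img compare exit status to a short verdict."""
    if code == 0:
        return "identical"
    if code == 1:
        return "different"
    return "error"


def compare_images(image_a, image_b):
    """Run qemu-img compare and return 'identical', 'different' or 'error'."""
    result = subprocess.run(
        compare_argv(image_a, image_b), capture_output=True, text=True
    )
    return describe_exit_code(result.returncode)
```

For the chain above, ``compare_images("d.qcow2", "e.qcow2")`` would be the
call to make once QEMU has exited.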
|  |  | 
|  | Live disk backup --- ``blockdev-backup`` and the deprecated ``drive-backup`` | 
|  | ---------------------------------------------------------------------------- | 
|  |  | 
|  | The ``blockdev-backup`` command (and the deprecated ``drive-backup``) | 
|  | allows you to create a point-in-time snapshot. | 
|  |  | 
|  | In this case, the point-in-time is when you *start* the | 
|  | ``blockdev-backup`` (or deprecated ``drive-backup``) command. | 
|  |  | 
|  |  | 
|  | QMP invocation for ``drive-backup`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Note that the ``drive-backup`` command has been deprecated since QEMU 6.2 | 
|  | and will be removed in a future release. | 
|  |  | 
|  | Yet again, starting afresh with our example disk image chain:: | 
|  |  | 
|  | [A] <-- [B] <-- [C] <-- [D] | 
|  |  | 
|  | To create a target image [E], with content populated from image [A] to | 
|  | [D], from the above chain, the following is the syntax.  (If the target | 
|  | image does not exist, ``drive-backup`` will create it):: | 
|  |  | 
|  | (QEMU) drive-backup device=node-D sync=full target=e.qcow2 job-id=job0 | 
|  | { | 
|  | "execute": "drive-backup", | 
|  | "arguments": { | 
|  | "device": "node-D", | 
|  | "job-id": "job0", | 
|  | "sync": "full", | 
|  | "target": "e.qcow2" | 
|  | } | 
|  | } | 
|  |  | 
|  | Once the above ``drive-backup`` has completed, a ``BLOCK_JOB_COMPLETED`` event | 
|  | will be issued, indicating the live block device job operation has | 
|  | completed, and no further action is required. | 
|  |  | 
|  |  | 
|  | Moving from the deprecated ``drive-backup`` to newer ``blockdev-backup`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | ``blockdev-backup`` differs from ``drive-backup`` in how you specify | 
|  | the backup target.  With ``blockdev-backup`` you can't specify a filename | 
|  | as the target.  Instead, you use the ``node-name`` of an existing block | 
|  | node, which you may add with the ``blockdev-add`` or ``blockdev-create`` | 
|  | commands.  Correspondingly, ``blockdev-backup`` doesn't have the ``mode`` | 
|  | and ``format`` arguments, which don't apply to an existing block node. | 
|  | See the following sections for details and examples. | 
|  |  | 
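The mapping described above can be sketched in a few lines.  This is only
an illustration under stated assumptions: the helper name
``to_blockdev_backup`` is hypothetical, the target file is assumed to
exist already (e.g. created with ``qemu-img create``), and the target is
assumed to be a local file opened with the ``file`` driver:

```python
import json


def to_blockdev_backup(drive_backup_args, target_node, target_driver="qcow2"):
    """Translate drive-backup arguments into blockdev-add + blockdev-backup.

    The filename target becomes a named node; 'mode' and 'format' are
    dropped because they do not apply to an existing block node.
    """
    args = dict(drive_backup_args)
    filename = args.pop("target")
    args.pop("mode", None)
    # A 'format' given to drive-backup selects the driver for the new node.
    driver = args.pop("format", None) or target_driver

    add = {
        "execute": "blockdev-add",
        "arguments": {
            "node-name": target_node,
            "driver": driver,
            "file": {"driver": "file", "filename": filename},
        },
    }
    args["target"] = target_node
    backup = {"execute": "blockdev-backup", "arguments": args}
    return [add, backup]


old_args = {"device": "node-D", "job-id": "job0", "sync": "full", "target": "e.qcow2"}
for command in to_blockdev_backup(old_args, "node-E"):
    print(json.dumps(command))
```
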
|  |  | 
|  | Notes on ``blockdev-backup`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | The ``blockdev-backup`` command operates at node-level in a Block Driver | 
|  | State (BDS) graph. | 
|  |  | 
|  | E.g. the sequence of actions to create a point-in-time backup | 
|  | of an entire disk image chain, to a target, using ``blockdev-backup`` | 
|  | would be: | 
|  |  | 
|  | (0) Create the QCOW2 overlays, to arrive at a backing chain of desired | 
|  | depth | 
|  |  | 
|  | (1) Create the target image (using ``qemu-img``), say, ``e.qcow2`` | 
|  |  | 
|  | (2) Attach the above created file (``e.qcow2``), run-time, using | 
|  | ``blockdev-add`` to QEMU | 
|  |  | 
|  | (3) Perform ``blockdev-backup`` (use ``"sync": "full"`` to copy the | 
|  | entire chain to the target).  And notice the event | 
|  | ``BLOCK_JOB_COMPLETED`` | 
|  |  | 
|  | (4) Shut down the guest by issuing the QMP ``quit`` command, so that | 
|  | caches are flushed | 
|  |  | 
|  | (5) Then, finally, compare the contents of the disk image chain, and | 
|  | the target copy with ``qemu-img compare``.  You should notice: | 
|  | "Images are identical" | 
|  |  | 
|  | The following section shows an example QMP invocation for | 
|  | ``blockdev-backup``. | 
|  |  | 
|  | QMP invocation for ``blockdev-backup`` | 
|  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  |  | 
|  | Given a disk image chain of depth 1 where image [B] is the active | 
|  | overlay (live QEMU is writing to it):: | 
|  |  | 
|  | [A] <-- [B] | 
|  |  | 
|  | The following is the procedure to copy the content from the entire chain | 
|  | to a target image (say, [E]), which has the full content from [A] and | 
|  | [B]. | 
|  |  | 
|  | Create the overlay [B]:: | 
|  |  | 
|  | (QEMU) blockdev-snapshot-sync node-name=node-A snapshot-file=b.qcow2 snapshot-node-name=node-B format=qcow2 | 
|  | { | 
|  | "execute": "blockdev-snapshot-sync", | 
|  | "arguments": { | 
|  | "node-name": "node-A", | 
|  | "snapshot-file": "b.qcow2", | 
|  | "format": "qcow2", | 
|  | "snapshot-node-name": "node-B" | 
|  | } | 
|  | } | 
|  |  | 
|  |  | 
|  | Create a target image that will contain the copy:: | 
|  |  | 
|  | $ qemu-img create -f qcow2 e.qcow2 39M | 
|  |  | 
|  | Then add it to QEMU via ``blockdev-add``:: | 
|  |  | 
|  | (QEMU) blockdev-add driver=qcow2 node-name=node-E file={"driver":"file","filename":"e.qcow2"} | 
|  | { | 
|  | "execute": "blockdev-add", | 
|  | "arguments": { | 
|  | "node-name": "node-E", | 
|  | "driver": "qcow2", | 
|  | "file": { | 
|  | "driver": "file", | 
|  | "filename": "e.qcow2" | 
|  | } | 
|  | } | 
|  | } | 
|  |  | 
|  | Then invoke ``blockdev-backup`` to copy the contents from the entire | 
|  | image chain, consisting of images [A] and [B] to the target image | 
|  | 'e.qcow2':: | 
|  |  | 
|  | (QEMU) blockdev-backup device=node-B target=node-E sync=full job-id=job0 | 
|  | { | 
|  | "execute": "blockdev-backup", | 
|  | "arguments": { | 
|  | "device": "node-B", | 
|  | "job-id": "job0", | 
|  | "target": "node-E", | 
|  | "sync": "full" | 
|  | } | 
|  | } | 
|  |  | 
|  | Once the above 'backup' operation has completed, the event, | 
|  | ``BLOCK_JOB_COMPLETED`` will be emitted, signalling successful | 
|  | completion. | 
|  |  | 
|  | Next, query for any active block device jobs (there should be none):: | 
|  |  | 
|  | (QEMU) query-block-jobs | 
|  | { | 
|  | "execute": "query-block-jobs", | 
|  | "arguments": {} | 
|  | } | 
|  |  | 
|  | Shut down the guest:: | 
|  |  | 
|  | (QEMU) quit | 
|  | { | 
|  | "execute": "quit", | 
|  | "arguments": {} | 
|  | } | 
|  | { | 
|  | "return": {} | 
|  | } | 
|  |  | 
|  | .. note:: | 
|  | The above step is really important; if forgotten, an error, "Failed | 
|  | to get shared "write" lock on e.qcow2", will be thrown when you do | 
|  | ``qemu-img compare`` to verify the integrity of the disk image | 
|  | with the backup content. | 
|  |  | 
|  |  | 
|  | The end result will be the image 'e.qcow2' containing a | 
|  | point-in-time backup of the disk image chain -- i.e. contents from | 
|  | images [A] and [B] at the time the ``blockdev-backup`` command was | 
|  | initiated. | 
|  |  | 
|  | One way to confirm that the backup disk image has the same content as | 
|  | the disk image chain is to compare the two with ``qemu-img compare``; | 
|  | you should see "Images are identical".  (NB: this assumes QEMU was | 
|  | launched with the ``-S`` option, which does not start the CPUs at | 
|  | guest boot up):: | 
|  |  | 
|  | $ qemu-img compare b.qcow2 e.qcow2 | 
|  | Warning: Image size mismatch! | 
|  | Images are identical. | 
|  |  | 
|  | NOTE: The "Warning: Image size mismatch!" is expected, as we created the | 
|  | target image (e.qcow2) with 39M size. |