= Block driver correctness testing with blkverify =

== Introduction ==

This document describes how to use the blkverify protocol to test that a block
driver is operating correctly.

It is difficult to test and debug block drivers against real guests. Often
processes inside the guest will crash because corrupt sectors were read as part
of the executable. Other times obscure errors are raised by a program inside
the guest. These issues are extremely hard to trace back to bugs in the block
driver.

Blkverify solves this problem by catching data corruption inside QEMU the first
time bad data is read and reporting the disk sector that is corrupted.

== How it works ==

The blkverify protocol has two child block devices, the "test" device and the
"raw" device. Read/write operations are mirrored to both devices so their
state should always be in sync.

The "raw" device is a raw image (a flat file) whose starting contents are
identical to those of the "test" image. The idea is that the "raw" device will
handle read/write operations correctly and not corrupt data. It can be used as
a reference for comparison against the "test" device.

After a mirrored read operation completes, blkverify will compare the data and
raise an error if it is not identical. This makes it possible to catch the
first instance where corrupt data is read.
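
The comparison step can be approximated outside QEMU for two raw files. The
sketch below is not how blkverify is implemented (blkverify runs inside QEMU
and checks each mirrored read as it completes); it only illustrates the idea of
reporting the first sector where the two copies differ. The helper name
first_mismatch_sector is hypothetical, and the 512-byte sector size and equal
file sizes are assumptions.

```shell
#!/bin/sh
# Offline analogue of the blkverify read check, for two *raw* files only.
# first_mismatch_sector is a hypothetical helper, not part of QEMU.
# Assumes both files have the same size and 512-byte sectors.
# Prints the index of the first differing sector and returns 1,
# or prints nothing and returns 0 when no differing byte is found.
first_mismatch_sector() {
    raw=$1
    test_img=$2
    # cmp -l lists each differing byte as "<1-based offset> <octal> <octal>";
    # only the first line is needed.
    first=$(cmp -l "$raw" "$test_img" 2>/dev/null |
            awk 'NR == 1 { print $1; exit }')
    if [ -z "$first" ]; then
        return 0
    fi
    # Convert the 1-based byte offset into a 0-based sector index.
    echo $(( (first - 1) / 512 ))
    return 1
}
```

For example, `first_mismatch_sector raw.img test.img` prints 0 when the two
files already differ somewhere in their first 512 bytes.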

== Example ==

Imagine raw.img has 0xcd repeated throughout its first sector:

    $ ./qemu-io -c 'read -v 0 512' raw.img
    00000000: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................
    00000010: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................
    [...]
    000001e0: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................
    000001f0: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................
    read 512/512 bytes at offset 0
    512.000000 bytes, 1 ops; 0.0000 sec (97.656 MiB/sec and 200000.0000 ops/sec)

Suppose test.img is corrupt, with its first sector zeroed when it should not
be:

    $ ./qemu-io -c 'read -v 0 512' test.img
    00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    [...]
    000001e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    000001f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
    read 512/512 bytes at offset 0
    512.000000 bytes, 1 ops; 0.0000 sec (81.380 MiB/sec and 166666.6667 ops/sec)

This error is caught by blkverify:

    $ ./qemu-io -c 'read 0 512' blkverify:raw.img:test.img
    blkverify: read sector_num=0 nb_sectors=4 contents mismatch in sector 0
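
To try this example locally, files matching the description of raw.img and
test.img can be created with standard tools. This is a minimal sketch that
assumes one-sector (512-byte) files are enough; the real images in the example
may well be larger.

```shell
#!/bin/sh
# Assumption: a single 512-byte sector per file is enough for the example.
# raw.img: first sector filled with 0xcd (octal 315).
dd if=/dev/zero bs=512 count=1 2>/dev/null | LC_ALL=C tr '\000' '\315' > raw.img
# test.img: the "corrupt" copy, with the same sector left as zeroes.
dd if=/dev/zero of=test.img bs=512 count=1 2>/dev/null
# Dump the first 16 bytes of each file to confirm the contents.
od -An -tx1 -N16 raw.img
od -An -tx1 -N16 test.img
```

The resulting files can then be passed to qemu-io as
blkverify:raw.img:test.img.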

A more realistic scenario is verifying the installation of a guest OS:

    $ ./qemu-img create raw.img 16G
    $ ./qemu-img create -f qcow2 test.qcow2 16G
    $ x86_64-softmmu/qemu-system-x86_64 -cdrom debian.iso \
            -drive file=blkverify:raw.img:test.qcow2

If the installation is aborted when blkverify detects corruption, use qemu-io
to explore the contents of the disk image at the sector in question.
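
blkverify reports a sector number, while the qemu-io read command takes a byte
offset and a length. The conversion can be sketched as below; sector_read_cmd
is a hypothetical helper (not part of QEMU), the 512-byte sector size is an
assumption, and sector 2048 is only an illustrative value.

```shell
#!/bin/sh
# sector_read_cmd is a hypothetical helper, not part of QEMU.
# It turns a sector number reported by blkverify into a qemu-io "read"
# command covering that sector, assuming 512-byte sectors.
sector_read_cmd() {
    sector=$1
    count=${2:-1}    # number of sectors to dump, default 1
    echo "read -v $(( sector * 512 )) $(( count * 512 ))"
}

# Example: print the command for sector 2048.
sector_read_cmd 2048
```

The printed command can be handed to qemu-io for either image, for instance
`./qemu-io -c "$(sector_read_cmd 2048)" test.qcow2`, and the two dumps compared
by eye.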