Orit Wasserman | 34c2641 | 2012-08-06 21:42:49 +0300 | [diff] [blame] | 1 | XBZRLE (Xor Based Zero Run Length Encoding) |
| 2 | =========================================== |
| 3 | |
| 4 | Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction |
| 5 | of VM downtime and the total live-migration time of Virtual machines. |
| 6 | It is particularly useful for virtual machines running memory write intensive |
| 7 | workloads that are typical of large enterprise applications such as SAP ERP |
| 8 | Systems, and generally speaking for any application that uses a sparse memory |
| 9 | update pattern. |
| 10 | |
| 11 | Instead of sending the changed guest memory page this solution will send a |
| 12 | compressed version of the updates, thus reducing the amount of data sent during |
| 13 | live migration. |
| 14 | In order to be able to calculate the update, the previous memory pages need to |
| 15 | be stored on the source. Those pages are stored in a dedicated cache |
| 16 | (hash table) and are accessed by their address. |
| 17 | The larger the cache size the better the chances are that the page has already |
| 18 | been stored in the cache. |
| 19 | A small cache size will result in high cache miss rate. |
| 20 | Cache size can be changed before and during migration. |
| 21 | |
| 22 | Format |
| 23 | ======= |
| 24 | |
| 25 | The compression format performs a XOR between the previous and current content |
| 26 | of the page, where zero represents an unchanged value. |
| 27 | The page data delta is represented by zero and non zero runs. |
| 28 | A zero run is represented by its length (in bytes). |
| 29 | A non zero run is represented by its length (in bytes) and the new data. |
| 30 | The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128) |
| 31 | |
| 32 | There can be more than one valid encoding, the sender may send a longer encoding |
| 33 | for the benefit of reducing computation cost. |
| 34 | |
| 35 | page = zrun nzrun |
| 36 | | zrun nzrun page |
| 37 | |
| 38 | zrun = length |
| 39 | |
| 40 | nzrun = length byte... |
| 41 | |
| 42 | length = uleb128 encoded integer |
| 43 | |
| 44 | On the sender side XBZRLE is used as a compact delta encoding of page updates, |
Cao jin | 7c2b0f6 | 2016-09-08 19:13:58 +0800 | [diff] [blame] | 45 | retrieving the old page content from the cache (default size of 64MB). The |
Orit Wasserman | 34c2641 | 2012-08-06 21:42:49 +0300 | [diff] [blame] | 46 | receiving side uses the existing page's content and XBZRLE to decode the new |
| 47 | page's content. |
| 48 | |
| 49 | This work was originally based on research results published |
| 50 | VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live |
| 51 | Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth. |
| 52 | Additionally the delta encoder XBRLE was improved further using the XBZRLE |
| 53 | instead. |
| 54 | |
| 55 | XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads making it |
| 56 | ideal for in-line, real-time encoding such as is needed for live-migration. |
| 57 | |
| 58 | Example |
| 59 | old buffer: |
| 60 | 1001 zeros |
| 61 | 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d |
| 62 | 3074 zeros |
| 63 | |
| 64 | new buffer: |
| 65 | 1001 zeros |
| 66 | 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69 |
| 67 | 3074 zeros |
| 68 | |
| 69 | encoded buffer: |
| 70 | |
| 71 | encoded length 24 |
| 72 | e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69 |
| 73 | |
ChenLiang | 27af7d6 | 2014-11-24 19:55:47 +0800 | [diff] [blame] | 74 | Cache update strategy |
| 75 | ===================== |
Cao jin | 7c2b0f6 | 2016-09-08 19:13:58 +0800 | [diff] [blame] | 76 | Keeping the hot pages in the cache is effective for decreasing cache |
ChenLiang | 27af7d6 | 2014-11-24 19:55:47 +0800 | [diff] [blame] | 77 | misses. XBZRLE uses a counter as the age of each page. The counter will |
| 78 | increase after each ram dirty bitmap sync. When a cache conflict is |
| 79 | detected, XBZRLE will only evict pages in the cache that are older than |
| 80 | a threshold. |
| 81 | |
Orit Wasserman | 34c2641 | 2012-08-06 21:42:49 +0300 | [diff] [blame] | 82 | Usage |
| 83 | ====================== |
| 84 | 1. Verify the destination QEMU version is able to decode the new format. |
| 85 | {qemu} info migrate_capabilities |
| 86 | {qemu} xbzrle: off , ... |
| 87 | |
| 88 | 2. Activate xbzrle on both source and destination: |
| 89 | {qemu} migrate_set_capability xbzrle on |
| 90 | |
| 91 | 3. Set the XBZRLE cache size - the cache size is in MBytes and should be a |
| 92 | power of 2. The cache default value is 64MBytes. (on source only) |
| 93 | {qemu} migrate_set_cache_size 256m |
| 94 | |
| 95 | 4. Start outgoing migration |
| 96 | {qemu} migrate -d tcp:destination.host:4444 |
| 97 | {qemu} info migrate |
| 98 | capabilities: xbzrle: on |
| 99 | Migration status: active |
| 100 | transferred ram: A kbytes |
| 101 | remaining ram: B kbytes |
| 102 | total ram: C kbytes |
| 103 | total time: D milliseconds |
| 104 | duplicate: E pages |
| 105 | normal: F pages |
| 106 | normal bytes: G kbytes |
| 107 | cache size: H bytes |
| 108 | xbzrle transferred: I kbytes |
| 109 | xbzrle pages: J pages |
| 110 | xbzrle cache miss: K |
| 111 | xbzrle overflow : L |
| 112 | |
| 113 | xbzrle cache-miss: the number of cache misses to date - high cache-miss rate |
| 114 | indicates that the cache size is set too low. |
| 115 | xbzrle overflow: the number of overflows in the decoding which where the delta |
| 116 | could not be compressed. This can happen if the changes in the pages are too |
| 117 | large or there are many short changes; for example, changing every second byte |
| 118 | (half a page). |
| 119 | |
| 120 | Testing: Testing indicated that live migration with XBZRLE was completed in 110 |
| 121 | seconds, whereas without it would not be able to complete. |
| 122 | |
| 123 | A simple synthetic memory r/w load generator: |
| 124 | .. include <stdlib.h> |
| 125 | .. include <stdio.h> |
| 126 | .. int main() |
| 127 | .. { |
| 128 | .. char *buf = (char *) calloc(4096, 4096); |
| 129 | .. while (1) { |
| 130 | .. int i; |
| 131 | .. for (i = 0; i < 4096 * 4; i++) { |
| 132 | .. buf[i * 4096 / 4]++; |
| 133 | .. } |
| 134 | .. printf("."); |
| 135 | .. } |
| 136 | .. } |