| XBZRLE (Xor Based Zero Run Length Encoding) |
| =========================================== |
| |
| Using XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and |
| the total live-migration time of virtual machines. |
| It is particularly useful for virtual machines running memory write intensive |
| workloads that are typical of large enterprise applications such as SAP ERP |
| Systems, and generally speaking for any application that uses a sparse memory |
| update pattern. |
| |
| Instead of sending the changed guest memory page this solution will send a |
| compressed version of the updates, thus reducing the amount of data sent during |
| live migration. |
| In order to be able to calculate the update, the previous memory pages need to |
| be stored on the source. Those pages are stored in a dedicated cache |
| (hash table) and are accessed by their address. |
| The larger the cache size the better the chances are that the page has already |
| been stored in the cache. |
| A small cache size will result in high cache miss rate. |
| Cache size can be changed before and during migration. |
| |
| Format |
| ======= |
| |
| The compression format performs a XOR between the previous and current content |
| of the page, where zero represents an unchanged value. |
| The page data delta is represented by zero and non zero runs. |
| A zero run is represented by its length (in bytes). |
| A non zero run is represented by its length (in bytes) and the new data. |
| The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128) |
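| |
| As an illustration of the run-length encoding, the following is a minimal |
| sketch of ULEB128 for 32-bit values. The function names uleb128_encode and |
| uleb128_decode are hypothetical and chosen for this example only; this is |
| not presented as the code QEMU itself uses. |
| |
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode n as ULEB128 into dst (hypothetical helper).
 * Each output byte carries 7 bits of n, least significant group first;
 * the top bit is set on every byte except the last.
 * Returns the number of bytes written (1 to 5 for a 32-bit value).
 */
static size_t uleb128_encode(uint8_t *dst, uint32_t n)
{
    size_t len = 0;

    do {
        uint8_t byte = n & 0x7f;
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;   /* more bytes follow */
        }
        dst[len++] = byte;
    } while (n != 0);

    return len;
}

/*
 * Decode a ULEB128 value from src, reading at most avail bytes.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or longer than a 32-bit value allows.
 */
static size_t uleb128_decode(const uint8_t *src, size_t avail, uint32_t *n)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t i;

    for (i = 0; i < avail && shift < 32; i++, shift += 7) {
        value |= (uint32_t)(src[i] & 0x7f) << shift;
        if ((src[i] & 0x80) == 0) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}
```
| |
| With this sketch, a run length of 1001 (0x3e9) encodes to the two bytes |
| e9 07, and any length below 128 fits in a single byte. |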
| |
| There can be more than one valid encoding; the sender may send a longer |
| encoding in order to reduce computation cost. |
| |
| page = zrun nzrun |
| | zrun nzrun page |
| |
| zrun = length |
| |
| nzrun = length byte... |
| |
| length = uleb128 encoded integer |
| |
| On the sender side XBZRLE is used as a compact delta encoding of page updates, |
| retrieving the old page content from the cache (default size of 64 MB). The |
| receiving side uses the existing page's content and XBZRLE to decode the new |
| page's content. |
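| |
| The sender side can be sketched roughly as follows. This is a simple |
| byte-at-a-time encoder written for illustration, assuming the grammar above; |
| the names put_uleb128 and xbzrle_encode_sketch are hypothetical, and a real |
| encoder may pick a different (but still valid) split into runs. |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write n as ULEB128 into dst, return the number of bytes written. */
static size_t put_uleb128(uint8_t *dst, uint32_t n)
{
    size_t len = 0;

    do {
        uint8_t byte = n & 0x7f;
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;
        }
        dst[len++] = byte;
    } while (n != 0);
    return len;
}

/*
 * Encode new_buf as a delta against old_buf (both len bytes) into dst.
 * Returns the encoded length (0 if the buffers are identical), or -1 if
 * the encoding does not fit into dst_len bytes.
 */
static long xbzrle_encode_sketch(const uint8_t *old_buf,
                                 const uint8_t *new_buf, size_t len,
                                 uint8_t *dst, size_t dst_len)
{
    size_t i = 0, out = 0;

    while (i < len) {
        uint8_t hdr[10];
        size_t start, zrun, nzrun, hlen;

        /* zero run: bytes whose XOR with the old content is zero */
        start = i;
        while (i < len && old_buf[i] == new_buf[i]) {
            i++;
        }
        if (i == len) {
            break;      /* trailing unchanged bytes are not encoded */
        }
        zrun = i - start;

        /* non-zero run: bytes that differ from the old content */
        start = i;
        while (i < len && old_buf[i] != new_buf[i]) {
            i++;
        }
        nzrun = i - start;

        hlen = put_uleb128(hdr, (uint32_t)zrun);
        hlen += put_uleb128(hdr + hlen, (uint32_t)nzrun);
        if (out + hlen + nzrun > dst_len) {
            return -1;  /* the delta does not fit in the output buffer */
        }
        memcpy(dst + out, hdr, hlen);
        out += hlen;
        memcpy(dst + out, new_buf + start, nzrun);
        out += nzrun;
    }
    return (long)out;
}
```
| |
| Note that the non-zero run carries the new page data, not the XOR value, so |
| the receiver only has to copy bytes into place. |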
| |
| This work was originally based on research results published in |
| VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live |
| Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth. |
| Additionally, the delta encoder XBRLE was further improved, resulting in |
| XBZRLE. |
| |
| XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making |
| it well suited to in-line, real-time encoding such as live migration needs. |
| |
| Example |
| old buffer: |
| 1001 zeros |
| 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d |
| 3074 zeros |
| |
| new buffer: |
| 1001 zeros |
| 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69 |
| 3074 zeros |
| |
| encoded buffer: |
| |
| encoded length 24 |
| e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69 |
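| |
| To make the example concrete, here is a minimal sketch of the receiving |
| side, assuming the grammar from the Format section. The names get_uleb128 |
| and xbzrle_decode_sketch are hypothetical and used for illustration only. |
| |
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a ULEB128 value; returns bytes consumed, or 0 on malformed input. */
static size_t get_uleb128(const uint8_t *src, size_t avail, uint32_t *n)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t i;

    for (i = 0; i < avail && shift < 32; i++, shift += 7) {
        value |= (uint32_t)(src[i] & 0x7f) << shift;
        if ((src[i] & 0x80) == 0) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Apply the encoded delta in src to page, which holds the old content on
 * entry and the new content on success.
 * Returns 0 on success, -1 if the encoded buffer is malformed.
 */
static int xbzrle_decode_sketch(const uint8_t *src, size_t src_len,
                                uint8_t *page, size_t page_len)
{
    size_t in = 0, pos = 0;

    while (in < src_len) {
        uint32_t run;
        size_t used;

        /* zrun: skip over bytes that did not change */
        used = get_uleb128(src + in, src_len - in, &run);
        if (used == 0 || run > page_len - pos) {
            return -1;
        }
        in += used;
        pos += run;

        /* nzrun: a length followed by that many bytes of new data */
        used = get_uleb128(src + in, src_len - in, &run);
        if (used == 0) {
            return -1;
        }
        in += used;
        if (run > page_len - pos || run > src_len - in) {
            return -1;
        }
        memcpy(page + pos, src + in, run);
        in += run;
        pos += run;
    }
    return 0;
}
```
| |
| Reading the encoded buffer with this sketch: e9 07 skips 1001 unchanged |
| bytes, 0f copies 15 new bytes, 03 skips 3 bytes, 01 67 writes one byte, |
| 01 skips 1 byte, and 01 69 writes the last changed byte. The trailing 3074 |
| unchanged bytes are not encoded. |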
| |
| Cache update strategy |
| ===================== |
| Keeping the hot pages in the cache is an effective way to reduce cache |
| misses. XBZRLE uses a counter as the age of each page. The counter will |
| increase after each ram dirty bitmap sync. When a cache conflict is |
| detected, XBZRLE will only evict pages in the cache that are older than |
| a threshold. |
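| |
| The eviction rule can be sketched as below. This is an illustrative |
| direct-mapped cache, not the actual implementation: the structure layout, |
| the names cache_insert and cache_is_cached, and the constants NUM_SLOTS, |
| PAGE_SIZE and AGE_THRESHOLD are all assumptions, and the page data itself |
| is left out to keep the example short. |
| |
```c
#include <stdint.h>

#define PAGE_SIZE      4096u  /* assumed page size */
#define NUM_SLOTS      4u     /* hypothetical, tiny for illustration */
#define AGE_THRESHOLD  2u     /* hypothetical threshold, in bitmap syncs */

struct cache_slot {
    int used;        /* non-zero once the slot holds a page */
    uint64_t addr;   /* guest address of the cached page */
    uint64_t age;    /* sync counter value when the page was last inserted */
};

struct page_cache {
    struct cache_slot slots[NUM_SLOTS];
};

/* Pages whose addresses map to the same slot conflict with each other. */
static struct cache_slot *cache_slot_for(struct page_cache *c, uint64_t addr)
{
    return &c->slots[(addr / PAGE_SIZE) % NUM_SLOTS];
}

static int cache_is_cached(struct page_cache *c, uint64_t addr)
{
    struct cache_slot *s = cache_slot_for(c, addr);

    return s->used && s->addr == addr;
}

/*
 * Try to cache addr. current_age is the counter that is incremented after
 * each dirty bitmap sync. On a conflict, the resident page is evicted only
 * if it is older than AGE_THRESHOLD syncs.
 * Returns 1 if addr is cached afterwards, 0 if the insert was refused.
 */
static int cache_insert(struct page_cache *c, uint64_t addr,
                        uint64_t current_age)
{
    struct cache_slot *s = cache_slot_for(c, addr);

    if (s->used && s->addr != addr &&
        current_age - s->age <= AGE_THRESHOLD) {
        return 0;   /* resident page is still young, keep it */
    }
    s->used = 1;
    s->addr = addr;
    s->age = current_age;
    return 1;
}
```
| |
| The intent of such a rule is that a page dirtied in every recent sync keeps |
| its slot, while a page that has not been touched for several syncs can be |
| replaced by a conflicting one. |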
| |
| Usage |
| ====================== |
| 1. Verify the destination QEMU version is able to decode the new format. |
| {qemu} info migrate_capabilities |
| {qemu} xbzrle: off , ... |
| |
| 2. Activate xbzrle on both source and destination: |
| {qemu} migrate_set_capability xbzrle on |
| |
| 3. Set the XBZRLE cache size (on the source only). The cache size is in |
| MBytes and should be a power of 2. The default cache size is 64 MBytes. |
| {qemu} migrate_set_cache_size 256m |
| |
| 4. Start outgoing migration |
| {qemu} migrate -d tcp:destination.host:4444 |
| {qemu} info migrate |
| capabilities: xbzrle: on |
| Migration status: active |
| transferred ram: A kbytes |
| remaining ram: B kbytes |
| total ram: C kbytes |
| total time: D milliseconds |
| duplicate: E pages |
| normal: F pages |
| normal bytes: G kbytes |
| cache size: H bytes |
| xbzrle transferred: I kbytes |
| xbzrle pages: J pages |
| xbzrle cache miss: K |
| xbzrle overflow : L |
| |
| xbzrle cache-miss: the number of cache misses to date - high cache-miss rate |
| indicates that the cache size is set too low. |
| xbzrle overflow: the number of overflows in the encoding, where the delta |
| could not be compressed. This can happen if the changes in the pages are too |
| large or there are many short changes; for example, changing every second byte |
| (half a page). |
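| |
| The "every second byte" case can be checked with a little arithmetic. The |
| sketch below assumes a 4096-byte page and that an overflow means the encoded |
| delta would be larger than the page itself; the function names are |
| hypothetical. Each isolated changed byte costs one zrun length, one nzrun |
| length and one data byte. |
| |
```c
#include <stddef.h>

/* Number of bytes ULEB128 needs to represent n. */
static size_t uleb128_size(size_t n)
{
    size_t len = 1;

    while (n >>= 7) {
        len++;
    }
    return len;
}

/*
 * Encoded size of a page in which exactly one byte out of every `stride`
 * bytes has changed (stride >= 2), so that every changed byte is preceded
 * by stride - 1 unchanged bytes.
 */
static size_t encoded_size_strided(size_t page_size, size_t stride)
{
    size_t changes = page_size / stride;
    size_t per_change = uleb128_size(stride - 1)  /* zrun length */
                      + uleb128_size(1)           /* nzrun length */
                      + 1;                        /* one byte of new data */

    return changes * per_change;
}
```
| |
| For a stride of 2 this gives 2048 * 3 = 6144 bytes, which is more than the |
| 4096-byte page, so the page would be counted as an overflow under the |
| assumption above. For a stride of 64 the same formula gives 192 bytes. |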
| |
| Testing: Testing indicated that live migration with XBZRLE completed in 110 |
| seconds, whereas without it the migration was not able to complete. |
| |
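| A simple synthetic memory r/w load generator: |
| .. #include <stdlib.h> |
| .. #include <stdio.h> |
| .. int main(void) |
| .. { |
| ..     /* 4096 pages of 4096 bytes (16 MB), zero-initialized */ |
| ..     char *buf = calloc(4096, 4096); |
| ..     if (buf == NULL) { |
| ..         return 1; |
| ..     } |
| ..     while (1) { |
| ..         int i; |
| ..         /* dirty one byte every 1024 bytes, i.e. four bytes per page */ |
| ..         for (i = 0; i < 4096 * 4; i++) { |
| ..             buf[i * 4096 / 4]++; |
| ..         } |
| ..         printf("."); |
| ..         fflush(stdout); |
| ..     } |
| .. } |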