cutils: add avx2 instruction optimization

buffer_find_nonzero_offset() is a hot function during live migration.
It is currently optimized with SSE2 instructions. On platforms that
support AVX2, using AVX2 instructions instead improves the performance
of buffer_find_nonzero_offset() by about 30% compared to SSE2.

Live migration is faster with this optimization: test results show
that for an idle 8GiB RAM guest that has just booted, this patch
shortens the total live migration time by about 6%.

This patch uses the ifunc mechanism to select the proper function at
run time: platforms that support AVX2 execute the AVX2 instructions,
and all others execute the original instructions.
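
The run-time selection described above can be sketched roughly as
follows. This is an illustration only, not the code in this patch: all
function names are hypothetical, the dispatch uses a lazily initialized
function pointer instead of a real ifunc resolver, and the sketch
returns the exact byte offset of the first nonzero byte, which is an
assumed simplification of what buffer_find_nonzero_offset() returns.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_SKETCH 1   /* hypothetical macro, for this sketch only */
#endif

/* Portable fallback: offset of the first nonzero byte, or len if none. */
static size_t find_nonzero_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return i;
        }
    }
    return len;
}

#ifdef HAVE_AVX2_SKETCH
/* AVX2 variant: skip 32-byte chunks that are entirely zero. */
static __attribute__((target("avx2")))
size_t find_nonzero_avx2(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(p + i));
        if (!_mm256_testz_si256(v, v)) {
            break;              /* this chunk holds a nonzero byte */
        }
    }
    /* Scan the nonzero chunk (or the short tail) byte by byte. */
    return i + find_nonzero_scalar(p + i, len - i);
}
#endif

typedef size_t (*find_nonzero_fn)(const void *, size_t);

/* Plays the role of the ifunc resolver: pick an implementation once. */
static find_nonzero_fn select_find_nonzero(void)
{
#ifdef HAVE_AVX2_SKETCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return find_nonzero_avx2;
    }
#endif
    return find_nonzero_scalar;
}

/* Public entry point. The lazy initialization here is not thread-safe;
 * a real ifunc resolver is run by the dynamic loader instead. */
size_t find_nonzero(const void *buf, size_t len)
{
    static find_nonzero_fn impl;
    assert(buf != NULL || len == 0);
    if (!impl) {
        impl = select_find_nonzero();
    }
    return impl(buf, len);
}
```

With GCC on ELF targets, the function-pointer indirection can be
replaced by declaring the entry point with __attribute__((ifunc(...)))
and naming the resolver, so that the choice is made once when the
symbol is resolved.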

Signed-off-by: Liang Li <liang.z.li@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1457416397-26671-3-git-send-email-liang.z.li@intel.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
2 files changed
tree: 20493c1d2f176a292481e5ba900ca1c986b77299