f28e0bbefa41fe643cce2f107e868abff312ced9 - qemu

commit: f28e0bbefa41fe643cce2f107e868abff312ced9
author: Alexander Monakov <amonakov@ispras.ru>
Tue Feb 06 23:48:08 2024 +0300
committer: Richard Henderson <richard.henderson@linaro.org>
Fri May 03 08:03:05 2024 -0700
tree: 933db7fedccb1c2590441909271db03ff8cba52f
parent: 93a6085618f16fb2cd316d1e84f1a638b7e2d8ff

util/bufferiszero: Optimize SSE2 and AVX2 variants

Increase unroll factor in SIMD loops from 4x to 8x in order to move
their bottlenecks from ALU port contention to load issue rate (two loads
per cycle on popular x86 implementations).

Avoid using out-of-bounds pointers in loop boundary conditions.

Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of
PTEST, which is not profitable there (like in the removed SSE4 variant).
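
The first two points above (8x unrolling and in-bounds loop conditions) can be
illustrated with a minimal portable sketch. This is not the code from
util/bufferiszero.c: it swaps the SSE2/AVX2 vectors for plain 8-byte words so
that it builds on any host, and the names sketch_buffer_is_zero and load64 are
hypothetical. The loop tracks a remaining byte count instead of comparing
against a pointer computed past the end of the buffer, which is one possible
way to keep every pointer in bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one 8-byte word with no alignment requirement. */
static inline uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Hypothetical helper, not QEMU's function: true if buf[0..len) is all zero. */
bool sketch_buffer_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t remaining = len;

    assert(buf != NULL || len == 0);

    /*
     * 8x unrolled body: eight independent loads are OR-ed together and
     * checked with a single branch per 64-byte block.  The condition uses
     * the remaining byte count, so p never moves beyond one past the end.
     */
    while (remaining >= 64) {
        uint64_t a = load64(p)      | load64(p + 8);
        uint64_t b = load64(p + 16) | load64(p + 24);
        uint64_t c = load64(p + 32) | load64(p + 40);
        uint64_t d = load64(p + 48) | load64(p + 56);

        if ((a | b) | (c | d)) {
            return false;
        }
        p += 64;
        remaining -= 64;
    }

    /* Tail: fewer than 64 bytes left, checked one byte at a time. */
    for (; remaining > 0; remaining--, p++) {
        if (*p) {
            return false;
        }
    }
    return true;
}
```

In a real SIMD variant each load64 would be a vector load, and the final test
would be a vector compare against zero; the sketch only shows the loop shape.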

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-6-amonakov@ispras.ru>

util/bufferiszero.c

1 file changed
