qemu /
qemu /
d9fe9db943d4e855a75424978d7ab87fd54edd98 hardfloat: implement float32/64 comparison
Performance results for fp-bench:
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
- before:
cmp-single: 110.98 MFlops
cmp-double: 107.12 MFlops
- after:
cmp-single: 506.28 MFlops
cmp-double: 524.77 MFlops
Note that flattening both eq and eq_signaling versions
would give us extra performance (695v506, 615v524 Mflops
for single/double, respectively) but this would emit two
essentially identical functions for each eq/signaling pair,
which is a waste.
Aggregate performance improvement for the last few patches:
[ all charts in png: https://imgur.com/a/4yV8p ]
1. Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
qemu-aarch64 NBench score; higher is better
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
16 +-+-----------+-------------+----===-------+---===-------+-----------+-+
14 +-+..........................@@@&&.=.......@@@&&.=...................+-+
12 +-+..........................@.@.&.=.......@.@.&.=.....+befor=== +-+
10 +-+..........................@.@.&.=.......@.@.&.=.....+ad@@&& = +-+
8 +-+.......................$$$%.@.&.=.......@.@.&.=.....+ @@u& = +-+
6 +-+............@@@&&=+***##.$%.@.&.=***##$$%+@.&.=..###$$%%@i& = +-+
4 +-+.......###$%%.@.&=.*.*.#.$%.@.&.=*.*.#.$%.@.&.=+**.#+$ +@m& = +-+
2 +-+.....***.#$.%.@.&=.*.*.#.$%.@.&.=*.*.#.$%.@.&.=.**.#+$+sqr& = +-+
0 +-+-----***##$%%@@&&=-***##$$%@@&&==***##$$%@@&&==-**##$$%+cmp==-----+-+
FOURIER NEURAL NELU DECOMPOSITION gmean
qemu-aarch64 SPEC06fp (test set) speedup over QEMU 4c2c1015905
Host: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
error bars: 95% confidence interval
4.5 +-+---+-----+----+-----+-----+-&---+-----+----+-----+-----+-----+----+-----+-----+-----+-----+----+-----+---+-+
4 +-+..........................+@@+...........................................................................+-+
3.5 +-+..............%%@&.........@@..............%%@&............................................+++dsub +-+
2.5 +-+....&&+.......%%@&.......+%%@..+%%&+..@@&+.%%@&....................................+%%&+.+%@&++%%@& +-+
2 +-+..+%%&..+%@&+.%%@&...+++..%%@...%%&.+$$@&..%%@&..%%@&.......+%%&+.%%@&+......+%%@&.+%%&++$$@&++d%@& %%@&+-+
1.5 +-+**#$%&**#$@&**#%@&**$%@**#$%@**#$%&**#$@&**$%@&*#$%@**#$%@**#$%&**#%@&**$%@&*#$%@**#$%&**#$@&*+f%@&**$%@&+-+
0.5 +-+**#$%&**#$@&**#%@&**$%@**#$%@**#$%&**#$@&**$%@&*#$%@**#$%@**#$%&**#%@&**$%@&*#$%@**#$%&**#$@&+sqr@&**$%@&+-+
0 +-+**#$%&**#$@&**#%@&**$%@**#$%@**#$%&**#$@&**$%@&*#$%@**#$%@**#$%&**#%@&**$%@&*#$%@**#$%&**#$@&*+cmp&**$%@&+-+
410.bw416.gam433.434.z435.436.cac437.lesli444.447.de450.so453454.ca459.GemsF465.tont470.lb4482.sphinxgeomean
2. Host: ARM Aarch64 A57 @ 2.4GHz
qemu-aarch64 NBench score; higher is better
Host: Applied Micro X-Gene, Aarch64 A57 @ 2.4 GHz
5 +-+-----------+-------------+-------------+-------------+-----------+-+
4.5 +-+........................................@@@&==...................+-+
3 4 +-+..........................@@@&==........@.@&.=.....+before +-+
3 +-+..........................@.@&.=........@.@&.=.....+ad@@@&== +-+
2.5 +-+.....................##$$%%.@&.=........@.@&.=.....+ @m@& = +-+
2 +-+............@@@&==.***#.$.%.@&.=.***#$$%%.@&.=.***#$$%%d@& = +-+
1.5 +-+.....***#$$%%.@&.=.*.*#.$.%.@&.=.*.*#.$.%.@&.=.*.*#+$ +f@& = +-+
0.5 +-+.....*.*#.$.%.@&.=.*.*#.$.%.@&.=.*.*#.$.%.@&.=.*.*#+$+sqr& = +-+
0 +-+-----***#$$%%@@&==-***#$$%%@@&==-***#$$%%@@&==-***#$$%+cmp==-----+-+
FOURIER NEURAL NLU DECOMPOSITION gmean
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
1 file changed