This module provides helper functionality to build code with SIMD instructions. Available since 0.42.0.
Note: this module is unstable. It is only provided as a technology preview. Its API may change in arbitrary ways between releases or it might be removed from Meson altogether.
This module is designed for the use case where you have an algorithm with one or more SIMD implementations and you choose which one to use at runtime.
The module provides one method, check, which is used like this:
    rval = simd.check('mysimds',
      mmx : 'simd_mmx.c',
      sse : 'simd_sse.c',
      sse2 : 'simd_sse2.c',
      sse3 : 'simd_sse3.c',
      ssse3 : 'simd_ssse3.c',
      sse41 : 'simd_sse41.c',
      sse42 : 'simd_sse42.c',
      avx : 'simd_avx.c',
      avx2 : 'simd_avx2.c',
      neon : 'simd_neon.c',
      compiler : cc)
Here the individual files contain the accelerated versions of the functions in question. The compiler keyword argument takes the compiler you are going to use to compile them. The function returns an array with two values. The first value is a list of libraries that contain the compiled code. Any SIMD code that the compiler can't compile (for example, Neon instructions on an x86 machine) is ignored. You should pass this value to the desired target using link_with. The second value is a configuration_data object in which every supported instruction set is marked as available. For example, if the compiler supports SSE2 instructions, the object has HAVE_SSE2 set to 1.
Generating code to detect the proper instruction set at runtime is straightforward. First you create a header from the configuration object, and then you write a chooser function that looks like this:
    void (*fptr)(type_of_function_here) = NULL;

    #if HAVE_NEON
    if(fptr == NULL && neon_available()) {
        fptr = neon_accelerated_function;
    }
    #endif
    #if HAVE_AVX2
    if(fptr == NULL && avx2_available()) {
        fptr = avx2_accelerated_function;
    }
    #endif

    ...

    if(fptr == NULL) {
        fptr = default_function;
    }
Each source file provides two functions: the xxx_available function to query whether the CPU currently in use supports the instruction set, and xxx_accelerated_function, which is the corresponding accelerated implementation.
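As an illustration of the two-function pattern, here is a minimal sketch of what a file such as simd_sse2.c might contain. The operation (adding 1.0 to every element of an array of doubles), the function signatures, and the use of the GCC/Clang builtin `__builtin_cpu_supports` for the runtime check are all assumptions made for this example; the module does not prescribe any of them.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical runtime check: returns 1 if the running CPU supports SSE2.
 * Assumes a GCC-compatible compiler; returns 0 when the check is unavailable. */
int sse2_available(void) {
#if defined(__SSE2__) && defined(__GNUC__)
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0;
#endif
}

/* Hypothetical accelerated implementation: adds 1.0 to each of the n doubles. */
void sse2_accelerated_function(double *data, size_t n) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128d one = _mm_set1_pd(1.0);
    /* Process two doubles per iteration using unaligned loads and stores. */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);
        _mm_storeu_pd(data + i, _mm_add_pd(v, one));
    }
#endif
    /* Scalar loop handles the remaining element, or everything without SSE2. */
    for (; i < n; i++) {
        data[i] += 1.0;
    }
}
```

The preprocessor guards exist only so that the sketch also compiles when SSE2 is not enabled. In a file built through this module they may be unnecessary, since unsupported sources are skipped.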
At the end of this function the function pointer points to the fastest available implementation and can be invoked to do the computation.
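The chooser can be wrapped in a small dispatch layer so that selection happens only once. The sketch below is one possible arrangement, not the module's own code: the names `increment_fn`, `choose_implementation` and `increment` are hypothetical, the operation is an invented example, and the fallback definition of `HAVE_SSE2` stands in for the header that would normally be generated from the configuration object.

```c
#include <stddef.h>

/* Normally provided by the generated configuration header; defaulted to 0
 * here (an assumption) so that the sketch compiles on its own. */
#ifndef HAVE_SSE2
#define HAVE_SSE2 0
#endif

/* Hypothetical signature shared by all implementations. */
typedef void (*increment_fn)(double *data, size_t n);

/* Portable fallback: adds 1.0 to each element. */
static void default_function(double *data, size_t n) {
    for (size_t i = 0; i < n; i++) {
        data[i] += 1.0;
    }
}

#if HAVE_SSE2
/* Provided by the SSE2 source file when the compiler supports it. */
int sse2_available(void);
void sse2_accelerated_function(double *data, size_t n);
#endif

/* Picks the first implementation that was built and that the CPU supports. */
static increment_fn choose_implementation(void) {
    increment_fn fptr = NULL;
#if HAVE_SSE2
    if (fptr == NULL && sse2_available()) {
        fptr = sse2_accelerated_function;
    }
#endif
    if (fptr == NULL) {
        fptr = default_function;
    }
    return fptr;
}

/* Public entry point: selects an implementation on first use, then reuses it.
 * The cache is not synchronized; threaded code would need extra care. */
void increment(double *data, size_t n) {
    static increment_fn cached = NULL;
    if (cached == NULL) {
        cached = choose_implementation();
    }
    cached(data, n);
}
```

Callers then only use `increment(values, count)` and never need to know which implementation was selected.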