[FFmpeg-devel] [PATCH] lavc/aarch64: Add neon implementation for pix_abs16_y2

Thu Aug 4 11:12:29 EEST 2022

On Mon, 25 Jul 2022, Hubert Mazur wrote:

> Provide optimized implementation of pix_abs16_y2 function for arm64.
>
> Performance comparison tests are shown below.
> pix_abs_0_2_c: 308.5
> pix_abs_0_2_neon: 39.2
>
> Benchmarks and tests run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c |  3 +
> libavcodec/aarch64/me_cmp_neon.S         | 73 ++++++++++++++++++++++++
> 2 files changed, 76 insertions(+)

> +// iterate by one
> +2:
> +
> +        ld1             {v1.16b}, [x2], x3              // Load pix2
> +        ld1             {v2.16b}, [x5], x3              // Load pix3
> +        urhadd          v30.16b, v1.16b, v2.16b         // Rounding halving add
> +        ld1             {v0.16b}, [x1], x3              // Load pix1
> +        uabd            v30.16b, v30.16b, v30.16b

This should be "uabd v30.16b, v30.16b, v0.16b" here too; as written, it
takes the absolute difference of v30 with itself, which is always zero.
Please check the uncommon codepaths too (until we can make checkasm test
them by default).
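
To illustrate the effect, here is a rough scalar model in C of what the
quoted tail loop computes per row. It is only a sketch: the function names
are hypothetical, and it assumes 16-pixel-wide rows, with pix3 being the row
directly below pix2 (pix2 + stride) and pix2 readable for h + 1 rows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of the intended computation: average each
 * pix2 pixel with the one below it, rounding up (what urhadd does per
 * lane), then sum the absolute differences against pix1 (uabd). */
int sad16_y2_ref(const uint8_t *pix1, const uint8_t *pix2,
                 ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride;  /* assumed: the row below pix2 */
    int sum = 0;

    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int avg = (pix2[x] + pix3[x] + 1) >> 1;  /* urhadd */
            sum += abs(pix1[x] - avg);               /* uabd vs. pix1 */
        }
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}

/* Hypothetical model of the tail loop as posted: the absolute difference
 * takes the average as both operands (uabd v30, v30, v30), so every lane
 * is zero and pix1 never contributes to the sum. */
int sad16_y2_as_posted(const uint8_t *pix1, const uint8_t *pix2,
                       ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride;
    int sum = 0;

    (void)pix1;  /* loaded in the assembly, but unused by the uabd */
    assert(h >= 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++) {
            int avg = (pix2[x] + pix3[x] + 1) >> 1;
            sum += abs(avg - avg);  /* always 0 */
        }
        pix2 += stride;
        pix3 += stride;
    }
    return sum;
}
```

Under these assumptions the second function returns 0 for any input, so
rows handled by the tail loop would add nothing to the result, and a test
that only exercises the main loop would not notice.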

// Martin