[FFmpeg-devel] [PATCH 5/5] lavc/aarch64: Provide neon implementation of nsse16

Martin Storsjö martin at martin.st
Sat Sep 3 00:29:46 EEST 2022


On Mon, 22 Aug 2022, Hubert Mazur wrote:

> Add vectorized implementation of nsse16 function.
>
> Performance comparison tests are shown below.
> - nsse_0_c: 707.0
> - nsse_0_neon: 120.0
>
> Benchmarks and tests run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c |  15 +++
> libavcodec/aarch64/me_cmp_neon.S         | 126 +++++++++++++++++++++++
> 2 files changed, 141 insertions(+)
>
> diff --git a/libavcodec/aarch64/me_cmp_neon.S b/libavcodec/aarch64/me_cmp_neon.S
> index 46d4dade5d..9fe96e111c 100644
> --- a/libavcodec/aarch64/me_cmp_neon.S
> +++ b/libavcodec/aarch64/me_cmp_neon.S
> @@ -889,3 +889,129 @@ function vsse_intra16_neon, export=1
>
>         ret
> endfunc
> +
> +function nsse16_neon, export=1
> +        // x0           multiplier
> +        // x1           uint8_t *pix1
> +        // x2           uint8_t *pix2
> +        // x3           ptrdiff_t stride
> +        // w4           int h
> +
> +        str             x0, [sp, #-0x40]!
> +        stp             x1, x2, [sp, #0x10]
> +        stp             x3, x4, [sp, #0x20]
> +        str             lr, [sp, #0x30]
> +        bl              sse16_neon
> +        ldr             lr, [sp, #0x30]

This breaks the build in two configurations. Old binutils doesn't recognize 
the register name lr, so you need to spell it out as x30.

Building on macOS breaks since there's no symbol named sse16_neon; this is 
an exported function, so it gets the symbol prefix _ there. So you need to 
write "bl X(sse16_neon)" here.

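Roughly, the three affected lines would look like this with both changes 
applied. This is an untested sketch; it assumes the X() macro from 
libavutil/aarch64/asm.S is already available in this file, as in the other 
aarch64 assembly sources:

```
        // x30 is the link register; older binutils reject the "lr" alias
        str             x30, [sp, #0x30]
        // X() prepends the platform's symbol prefix ("_" on macOS)
        bl              X(sse16_neon)
        ldr             x30, [sp, #0x30]
```
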
I haven't looked at the code from a performance perspective yet.

// Martin


