[FFmpeg-devel] [PATCH 3/5] lavc/aarch64: Add neon implementation for vsad_intra16

Martin Storsjö martin at martin.st
Sun Sep 4 23:58:18 EEST 2022


On Mon, 22 Aug 2022, Hubert Mazur wrote:

> Provide optimized implementation for vsad_intra16 function for arm64.
>
> Performance comparison tests are shown below.
> - vsad_4_c: 177.2
> - vsad_4_neon: 24.5
>
> Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c |  3 ++
> libavcodec/aarch64/me_cmp_neon.S         | 58 ++++++++++++++++++++++++
> 2 files changed, 61 insertions(+)

Same comment as for the other patches in this series: keep the data for
the previous row in registers instead of loading it twice.
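
To illustrate the point, here is a rough scalar C sketch, not the NEON code
from the patch. It assumes vsad_intra16 sums the absolute differences between
each 16-pixel row and the row below it. The function names and the simplified
signature (no context argument) are hypothetical, and the local arrays only
stand in for vector registers. The first form reads both rows from memory on
every iteration; the second loads each row once and carries it over as the
previous row.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference form: both rows are read from memory on every
 * iteration, so each row except the first and last is loaded twice. */
static int vsad_intra16_reload(const uint8_t *s, ptrdiff_t stride, int h)
{
    int score = 0;

    for (int y = 1; y < h; y++) {
        for (int x = 0; x < 16; x++)
            score += abs(s[x] - s[x + stride]);
        s += stride;
    }
    return score;
}

/* Hypothetical suggested form: the row loaded as "current" is kept and
 * reused as "previous" in the next iteration, so each row is loaded once. */
static int vsad_intra16_carry(const uint8_t *s, ptrdiff_t stride, int h)
{
    uint8_t prev[16], cur[16];   /* stand-ins for two vector registers */
    int score = 0;

    memcpy(prev, s, 16);         /* first row is loaded once, before the loop */
    for (int y = 1; y < h; y++) {
        s += stride;
        memcpy(cur, s, 16);      /* the only load in the loop body */
        for (int x = 0; x < 16; x++)
            score += abs(cur[x] - prev[x]);
        memcpy(prev, cur, 16);   /* carry the current row into the next iteration */
    }
    return score;
}
```

Both forms return the same score. In assembly the final copy would
typically be a register move, or disappear entirely if the loop is unrolled
by two rows so that the registers simply swap roles.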

// Martin


