[FFmpeg-devel] [PATCH 2/5] lavc/aarch64: Add neon implementation for sse4
Martin Storsjö
martin at martin.st
Thu Aug 18 12:10:57 EEST 2022
On Tue, 16 Aug 2022, Hubert Mazur wrote:
> Provide neon implementation for sse4 function.
>
> Performance comparison tests are shown below.
> - sse_2_c: 80.7
> - sse_2_neon: 31.0
>
> Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
>
> Signed-off-by: Hubert Mazur <hum at semihalf.com>
> ---
> libavcodec/aarch64/me_cmp_init_aarch64.c | 3 ++
> libavcodec/aarch64/me_cmp_neon.S | 58 ++++++++++++++++++++++++
> 2 files changed, 61 insertions(+)
This patch has the same issues: the unused d18 register, the unnecessary
add instruction, and the misaligned function declaration.
// Martin
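
For context, sse4 computes the sum of squared differences between two
blocks that are 4 pixels wide and h rows tall. Below is a minimal scalar
sketch in C of that computation. The name sse4_ref and the simplified
signature (no codec context argument) are assumptions made for
illustration, not FFmpeg's actual declaration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper: sum of squared errors over a block
 * that is 4 pixels wide and h rows tall. Both blocks are assumed to
 * share the same row stride, given in bytes. */
int sse4_ref(const uint8_t *pix1, const uint8_t *pix2,
             ptrdiff_t stride, int h)
{
    int sum = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 4; x++) {
            int d = pix1[x] - pix2[x]; /* difference in [-255, 255] */
            sum += d * d;
        }
        /* advance both pointers to the next row */
        pix1 += stride;
        pix2 += stride;
    }
    return sum;
}
```

A NEON version would typically process the four pixels of a row with
vector instructions instead of the inner loop, but it should return the
same value as this scalar form for any input.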