[FFmpeg-devel] [PATCH] codec/aarch64/hevc:add idct_32x32_neon
Martin Storsjö
martin at martin.st
Wed Apr 12 16:02:07 EEST 2023
On Tue, 11 Apr 2023, xufuji456 wrote:
> got 73% speed up (run_count=1000, CPU=Cortex A53)
> idct_32x32_neon: 4826 idct_32x32_c: 18236
> idct_32x32_neon: 4824 idct_32x32_c: 18149
> idct_32x32_neon: 4937 idct_32x32_c: 18333
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S | 289 +++++++++++++++++++---
> libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 +
> 2 files changed, 266 insertions(+), 28 deletions(-)
One minor comment below, otherwise it seems fine.
> +.macro tr_32x4 name, shift
> +function func_tr_32x4_\name
> + mov x10, lr
> + bl func_tr_16x4_noscale
Older binutils don't support the name 'lr' for the register; it has to be
spelled out as x30.
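As a sketch, the start of the function with that change would look roughly
like this (only the register name differs from the patch as posted):

        mov             x10, x30        // x30 is the link register; older binutils reject the 'lr' alias
        bl              func_tr_16x4_noscale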
Pushed with that fixed.
// Martin