[FFmpeg-devel] [PATCH] codec/aarch64/hevc:add idct_32x32_neon

Martin Storsjö martin at martin.st
Wed Apr 12 16:02:07 EEST 2023


On Tue, 11 Apr 2023, xufuji456 wrote:

> got 73% speed up (run_count=1000, CPU=Cortex A53)
> idct_32x32_neon: 4826 idct_32x32_c: 18236
> idct_32x32_neon: 4824 idct_32x32_c: 18149
> idct_32x32_neon: 4937 idct_32x32_c: 18333
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S    | 289 +++++++++++++++++++---
> libavcodec/aarch64/hevcdsp_init_aarch64.c |   5 +
> 2 files changed, 266 insertions(+), 28 deletions(-)

One minor comment below, otherwise it seems fine.

> +.macro tr_32x4 name, shift
> +function func_tr_32x4_\name
> +        mov             x10, lr
> +        bl              func_tr_16x4_noscale

Older binutils versions don't support the name 'lr' for the link register;
it has to be spelled out as x30.

Pushed with that fixed.
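
For illustration, a minimal standalone sketch of the portable spelling, assuming GNU as syntax for AArch64. The labels example_helper and example_outer are hypothetical and not part of the patch; only the 'mov x10, x30' line corresponds to the fix described above.

```asm
// Hypothetical example, not the pushed code: save the link register
// under its numeric name so that older assemblers accept it.
        .text
example_helper:
        ret                             // returns via x30

        .global example_outer
example_outer:
        mov             x10, x30        // 'x30' instead of the alias 'lr'
        bl              example_helper  // bl overwrites x30
        ret             x10             // return to the original caller
```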

// Martin


