[FFmpeg-devel] [PATCH] codec/aarch64/hevc:add idct_32x32_neon
徐福隆
839789740 at qq.com
Thu Apr 13 06:48:15 EEST 2023
Thank you, Martin, and thanks for pointing out the shortcomings.
// frank xu
------------------ Original ------------------
From: "FFmpeg development discussions and patches" <martin at martin.st>;
Date: Wed, Apr 12, 2023 09:02 PM
To: "FFmpeg development discussions and patches"<ffmpeg-devel at ffmpeg.org>;
Cc: "徐福隆"<839789740 at qq.com>;
Subject: Re: [FFmpeg-devel] [PATCH] codec/aarch64/hevc:add idct_32x32_neon
On Tue, 11 Apr 2023, xufuji456 wrote:
> got 73% speed up (run_count=1000, CPU=Cortex A53)
> idct_32x32_neon: 4826 idct_32x32_c: 18236
> idct_32x32_neon: 4824 idct_32x32_c: 18149
> idct_32x32_neon: 4937 idct_32x32_c: 18333
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S | 289 +++++++++++++++++++---
> libavcodec/aarch64/hevcdsp_init_aarch64.c | 5 +
> 2 files changed, 266 insertions(+), 28 deletions(-)
One minor comment below, otherwise it seems fine.
> +.macro tr_32x4 name, shift
> +function func_tr_32x4_\name
> + mov x10, lr
> + bl func_tr_16x4_noscale
Older binutils don't support the name 'lr' for the register; it has to be
spelled out as x30.
Pushed with that fixed.
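For reference, a minimal sketch of the portable spelling, assuming the two
quoted instructions are otherwise unchanged (the comments are illustrative
and not taken from the patch):

```
        // x30 is the AArch64 link register; 'lr' is an alias that only
        // newer assemblers accept, so the register is named explicitly.
        mov             x10, x30
        // bl overwrites x30 with its own return address, which is why
        // the caller's value is saved in x10 first.
        bl              func_tr_16x4_noscale
```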
// Martin
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel at ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-request at ffmpeg.org with subject "unsubscribe".