[FFmpeg-devel] [PATCH] codec/aarch64/hevc:add idct_32x32_neon

徐福隆 839789740 at qq.com
Thu Apr 13 06:48:15 EEST 2023


Thank you, Martin, and thanks for pointing out the shortcomings.


// frank xu


------------------ Original ------------------
From: "FFmpeg development discussions and patches" <martin at martin.st>;
Date: Wed, Apr 12, 2023 09:02 PM
To: "FFmpeg development discussions and patches"<ffmpeg-devel at ffmpeg.org>;
Cc: "徐福隆"<839789740 at qq.com>;
Subject: Re: [FFmpeg-devel] [PATCH] codec/aarch64/hevc:add idct_32x32_neon



On Tue, 11 Apr 2023, xufuji456 wrote:

> got 73% speed up (run_count=1000, CPU=Cortex A53)
> idct_32x32_neon: 4826 idct_32x32_c: 18236
> idct_32x32_neon: 4824 idct_32x32_c: 18149
> idct_32x32_neon: 4937 idct_32x32_c: 18333
> ---
> libavcodec/aarch64/hevcdsp_idct_neon.S    | 289 +++++++++++++++++++---
> libavcodec/aarch64/hevcdsp_init_aarch64.c |   5 +
> 2 files changed, 266 insertions(+), 28 deletions(-)

One minor comment below, otherwise it seems fine.

> +.macro tr_32x4 name, shift
> +function func_tr_32x4_\name
> +        mov             x10, lr
> +        bl              func_tr_16x4_noscale

Older binutils don't support the name 'lr' for the register; it has to be
spelled out as x30.
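
For illustration, the portable spelling of the quoted lines would look
roughly like this. It is only a sketch of the two instructions in question:
the comments are added here and are not part of the patch.

```asm
        // bl overwrites x30 with its own return address, so the
        // caller's return address is copied to x10 first.
        mov             x10, x30        // x30 is the link register; 'lr' is avoided
        bl              func_tr_16x4_noscale
```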

Pushed with that fixed.

// Martin


