[FFmpeg-devel] [PATCH] avcodec/aarch64/aacencdsp: NEON implementation

Krzysztof Pyrkosz ffmpeg at szaka.eu
Mon Jan 27 20:21:37 EET 2025


On Sun, Jan 26, 2025 at 01:29:38AM +0200, Martin Storsjö wrote:
> With the following diff:
> 
> @@ -40,8 +41,8 @@ function ff_aac_quant_bands_neon, export=1
>          movi            v5.4s, 0x80, lsl #24
>  .irp signed,1,0
>  \signed:
> -        subs            w3, w3, #4
>          ld1             {v3.4s}, [x2], #16
> +        subs            w3, w3, #4
>          fmul            v3.4s, v3.4s, v0.s[0]
>  .if \signed
>          ld1             {v4.4s}, [x1], #16
> 
> I'm getting the following improvement:
> 
> Before:                  Cortex A53      A72      A78
> quant_bands_signed_neon:     5661.0   2383.2   1113.2
> quant_bands_unsigned_neon:   5401.5   2067.8    811.8
> After:
> quant_bands_signed_neon:     5402.5   2385.5   1090.0
> quant_bands_unsigned_neon:   5145.5   2067.8    809.5
> 
> No change on the A72 here, but apparently a (very) small improvement on the
> A78, and a bigger improvement on the A53 as expected.
> 
> If you don't mind these changes, we could land the change with that tweaked.
> (I guess the numbers in the commit message could be re-measured, but I'm not
> sure if they change enough to make much of a difference there, especially on
> the cores you've measured on.)
> 
> // Martin

I don't mind these changes, I'm perfectly fine with applying any
improvements on top of the patch.
The speeds on the A78 and x13s did not change significantly, so the initial
benchmark values can still be used.
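
For reference, the relative change implied by the quoted numbers can be
worked out with a short script. This is only a sketch in Python: the figures
are copied from the Before/After table quoted above, and the helper name
relative_change is chosen purely for illustration.

```python
# Benchmark figures copied from the quoted Before/After table.
# Lower is faster; the unit is whatever the benchmark run reported.
before = {
    "quant_bands_signed_neon":   {"A53": 5661.0, "A72": 2383.2, "A78": 1113.2},
    "quant_bands_unsigned_neon": {"A53": 5401.5, "A72": 2067.8, "A78": 811.8},
}
after = {
    "quant_bands_signed_neon":   {"A53": 5402.5, "A72": 2385.5, "A78": 1090.0},
    "quant_bands_unsigned_neon": {"A53": 5145.5, "A72": 2067.8, "A78": 809.5},
}


def relative_change(old, new):
    """Percent change from old to new; a negative value means faster."""
    return (new - old) / old * 100.0


# Relative change per function and per core.
changes = {
    fn: {core: relative_change(before[fn][core], after[fn][core])
         for core in before[fn]}
    for fn in before
}

for fn, per_core in changes.items():
    for core, pct in per_core.items():
        print(f"{fn:<28} {core}: {pct:+.2f}%")
```

With these inputs the A53 comes out roughly 4.6% to 4.7% faster, the A78
between about 0.3% and 2.1% faster, and the A72 within about 0.1% of the
original figure, which matches the summary in the quoted message.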

Krzysztof

