[FFmpeg-devel] [PATCH] avcodec/aarch64/aacencdsp: NEON implementation
Krzysztof Pyrkosz
ffmpeg at szaka.eu
Mon Jan 27 20:21:37 EET 2025
On Sun, Jan 26, 2025 at 01:29:38AM +0200, Martin Storsjö wrote:
> With the following diff:
>
> @@ -40,8 +41,8 @@ function ff_aac_quant_bands_neon, export=1
>          movi            v5.4s, 0x80, lsl #24
>  .irp signed,1,0
>  \signed:
> -        subs            w3, w3, #4
>          ld1             {v3.4s}, [x2], #16
> +        subs            w3, w3, #4
>          fmul            v3.4s, v3.4s, v0.s[0]
>  .if \signed
>          ld1             {v4.4s}, [x1], #16
>
> I'm getting the following improvement:
>
> Before:                     Cortex A53       A72       A78
> quant_bands_signed_neon:        5661.0    2383.2    1113.2
> quant_bands_unsigned_neon:      5401.5    2067.8     811.8
> After:
> quant_bands_signed_neon:        5402.5    2385.5    1090.0
> quant_bands_unsigned_neon:      5145.5    2067.8     809.5
>
> No change on the A72 here, but apparently a (very) small improvement on the
> A78, and a bigger improvement on the A53 as expected.
>
> If you don't mind these changes, we could land the change with that tweaked.
> (I guess the numbers in the commit message could be re-measured, but I'm not
> sure if they change enough to make much of a difference there, especially on
> the cores you've measured on.)
>
> // Martin
I don't mind these changes; I'm perfectly fine with applying any
improvements on top of the patch.
The speeds on A78 and x13s did not change significantly, so the initial
benchmark values can be used.
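
For reference, the relative changes implied by the figures quoted above can
be worked out with a few lines of Python. This is only a sketch: the
dictionary layout and the relative_change() helper are illustrative names,
not part of the patch, and the numbers are copied from the quoted table
(lower is better).

```python
# Benchmark figures copied from the quoted table (lower is better).
before = {
    "quant_bands_signed_neon":   {"A53": 5661.0, "A72": 2383.2, "A78": 1113.2},
    "quant_bands_unsigned_neon": {"A53": 5401.5, "A72": 2067.8, "A78": 811.8},
}
after = {
    "quant_bands_signed_neon":   {"A53": 5402.5, "A72": 2385.5, "A78": 1090.0},
    "quant_bands_unsigned_neon": {"A53": 5145.5, "A72": 2067.8, "A78": 809.5},
}


def relative_change(old, new):
    """Percent change from old to new; a negative value means faster."""
    return (new - old) / old * 100.0


# Per-function, per-core percent change (hypothetical helper structure).
changes = {
    fn: {core: relative_change(before[fn][core], after[fn][core])
         for core in before[fn]}
    for fn in before
}

for fn, per_core in changes.items():
    cells = "  ".join(f"{core}: {pct:+.1f}%" for core, pct in per_core.items())
    print(f"{fn}: {cells}")
```

On these figures the A53 gains about 4.6% (signed) and 4.7% (unsigned),
while the A72 and A78 move by roughly 2% or less.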
Krzysztof