[FFmpeg-devel] [PATCH] avcodec/aarch64/aacencdsp: NEON implementation
Martin Storsjö
martin at martin.st
Tue Jan 28 10:46:37 EET 2025
On Mon, 27 Jan 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:
> On Sun, Jan 26, 2025 at 01:29:38AM +0200, Martin Storsjö wrote:
>> With the following diff:
>>
>> @@ -40,8 +41,8 @@ function ff_aac_quant_bands_neon, export=1
>>          movi            v5.4s, 0x80, lsl #24
>>  .irp signed,1,0
>>  \signed:
>> -        subs            w3, w3, #4
>>          ld1             {v3.4s}, [x2], #16
>> +        subs            w3, w3, #4
>>          fmul            v3.4s, v3.4s, v0.s[0]
>>  .if \signed
>>          ld1             {v4.4s}, [x1], #16
>>
>> I'm getting the following improvement:
>>
>> Before:                    Cortex A53      A72      A78
>> quant_bands_signed_neon:       5661.0   2383.2   1113.2
>> quant_bands_unsigned_neon:     5401.5   2067.8    811.8
>> After:
>> quant_bands_signed_neon:       5402.5   2385.5   1090.0
>> quant_bands_unsigned_neon:     5145.5   2067.8    809.5
>>
>> No change on the A72 here, but apparently a (very) small improvement on the
>> A78, and a bigger improvement on the A53 as expected.
>>
>> If you don't mind these changes, we could land the change with that tweaked.
>> (I guess the numbers in the commit message could be re-measured, but I'm not
>> sure if they change enough to make much of a difference there, especially on
>> the cores you've measured on.)
>>
>> // Martin
>
> I don't mind these changes, I'm perfectly fine with applying any
> improvements on top of the patch.
> The speeds on A78 and x13s did not change significantly, the initial
> benchmark values can be used.
Ok, great, I've pushed this patch then. Thanks for your contribution!
// Martin
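
For reference, the relative effect of moving the subs after the ld1 can be worked out from the figures quoted above. The following is only a minimal sketch in Python: the numbers are copied from the Before/After table in this thread, while the names CORES, percent_saved and summarize are hypothetical helpers made up for illustration and are not part of FFmpeg or checkasm. It assumes lower values are better, as the discussion above implies.

```python
# Benchmark figures as quoted in this thread (lower is better).
# Column order follows the table: Cortex A53, A72, A78.
CORES = ("A53", "A72", "A78")

before = {
    "quant_bands_signed_neon": (5661.0, 2383.2, 1113.2),
    "quant_bands_unsigned_neon": (5401.5, 2067.8, 811.8),
}
after = {
    "quant_bands_signed_neon": (5402.5, 2385.5, 1090.0),
    "quant_bands_unsigned_neon": (5145.5, 2067.8, 809.5),
}


def percent_saved(old, new):
    """Return the reduction from old to new as a percentage of old.

    Positive means the new figure is lower (faster); negative means slower.
    """
    return (old - new) / old * 100.0


def summarize(before, after):
    """Map each function name to a {core: percent saved} dictionary."""
    return {
        name: {
            core: percent_saved(old, new)
            for core, old, new in zip(CORES, before[name], after[name])
        }
        for name in before
    }


gains = summarize(before, after)

if __name__ == "__main__":
    for name, per_core in gains.items():
        cells = "  ".join(f"{core}: {pct:+.2f}%" for core, pct in per_core.items())
        print(f"{name:28s} {cells}")
```

Read this way, the A53 saves a little under 5% on both variants, the A78 saves about 2% on the signed variant, and the A72 differences are small enough to be measurement noise.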