[FFmpeg-devel] [PATCH] avcodec/aarch64/aacencdsp: NEON implementation

Martin Storsjö martin at martin.st
Tue Jan 28 10:46:37 EET 2025


On Mon, 27 Jan 2025, Krzysztof Pyrkosz via ffmpeg-devel wrote:

> On Sun, Jan 26, 2025 at 01:29:38AM +0200, Martin Storsjö wrote:
>> With the following diff:
>>
>> @@ -40,8 +41,8 @@ function ff_aac_quant_bands_neon, export=1
>>          movi            v5.4s, 0x80, lsl #24
>>  .irp signed,1,0
>>  \signed:
>> -        subs            w3, w3, #4
>>          ld1             {v3.4s}, [x2], #16
>> +        subs            w3, w3, #4
>>          fmul            v3.4s, v3.4s, v0.s[0]
>>  .if \signed
>>          ld1             {v4.4s}, [x1], #16
>>
>> I'm getting the following improvement:
>>
>> Before:                  Cortex A53      A72      A78
>> quant_bands_signed_neon:     5661.0   2383.2   1113.2
>> quant_bands_unsigned_neon:   5401.5   2067.8    811.8
>> After:
>> quant_bands_signed_neon:     5402.5   2385.5   1090.0
>> quant_bands_unsigned_neon:   5145.5   2067.8    809.5
>>
>> No change on the A72 here, but apparently a (very) small improvement on the
>> A78, and a bigger improvement on the A53 as expected.
>>
>> If you don't mind these changes, we could land the change with that tweaked.
>> (I guess the numbers in the commit message could be re-measured, but I'm not
>> sure if they change enough to make much of a difference there, especially on
>> the cores you've measured on.)
>>
>> // Martin
>
> I don't mind these changes, I'm perfectly fine with applying any
> improvements on top of the patch.
> The speeds on A78 and x13s did not change significantly, so the initial
> benchmark values can still be used.

Ok, great, I've pushed this patch then. Thanks for your contribution!
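
For reference, the relative change in the figures quoted above can be worked
out with a few lines of Python. This is only a sketch: the numbers are copied
from the table above, and the helper name percent_change is made up for
illustration. It comes out at roughly 4.6% and 4.7% faster on the A53 for the
signed and unsigned cases, and close to noise on the A72.

```python
# Benchmark figures copied from the table quoted above (lower is better).
before = {
    "quant_bands_signed_neon":   {"A53": 5661.0, "A72": 2383.2, "A78": 1113.2},
    "quant_bands_unsigned_neon": {"A53": 5401.5, "A72": 2067.8, "A78": 811.8},
}
after = {
    "quant_bands_signed_neon":   {"A53": 5402.5, "A72": 2385.5, "A78": 1090.0},
    "quant_bands_unsigned_neon": {"A53": 5145.5, "A72": 2067.8, "A78": 809.5},
}


def percent_change(old, new):
    """Relative change in percent; a negative value means 'after' is faster."""
    return (new - old) / old * 100.0


# Hypothetical helper structure: function name -> core -> percent change.
changes = {
    fn: {core: percent_change(before[fn][core], after[fn][core])
         for core in before[fn]}
    for fn in before
}

for fn, cores in changes.items():
    for core, pct in cores.items():
        print(f"{fn:28s} {core}: {pct:+.2f}%")
```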

// Martin

