[FFmpeg-devel] [PATCH 1/5] configure: Add option for enabling LC3/LC3plus wrapper

Tue Mar 26 19:57:23 EET 2024

Compared with the C implementation of KissFFT (it's the only one I tested
on ARM M4).
Yes, there is no SIMD on x86. This was not the main target.
Was mainly made for ARM M4 (for BLE devices Nordic Semi / Zephyr), and ARM
Neon (Android).
By the way, this does not change a lot, the FFT/MDCT on powerful CPU's is
marginal compared to the read/write of the bitstream arithmetically coded.
We can perhaps connect the FFMpeg implementation, but it will probably miss
2 things:
- Some transformations are not a multiple of 15, but only 5 * 2^n. I guess
FFmpeg only has a base 15 implementation.
- It uses asymmetric windowing, to reduce algorithmic delay. Some
coefficients are zeroed. Not important, but will need a larger coefficients
table, and a bunch of multiplication by 0, without a specific
implementation.
So I think it will need some work.

On Tue, Mar 26, 2024 at 10:45 AM Paul B Mahol <onemda at gmail.com> wrote:

>
>
> On Tue, Mar 26, 2024 at 6:07 PM Antoine Soulier <asoulier at google.com>
> wrote:
>
>> What do you mean by sub-optimal?
>> It's stacked by prime factors, and unrolled for FFT3 and FF5.
>> The butterfly implementations of FFT3 and FF5, gives me slightly slower
>> computation. FFT5 is done first, so it takes advantage of sin()/cos()
>> values of 0 or 1.
>> There are also no reordering steps (this stage is completely removed),
>> but cannot run in-place.
>> Benchmarks I made show that it runs slightly faster.
>>
>
> Compared with what?
> Where is at least x86 SIMD for that MDCT?
>
>
>>
>> On Tue, Mar 26, 2024 at 9:59 AM Paul B Mahol <onemda at gmail.com> wrote:
>>
>>>
>>> Isn't this using sub-optimal MDCT implementation?
>>>
>>