[FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

Sun Sep 25 10:54:36 EEST 2022

Sep 24, 2022, 23:57 by dev at lynne.ee:

> Sep 24, 2022, 21:40 by martin at martin.st:
>
>> What about ac3dsp then - that one seems like it's fairly optimized for arm?
>>
> Haven't touched them, they're still being used. Unfortunately, for AC3,
> the full MDCT optimizations in lavc do make a difference and the overall
> decoder becomes 15% slower with this patch on for aarch64 with lavu/tx's
> asm disabled and 7% slower with lavu/tx's asm enabled. I do plan to write
> an aarch64 MDCT NEON SIMD code in a month or so, unless someone is faster,
> which should make the decoder at least 10% faster with lavu/tx.
>

I'd just like to add that this was for the float version of the AC3 decoder.
The fixed-point version is a few percent faster with the patch on an A53,
and quite a bit more accurate.
The lavc fixed-point FFT code also has some odd, large spikes in cycle count
for some transform sizes, so the figure above is an average. In the dips,
speed dropped from 117x realtime to 78x realtime, which on a slower CPU may
be the difference between stuttering and realtime playback.
On this CPU, the fixed-point version is 23% slower than the float version,
but on a CPU with slower float ops, it would make more sense to pick the
fixed-point decoder over the float one.
The two decoders produce nearly identical results, minus a few rounding
errors, since AC3 is inherently a fixed-point codec. The only differences
are the transforms themselves, and the extra ops needed to convert
the 25-bit ints to floats in the float decoder.
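
To illustrate the kind of extra work the float path does, here is a rough
sketch of an int-to-float conversion step. This is not the actual lavc code:
the function name, the frac_bits parameter and the choice of 24 fractional
bits in the usage note are assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's API: convert signed fixed-point
 * coefficients with frac_bits fractional bits to floats by scaling
 * each value by 2^-frac_bits. Assumes 0 <= frac_bits <= 30. */
static void fixed_to_float(float *dst, const int32_t *src,
                           size_t len, int frac_bits)
{
    const float scale = 1.0f / (float)(1 << frac_bits);

    for (size_t i = 0; i < len; i++)
        dst[i] = (float)src[i] * scale;
}
```

With frac_bits set to 24, a signed 25-bit input in [-2^24, 2^24 - 1] maps to
roughly [-1.0, 1.0). The fixed-point decoder skips this pass entirely and
feeds the ints straight into its transform.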