[FFmpeg-devel] [PATCH 1/6] opus: convert encoder and decoder to lavu/tx

Mon Sep 26 00:08:26 EEST 2022

Sep 25, 2022, 14:34 by andreas.rheinhardt at outlook.com:

> Lynne:
>
>> Sep 24, 2022, 23:57 by dev at lynne.ee:
>>
>>> Sep 24, 2022, 21:40 by martin at martin.st:
>>>
>>>> What about ac3dsp then - that one seems like it's fairly optimized for arm?
>>>>
>>> Haven't touched them, they're still being used. Unfortunately, for AC3,
>>> the full MDCT optimizations in lavc do make a difference and the overall
>>> decoder becomes 15% slower with this patch on for aarch64 with lavu/tx's
>>> asm disabled and 7% slower with lavu/tx's asm enabled. I do plan to write
>>> aarch64 NEON SIMD code for the MDCT in a month or so, unless someone beats me
>>> to it, which should make the decoder at least 10% faster with lavu/tx.
>>>
>>
>> I'd just like to add that this was for the float version of the ac3 decoder. The fixed-point
>> version is a few percent faster with the patch on an A53, and quite a bit
>> more accurate.
>> The lavc fixed-point FFT code also has some weird large spikes in cycle counts
>> for some transform sizes, so the figure above is an average, but the dips
>> went from 117x realtime to 78x realtime, which on a slower CPU may
>> be the difference between stuttering and realtime playback.
>> On this CPU, the fixed-point version is 23% slower than the float version,
>> but on a CPU with slower float ops, it would make more sense to pick that
>> decoder over the float version.
>> The two decoders produce nearly identical results, minus a few rounding
>> errors, since AC3 is inherently a fixed-point codec. The only differences
>> are the transforms themselves and the extra ops needed to convert
>> the 25-bit ints to floats in the float decoder.
>>
>
> 1. You forgot to remove mdct15 requirements from configure in this whole
> patchset.
> 2. You forgot to update the FATE references for several tests; e.g. when
> only applying the ac3 patch, then I get this:
>

I know; durandal pointed that out the day I sent them. I'll resend them
later.
I'm planning to push just the Opus patch in a day, with the mdct15
line removed from configure.

> As the above shows, the difference between the reference files and the
> decoded output becomes larger in several tests, i.e. the reference files
> won't be usable later on. If the new float and fixed-point decoders
> do indeed produce nearly identical output, then one could write
> tests that decode the same file with both the floating point and the
> fixed point decoder, check that both are nearly identical and print a
> checksum of the output of the fixed point decoder.
>
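A minimal sketch of the comparison described above, assuming both decoders' output has already been converted to interleaved signed 16-bit PCM of the same length. The helper names and the tolerance parameter are hypothetical, and FNV-1a only stands in for whatever checksum a real FATE test would print (e.g. an MD5 of the fixed-point output).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if every sample of the two decoded
 * buffers (float decoder vs fixed-point decoder, both assumed to be
 * converted to s16 PCM) differs by at most `tol`, 0 otherwise. */
int outputs_nearly_equal(const int16_t *a, const int16_t *b,
                         size_t n, int tol)
{
    for (size_t i = 0; i < n; i++) {
        int d = a[i] - b[i];   /* int16_t promotes to int, no overflow */
        if (d < 0)
            d = -d;
        if (d > tol)
            return 0;
    }
    return 1;
}

/* FNV-1a checksum of the fixed-point output, a stand-in for the MD5/CRC
 * a real test would print. Each sample is hashed low byte first so the
 * result does not depend on host endianness. */
uint32_t pcm16_fnv1a(const int16_t *s, size_t n)
{
    uint32_t h = 2166136261u;              /* FNV-1a 32-bit offset basis */
    for (size_t i = 0; i < n; i++) {
        uint16_t v = (uint16_t)s[i];
        h = (h ^ (v & 0xFFu)) * 16777619u; /* FNV-1a 32-bit prime */
        h = (h ^ (v >> 8)) * 16777619u;
    }
    return h;
}
```

Only the checksum of the fixed-point output would go into the reference file; the float output is used just for the tolerance check, so float rounding differences across platforms wouldn't break the test.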

I have a standalone program that I've been hacking on as needed for the
fixed-point transforms: https://0x0.st/oWxO.c
The square root of the squared rounding error across the entire range
(1 to 21 bits) of transforms from 32pt to 1024pt is 6.855655 for lavu and
7.141428 for lavc, so lavc is slightly worse. If you extend the range
to 22 bits, the 1024pt transform in lavc explodes while lavu is still fine,
showing that lavu has a greater usable range.
Rounding errors are a lesser problem than exceeding the maximum range,
because once that happens you get huge spikes in the output.
I can further reduce the error in lavu at the cost of speed, but I think
this is sufficient.

> Also note that there is currently no test that directly verifies your
> claims of greater accuracy. One could write such a test by encoding a
> file with ac3-fixed and decoding it again (with the fixed point decoder)
> and printing the PSNR of input and output. No encoding test does this
> at the moment.
>

I'm not going to write that myself, but I like the idea. The point of
fixed-point decoders isn't bitexactness but speed on slow hardware, so we
shouldn't be testing an MD5.