[FFmpeg-devel] [PATCH] lavc/aacenc_utils: replace powf(x, y) by expf(logf(x) * y)

Wed Mar 9 03:20:01 CET 2016

On Tue, Mar 8, 2016 at 8:02 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> On Mon, Mar 7, 2016 at 10:48 PM, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
>>
>> This is ~2x faster for y not an integer on Haswell+GCC, and should
>> generally be faster due to the fact that anyway powf essentially does
>> this under the hood.
>>
>> Note that there are some accuracy differences, that should generally be
>> negligible. In particular, FATE still passes on this platform.
>>
>> Results in ~ 7% speedup in aac encoding with -march=native, Haswell+GCC.
>> before:
>> ffmpeg -i sin.flac -acodec aac -y sin_new.aac  6.05s user 0.06s system 104% cpu 5.821 total
>>
>> after:
>> ffmpeg -i sin.flac -acodec aac -y sin_new.aac  5.67s user 0.03s system 105% cpu 5.416 total
>>
>> This is also faster than an alternative approach that pulls in powf,
>> gets rid of the crufty NaN checks and other special cases, exploits
>> knowledge about the intervals, etc.
>> This of course does not exclude smarter approaches; just suggests that
>> there would need to be significant work on this front of lower utility
>> than searches for hotspots elsewhere.
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
>> ---
>>  libavcodec/aacenc_utils.h | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
>> index 56e3462..b7f80c6 100644
>> --- a/libavcodec/aacenc_utils.h
>> +++ b/libavcodec/aacenc_utils.h
>> @@ -121,7 +121,10 @@ static inline float find_form_factor(int group_len, int swb_size, float thresh,
>>              if (s >= ethresh) {
>>                  nzl += 1.0f;
>>              } else {
>> -                nzl += powf(s / ethresh, nzslope);
>> +                if (nzslope == 2.f)
>> +                    nzl += (s / ethresh) * (s / ethresh);
>> +                else
>> +                    nzl += expf(logf(s / ethresh) * nzslope);
>>              }
>>          }
>
>
> There are two changes here. Which gives the speedup? I don't like the second
> (pow -> exp(log())) if it doesn't give a speedup (I also don't see why it
> would).

The empirical ~2x speedup for non-integer y is already mentioned in the
commit message, and the rationale is briefly explained there as well.

More verbosely, there is no "fundamental" reason why it should be faster,
but empirically it is reasonable, since pow needs to handle a ton of edge
cases and needs correction terms around what it is "morally" doing,
i.e. exp(y * log(x)). Just look at
https://github.com/JuliaLang/openlibm/blob/master/src/e_powf.c
(implementation used in GNU/BSD/Apple libm).

>
> Ronald