[FFmpeg-devel] [PATCH] aacenc_utils: unroll loops to allow compiler to use SIMD.
Ganesh Ajjanagadde
gajjanag at gmail.com
Tue Mar 8 04:50:53 CET 2016
On Mon, Mar 7, 2016 at 2:54 AM, Reimar Döffinger
<Reimar.Doeffinger at gmx.de> wrote:
> On 07.03.2016, at 04:04, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
>> On Sun, Mar 6, 2016 at 1:43 PM, Reimar Döffinger
>> <Reimar.Doeffinger at gmx.de> wrote:
>>> On Sun, Mar 06, 2016 at 07:35:58PM +0100, Reimar Döffinger wrote:
>>>> Approximately 10% faster transcode from mp3 to aac
>>>> with default settings.
>>>
>>> Note to anyone wanting to optimize it further:
>>> There is almost 25% on the table if you can replace
>>> the pow() and cos() function uses by something more
>>> efficient.
>>
>> So I did try one thing, namely in lavc/aacenc_utils, replace powf in
>> find_form_factor by a conditional checking for 2.0f, squaring if it
>> is, powf otherwise (see lavc/aaccoder_twoloop for the calls, one is
>> with 2.0f, other without), but it yields essentially nothing.
>>
>> Likewise, an even more trivial one is line 125 of aaccoder_twoloop:
>> powf can be replaced here by sqrtf(sqrtf()), but this also yields
>> nothing.
>
> Probably those cases are already optimized by the implementation.
The first one is indeed optimized: find_form_factor is inlined, so the
compiler can apply its integer-exponent optimizations (something gcc
does). However, I have a patch that gives a ~7% speedup by replacing
powf with expf(logf()). The floating-point results differ slightly,
but FATE still passes.
The second one cannot be optimized by the implementation, but it is
of purely academic interest anyway, since it is called only once.
Concretely, it saves ~300 cycles out of the ~700,000 cycles of
search_for_quantizers_twoloop (about 0.04%).
>
>> Can you be more specific, and are you sure about this?
>
> Just run your favourite performance analysis tool and you'll see.
> As it is non-inlined libc code I'm fairly sure the numbers are accurate enough.
I am still puzzled by these remarks, hence my request for specific
examples. In the aac code, cosf is only called for table generation,
and the same goes for cos, so I still don't see why cos is relevant.