[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll abs_pow34_v loop
Ganesh Ajjanagadde
gajjanag at gmail.com
Tue Mar 22 19:14:49 CET 2016
On Sat, Mar 19, 2016 at 5:35 AM, Rostislav Pehlivanov
<atomnuker at gmail.com> wrote:
> On 19 March 2016 at 05:12, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
>
>> It seems like in all usages, size is a multiple of 4. This is documented
>> as an assert.
>>
>> Yields speedup in this function, and small speedup for aac encoding
>> overall.
>>
>> Sample benchmark (Haswell, -march=native + GCC):
>> old:
>> [...]
>> 1390 decicycles in abs_pow34_v, 127138 runs, 3934 skips63.1x
>> 1385 decicycles in abs_pow34_v, 254191 runs, 7953 skips64.4x
>> 1383 decicycles in abs_pow34_v, 508305 runs, 15983 skips65.3x
>>
>> new:
>> [...]
>> 1109 decicycles in abs_pow34_v, 127122 runs, 3950 skips61.2x
>> 1107 decicycles in abs_pow34_v, 254177 runs, 7967 skips63.5x
>> 1106 decicycles in abs_pow34_v, 508292 runs, 15996 skips65.3x
>>
>> old:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.55s user 0.03s
>> system 99% cpu 4.581 total
>> new:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.50s user 0.04s
>> system 99% cpu 4.537 total
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
>> ---
>> libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
>> 1 file changed, 15 insertions(+), 9 deletions(-)
>>
>>
> Are you sure that this speedup (and the other patch you posted) is real and
> above the error? Did you do multiple runs to rule out that it was chance?
> 0.04/0.05 second improvement on 5 seconds doesn't seem significant at all,
I am really sorry about these measurements; they were screwed up by a
very recent regression on my laptop caused by a package upgrade.
Essentially, after a suspend/resume cycle, the clock freq/cpu governor
would downshift slightly, from 2.4 to 2.2 GHz base. I have no idea how
the turbo freq was affected.
So please ignore these.
However, here is a heuristic calculation of the impact:
with between 500,000 and 1,000,000 runs and a ~30 cycle speedup per
run, roughly 15-30 million cycles are saved overall, out of
~ 5 * 3 billion = 15 billion cycles (about 5 seconds at about 3 GHz).
So it is near the 0.1% threshold, see below.
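The arithmetic above can be written out as a small C helper. This is only a sketch of the back-of-the-envelope estimate: the function name saved_fraction is hypothetical, and the inputs (run count, ~30 cycles per run, ~5 seconds, ~3 GHz) are the rough figures quoted above, not new measurements.

```c
#include <stdio.h>

/*
 * Fraction of total cycles saved, given:
 *   runs                 - number of calls to the function
 *   cycles_saved_per_run - per-call saving in cycles (not decicycles)
 *   seconds              - approximate wall time of the whole encode
 *   clock_hz             - assumed average clock frequency
 * All inputs are rough estimates, so the result is only an order of
 * magnitude.
 */
static double saved_fraction(double runs, double cycles_saved_per_run,
                             double seconds, double clock_hz)
{
    double saved = runs * cycles_saved_per_run;  /* cycles saved overall */
    double total = seconds * clock_hz;           /* cycles spent overall */
    return saved / total;
}

/* Convenience wrapper: print the estimate as a percentage. */
static void print_saved_percent(const char *label, double runs)
{
    /* 30 cycles ~ (1390 - 1109) decicycles; 5 s at an assumed 3 GHz. */
    printf("%s: %.2f%%\n", label,
           100.0 * saved_fraction(runs, 30.0, 5.0, 3.0e9));
}
```

With these assumed inputs, 500,000 runs gives a fraction of about 0.001 (0.1%) and 1,000,000 runs about 0.002 (0.2%), which is where the "near the 0.1% threshold" remark comes from.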
> and we have to put the line on placebo speedups or enjoy the whole project
> filling up with spaghetti code.
> Although the decrease in decicycles for the function was nice, what matters
> at the end is whether the speedup is enough to justify the extra code,
Per doc/optimization.txt, aac is a widely used codec, so even a 0.1%
improvement in aac is fair game for optimizations, assuming it is a
small code change. Of course, one can debate whether this is small or
not. I view it as simple and clean, others may disagree.
> and
> I have a suspicion that the compiler inlines and unrolls that function
> anyway. Try putting __attribute__ ((noinline)) as an attribute to see if
> that makes a difference. I'll have time to test later today.