[FFmpeg-devel] [PATCH] Extra build options for ALS (and others)
Thilo Borgmann
thilo.borgmann
Mon Nov 30 20:29:42 CET 2009
Michael Niedermayer wrote:
> On Mon, Nov 30, 2009 at 04:09:23PM +0100, Thilo Borgmann wrote:
>> Thilo Borgmann wrote:
>>> Måns Rullgård wrote:
>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>
>>>>> Måns Rullgård wrote:
>>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>>
>>>>>>> Måns Rullgård wrote:
>>>>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>>>>
>>>>>>>>> Måns Rullgård wrote:
>>>>>>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> recently the need for an extra build option for the ALS decoder arose.
>>>>>>>>>> Is it impossible to achieve the desired outcome with some combination
>>>>>>>>>> of always_inline, noinline, and flatten attributes?
>>>>>>>>> No. See [PATCH] Split reading and decoding of blocks in ALS.
>>>>>>>>>
>>>>>>>>> Although I've managed to get the functions from alsdec.c inlined
>>>>>>>>> manually, according to the grep'ed output of the assembler code, it
>>>>>>>>> seems that manually inlining only the functions from within that .c
>>>>>>>>> file using this technique is not enough.
>>>>>>>> I'm confused. Can it be done in the C code only or not? This kind of
>>>>>>>> issue should really not be solved in the makefile.
>>>>>>> The issue is the big slowdown. The patch that causes this splits a big
>>>>>>> function into two, which are then called successively.
>>>>>>>
>>>>>>> To overcome the slowdown issue, I inspected the functions being inlined
>>>>>>> with and without the -finline-limit option. I can use av_always_inline
>>>>>>> for many functions within alsdec.c to have the same functions inlined
>>>>>>> as -finline-limit does.
>>>>>>>
>>>>>>> Unfortunately, using -finline-limit removes the slowdown introduced by
>>>>>>> the patch while using av_always_inline does not.
>>>>>> So it's not doing the same thing. What is it doing differently?
>>>>>> Where did you get the limit number from?
>>>>>>
>>>>> All function calls within alsdec.s when using -finline-limit=4096:
>>>>> 1 call L1102
>>>>> 1 call L138
>>>>> 1 call L456
>>>>> 2 call L___udivdi3$stub
>>>>> 10 call L_av_freep$stub
>>>>> 1 call L_av_get_bits_per_sample_format$stub
>>>>> 12 call L_av_log$stub
>>>>> 5 call L_av_log_missing_feature$stub
>>>>> 8 call L_av_malloc$stub
>>>>> 2 call L_av_mallocz$stub
>>>>> 1 call L_ff_mpeg4audio_get_config$stub
>>>>> 6 call L_memcpy$stub
>>>>> 2 call L_memmove$stub
>>>>> 1 call L_memset$stub
>>>>> 2 call _decode_blocks_ind
>>>>> 4 call _decode_end
>>>>> 36 call _decode_rice
>>>>> 10 call _get_bits_long
>>>>> 11 call _parse_bs_info
>>>>> 2 call _zero_remaining
>>>>>
>>>>> All function calls within alsdec.s when using many av_always_inline's.
>>>>> This is designed to inline the same functions from alsdec.c that get
>>>>> inlined in the unpatched alsdec.c without any extra build option:
>>>>> 1 call L1561
>>>>> 1 call L176
>>>>> 1 call L21
>>>>> 2 call L___udivdi3$stub
>>>>> 10 call L_av_freep$stub
>>>>> 1 call L_av_get_bits_per_sample_format$stub
>>>>> 13 call L_av_log$stub
>>>>> 5 call L_av_log_missing_feature$stub
>>>>> 8 call L_av_malloc$stub
>>>>> 2 call L_av_mallocz$stub
>>>>> 1 call L_ff_mpeg4audio_get_config$stub
>>>>> 1 call L_memcpy$stub
>>>>> 1 call L_memmove$stub
>>>>> 2 call L_memset$stub
>>>>> 8 call ___inline_memcpy_chk
>>>>> 2 call ___inline_memmove_chk
>>>>> 6 call _align_get_bits
>>>>> 5 call _av_ceil_log2
>>>>> 4 call _av_clip
>>>>> 4 call _decode_end
>>>>> 47 call _get_bits
>>>>> 90 call _get_bits1
>>>>> 3 call _get_bits_count
>>>>> 61 call _get_bits_left
>>>>> 39 call _get_bits_long
>>>>> 4 call _get_sbits_long
>>>>> 60 call _get_unary
>>>>> 2 call _init_get_bits
>>>>> 3 call _parse_bs_info
>>>>> 3 call _read_time
>>>>> 7 call _skip_bits
>>>>> 2 call _skip_bits1
>>>>> 5 call _skip_bits_long
>>>> Not inlining those get_bits etc will certainly slow things down,
>>>> that's for sure.
>>>>
>>>>> So -finline-limit can inline many functions in the object file which are
>>>>> not part of alsdec.c. Which might be the reason for the performance
>>>>> difference.
>>>>>
>>>>> But using -finline-limit does not yield a speed gain for the unpatched
>>>>> file! So there might be something else that I don't see.
>>>>>
>>>>> The value of 4096 has been chosen randomly. As long as I don't know
>>>>> exactly why -finline-limit removes the slowdown and that it cannot be
>>>>> replaced by another approach, there is no need to figure out a more
>>>>> optimal value...
>>>> We should do some benchmarks using that flag globally and see what
>>>> happens. Maybe we'd gain from using it everywhere.
>>> Like Michael said, this would be a big test for different platforms and
>>> compilers which I cannot offer alone so several people would have to do
>>> this - if a benchmark would indicate that it might be worth testing.
>>>
>>> Also, I lack a good idea of how to test this efficiently, which means
>>> measuring the execution time of ffmpeg, without other factors such as
>>> hard drives playing a predominant role.
>> I played around a little with the regression tests and audio decoders.
>> For most of my tests -finline-limit=4096 makes it a little faster, e.g.
>>
>> g726: 47001535 dezicycles -> 41628457 dezicycles (12%)
>> alac: 12855244 dezicycles -> 12849127 dezicycles ( 0%)
>> flac: 842020 dezicycles -> 786226 dezicycles ( 7%)
>> wma: 3663166 dezicycles -> 3197273 dezicycles (14%)
>>
>> which is not surprising. Inlining comes at a price: the ffmpeg executable
>> grew from 5.4 MB to 6.1 MB.
>> The value used for -finline-limit is 4096; the default is 600 for gcc-4.0.
>
> what about video codecs? h264, mpeg4, mpeg2, h263?
Can do tomorrow.
> and which cpu?
Intel Core 2 Duo 2.53 GHz.
-Thilo
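
The second listing quoted above shows the small bit-reading helpers
(get_bits, get_bits1, get_unary, ...) left as real calls. A minimal sketch of
the per-function alternative to a translation-unit-wide -finline-limit,
assuming GCC or Clang: the helpers themselves carry an always_inline
attribute, so their callers do not depend on the compiler's size heuristics.
The names my_always_inline, BitReader, read_bit, read_bits and read_unary are
hypothetical and are not FFmpeg's API; my_always_inline only stands in for
av_always_inline.

```c
#include <stdint.h>

/* Hypothetical stand-in for av_always_inline: force inlining on GCC/Clang,
 * fall back to a plain inline hint on other compilers. */
#if defined(__GNUC__)
#    define my_always_inline __attribute__((always_inline)) inline
#else
#    define my_always_inline inline
#endif

/* Hypothetical minimal bit reader; this is not FFmpeg's GetBitContext. */
typedef struct BitReader {
    const uint8_t *buf; /* input bytes */
    unsigned       pos; /* current read position, in bits */
} BitReader;

/* Read a single bit, most significant bit first. No bounds checking. */
static my_always_inline unsigned read_bit(BitReader *br)
{
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Read n bits (0 <= n <= 32), most significant bit first. */
static my_always_inline uint32_t read_bits(BitReader *br, int n)
{
    uint32_t v = 0;
    while (n-- > 0)
        v = (v << 1) | read_bit(br);
    return v;
}

/* Non-inlined caller: count zero bits up to the first 1 bit (which is
 * consumed), reading at most max_len zero bits. With the attribute above,
 * the loop body should contain no call to read_bit on GCC/Clang. */
unsigned read_unary(BitReader *br, unsigned max_len)
{
    unsigned n = 0;
    while (n < max_len && read_bit(br) == 0)
        n++;
    return n;
}
```

Whether this reproduces the effect of -finline-limit=4096 is exactly the open
question in the thread; the sketch only shows where the attribute would have
to go, namely on the helpers rather than only on the functions in the .c file
that call them.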