[FFmpeg-devel] [PATCH] Extra build options for ALS (and others)
Thilo Borgmann
thilo.borgmann
Fri Nov 27 17:09:35 CET 2009
Måns Rullgård wrote:
> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>
>> Måns Rullgård wrote:
>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>
>>>> Måns Rullgård wrote:
>>>>> Thilo Borgmann <thilo.borgmann at googlemail.com> writes:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> recently the need for an extra build option for the ALS decoder arose.
>>>>> Is it impossible to achieve the desired outcome with some combination
>>>>> of always_inline, noinline, and flatten attributes?
>>>> No. See [PATCH] Split reading and decoding of blocks in ALS.
>>>>
>>>> Although I've managed to manually inline the functions from alsdec.c,
>>>> guided by the grep'ed output of the assembler code, it seems that
>>>> manually inlining only the functions defined in that .c file with
>>>> this technique is not enough.
>>> I'm confused. Can it be done in the C code only or not? This kind of
>>> issue should really not be solved in the makefile.
>> The issue is the big slowdown. The patch that causes this splits a big
>> function into two, which are then called successively.
>>
>> To overcome the slowdown issue, I inspected the functions being inlined
>> with and without the -finline-limit option. I can use av_always_inline
>> on many functions within alsdec.c to get the same functions inlined
>> as with -finline-limit.
>>
>> Unfortunately, using -finline-limit removes the slowdown introduced by
>> the patch while using av_always_inline does not.
>
> So it's not doing the same thing. What is it doing differently?
> Where did you get the limit number from?
>
All function calls in alsdec.s when compiling with -finline-limit=4096
(number of call sites, then the call target):
1 call L1102
1 call L138
1 call L456
2 call L___udivdi3$stub
10 call L_av_freep$stub
1 call L_av_get_bits_per_sample_format$stub
12 call L_av_log$stub
5 call L_av_log_missing_feature$stub
8 call L_av_malloc$stub
2 call L_av_mallocz$stub
1 call L_ff_mpeg4audio_get_config$stub
6 call L_memcpy$stub
2 call L_memmove$stub
1 call L_memset$stub
2 call _decode_blocks_ind
4 call _decode_end
36 call _decode_rice
10 call _get_bits_long
11 call _parse_bs_info
2 call _zero_remaining
All function calls in alsdec.s when using av_always_inline on many
functions instead (same format). These annotations are meant to inline
the same alsdec.c functions that the compiler inlines for the unpatched
alsdec.c without any extra build option:
1 call L1561
1 call L176
1 call L21
2 call L___udivdi3$stub
10 call L_av_freep$stub
1 call L_av_get_bits_per_sample_format$stub
13 call L_av_log$stub
5 call L_av_log_missing_feature$stub
8 call L_av_malloc$stub
2 call L_av_mallocz$stub
1 call L_ff_mpeg4audio_get_config$stub
1 call L_memcpy$stub
1 call L_memmove$stub
2 call L_memset$stub
8 call ___inline_memcpy_chk
2 call ___inline_memmove_chk
6 call _align_get_bits
5 call _av_ceil_log2
4 call _av_clip
4 call _decode_end
47 call _get_bits
90 call _get_bits1
3 call _get_bits_count
61 call _get_bits_left
39 call _get_bits_long
4 call _get_sbits_long
60 call _get_unary
2 call _init_get_bits
3 call _parse_bs_info
3 call _read_time
7 call _skip_bits
2 call _skip_bits1
5 call _skip_bits_long
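The tallies above look like the output of a grep over the generated
assembler, something like grep call alsdec.s | sort | uniq -c; the exact
command is not stated in this thread. A minimal Python sketch of the same
count, assuming GCC AT&T-syntax output (e.g. from gcc -S) with one
instruction per line, so that listings for different build options can be
regenerated the same way and compared:

```python
import re
import sys
from collections import Counter

# Matches an AT&T-syntax call such as "\tcall\t_get_bits1" or
# "\tcall L_av_log$stub" (also the calll/callq spellings) and captures the
# first operand as the target. Assumption: one instruction per line.
CALL_RE = re.compile(r"^\s*call[lq]?\s+(\S+)")


def count_calls(asm_text):
    """Return a Counter mapping each call target to its number of call sites."""
    counts = Counter()
    for line in asm_text.splitlines():
        match = CALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def format_counts(counts):
    """Render counts as 'N call target' lines, sorted by target name."""
    return [f"{n} call {target}" for target, n in sorted(counts.items())]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is hypothetical): python3 count_calls.py alsdec.s
    with open(sys.argv[1]) as f:
        for row in format_counts(count_calls(f.read())):
            print(row)
```

Sorting by plain string comparison puts L... targets before _... targets,
which is the same order as the listings above.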
So -finline-limit also inlines many functions in the object file that are
not defined in alsdec.c, such as get_bits1() and get_unary(), which might
explain the performance difference.
However, -finline-limit yields no speed gain for the unpatched file! So
something else may be going on that I don't see.
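To make the difference between the two listings explicit, the counts can
be compared as sets of call targets. A minimal Python sketch using only
the numbers quoted in this message; it assumes that numbered labels such
as L1102 or L21 are compiler-generated local labels that cannot be matched
between builds, so they are skipped:

```python
import re

# Call-site counts copied from the two alsdec.s listings in this message.
FINLINE_LIMIT = {  # built with -finline-limit=4096
    "L1102": 1, "L138": 1, "L456": 1, "L___udivdi3$stub": 2,
    "L_av_freep$stub": 10, "L_av_get_bits_per_sample_format$stub": 1,
    "L_av_log$stub": 12, "L_av_log_missing_feature$stub": 5,
    "L_av_malloc$stub": 8, "L_av_mallocz$stub": 2,
    "L_ff_mpeg4audio_get_config$stub": 1, "L_memcpy$stub": 6,
    "L_memmove$stub": 2, "L_memset$stub": 1, "_decode_blocks_ind": 2,
    "_decode_end": 4, "_decode_rice": 36, "_get_bits_long": 10,
    "_parse_bs_info": 11, "_zero_remaining": 2,
}
ALWAYS_INLINE = {  # built with av_always_inline on many alsdec.c functions
    "L1561": 1, "L176": 1, "L21": 1, "L___udivdi3$stub": 2,
    "L_av_freep$stub": 10, "L_av_get_bits_per_sample_format$stub": 1,
    "L_av_log$stub": 13, "L_av_log_missing_feature$stub": 5,
    "L_av_malloc$stub": 8, "L_av_mallocz$stub": 2,
    "L_ff_mpeg4audio_get_config$stub": 1, "L_memcpy$stub": 1,
    "L_memmove$stub": 1, "L_memset$stub": 2,
    "___inline_memcpy_chk": 8, "___inline_memmove_chk": 2,
    "_align_get_bits": 6, "_av_ceil_log2": 5, "_av_clip": 4,
    "_decode_end": 4, "_get_bits": 47, "_get_bits1": 90,
    "_get_bits_count": 3, "_get_bits_left": 61, "_get_bits_long": 39,
    "_get_sbits_long": 4, "_get_unary": 60, "_init_get_bits": 2,
    "_parse_bs_info": 3, "_read_time": 3, "_skip_bits": 7,
    "_skip_bits1": 2, "_skip_bits_long": 5,
}

# Numbered local labels (L1102, L21, ...) are compiler-generated and differ
# between builds, so they are excluded from the comparison.
LOCAL_LABEL = re.compile(r"L\d+")


def only_called_in(a, b):
    """Targets with call sites in build a but none in build b, sorted."""
    return sorted(t for t in a if t not in b and not LOCAL_LABEL.fullmatch(t))


def calls_to(counts, targets):
    """Total number of call sites in counts for the given targets."""
    return sum(counts[t] for t in targets)


if __name__ == "__main__":
    missing = only_called_in(ALWAYS_INLINE, FINLINE_LIMIT)
    print(len(missing), "targets,", calls_to(ALWAYS_INLINE, missing), "call sites")
    print(only_called_in(FINLINE_LIMIT, ALWAYS_INLINE))
```

With these numbers, only_called_in(ALWAYS_INLINE, FINLINE_LIMIT) returns
16 targets, mostly bitstream-reader helpers such as _get_bits1,
_get_bits_left and _get_unary, which together account for 309 call sites
that no longer appear with -finline-limit=4096. The reverse comparison
returns _decode_blocks_ind, _decode_rice and _zero_remaining, which are
called out of line only in the -finline-limit build.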
The value of 4096 was chosen arbitrarily. Until I know exactly why
-finline-limit removes the slowdown, and that it cannot be replaced by
another approach, there is no point in searching for a better value...
-Thilo