[FFmpeg-devel] [PATCH] get_cabac_inline_x86: Don't inline if 32-bit clang on windows
Martin Storsjö
martin at martin.st
Wed Aug 18 13:01:21 EEST 2021
On Tue, 17 Aug 2021, James Almer wrote:
> On 8/17/2021 12:35 PM, Christopher Degawa wrote:
>> Fixes https://trac.ffmpeg.org/ticket/8903
>>
>> relevant https://github.com/msys2/MINGW-packages/discussions/9258
>>
>> Signed-off-by: Christopher Degawa <ccom at randomderp.com>
>> ---
>> libavcodec/x86/cabac.h | 9 +++++++--
>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/libavcodec/x86/cabac.h b/libavcodec/x86/cabac.h
>> index 53d74c541e..b046a56a6b 100644
>> --- a/libavcodec/x86/cabac.h
>> +++ b/libavcodec/x86/cabac.h
>> @@ -177,8 +177,13 @@
>> #if HAVE_7REGS && !BROKEN_COMPILER
>> #define get_cabac_inline get_cabac_inline_x86
>> -static av_always_inline int get_cabac_inline_x86(CABACContext *c,
>> - uint8_t *const state)
>> +static
>> +#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
>
> Can you do some benchmarks to see how not inlining this compares to simply
> disabling this code for this target? Because it sounds like you may want to
> add this case to the BROKEN_COMPILER macro, and not use this code at all.
I tried benchmarking it, and in short, this patch seems to be the best
solution.
I tested 3 configurations: with this patch (changing av_always_inline into
av_noinline), setting BROKEN_COMPILER (skipping these inline asm
functions), and configuring with --cpu=i686 (which passes -march=i686 to
the compiler, which disallows the use of inline MMX/SSE). I benchmarked
single-threaded decoding of a high bitrate H.264 clip (listing the lowest
measured time out of 3 runs):
av_noinline: 90.94 seconds
BROKEN_COMPILER: 98.92 seconds
-march=i686: 94.63 seconds
(The fact that building with -march=i686 is faster than using some but not
all inline MMX/SSE is a bit surprising.)
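For illustration, the attribute switch being benchmarked can be sketched
roughly as below. This is a minimal sketch, not the actual patch: the
SKETCH_* macros and sketch_get_bit() are hypothetical stand-ins for
av_always_inline, av_noinline and get_cabac_inline_x86(), the function body
is a placeholder rather than the real CABAC decoder, and only the
preprocessor condition is taken from the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for av_always_inline / av_noinline,
 * using the GCC/Clang function attributes when available. */
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE __attribute__((always_inline)) inline
#  define SKETCH_NOINLINE      __attribute__((noinline))
#else
#  define SKETCH_ALWAYS_INLINE inline
#  define SKETCH_NOINLINE
#endif

/* Same condition as in the quoted patch: 32-bit Windows built with Clang
 * gets the out-of-line variant, every other target keeps forced inlining. */
#if defined(_WIN32) && !defined(_WIN64) && defined(__clang__)
#  define SKETCH_CABAC_INLINE SKETCH_NOINLINE
#else
#  define SKETCH_CABAC_INLINE SKETCH_ALWAYS_INLINE
#endif

/* Placeholder body (assumption): returns the low bit of *state and
 * flips it. The real function contains the inline asm decoder. */
static SKETCH_CABAC_INLINE int sketch_get_bit(uint8_t *const state)
{
    assert(state != NULL);
    int bit = *state & 1;
    *state ^= 1;
    return bit;
}
```

The point of the switch is that the asm-based function is kept on the
affected target, but the compiler emits it once as a regular function
instead of expanding it at every call site.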
I also tested the same setup on x86_64 (on a different machine, with Apple
Clang), where I compared the av_noinline and BROKEN_COMPILER configurations
with the default configuration (using av_always_inline):
av_always_inline: 74.65 seconds
av_noinline: 73.74 seconds
BROKEN_COMPILER: 78.10 seconds
So av_noinline actually seems to be generally favourable here (and for
some reason, actually a bit faster than the always_inline case, although
I'm not sure if that bit is deterministic in general or not).
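To put the timings in relative terms, here is a small sketch of the
arithmetic. The helper names lowest_of() and percent_slower() are made up
for this example. With the numbers listed above, BROKEN_COMPILER comes out
about 8.8% slower and -march=i686 about 4.1% slower than av_noinline on
32-bit; on x86_64, av_noinline is about 1.2% faster and BROKEN_COMPILER
about 4.6% slower than av_always_inline.

```c
#include <assert.h>
#include <stddef.h>

/* Lowest time of a set of runs, as used for the figures in this mail.
 * (Hypothetical helper; the individual run times are not listed here.) */
static double lowest_of(const double *runs, size_t n)
{
    assert(runs != NULL && n > 0);
    double best = runs[0];
    for (size_t i = 1; i < n; i++)
        if (runs[i] < best)
            best = runs[i];
    return best;
}

/* How much slower t is than baseline, in percent.
 * A negative result means t is faster than baseline. */
static double percent_slower(double t, double baseline)
{
    return 100.0 * (t - baseline) / baseline;
}
```

For example, percent_slower(98.92, 90.94) gives the 32-bit BROKEN_COMPILER
figure relative to av_noinline, and percent_slower(73.74, 74.65) gives the
(negative) x86_64 av_noinline figure relative to av_always_inline.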
// Martin