[FFmpeg-devel] [FFmpeg-cvslog] x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128

Martin Storsjö martin at martin.st
Thu Jul 11 16:42:42 EEST 2024


On Thu, 11 Jul 2024, James Almer wrote:

> On 7/11/2024 10:08 AM, Martin Storsjö wrote:
>> On Wed, 10 Jul 2024, James Almer wrote:
>> 
>>> ffmpeg | branch: master | James Almer <jamrial at gmail.com> | Wed Jul 10 
>>> 13:00:20 2024 -0300| [bd1bcb07e0f29c135103a402d71b343a09ad1690] | 
>>> committer: James Almer
>>>
>>> x86/intreadwrite: use intrinsics instead of inline asm for AV_COPY128
>>>
>>> This has the benefit of removing any SSE -> AVX penalty that may 
>>> happen when
>>> the compiler emits VEX encoded instructions.
>>>
>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>>
>>>> 
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=bd1bcb07e0f29c135103a402d71b343a09ad1690
>>> ---
>>>
>>> configure                    |  5 ++++-
>>> libavutil/x86/intreadwrite.h | 20 +++++++-------------
>>> 2 files changed, 11 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/configure b/configure
>>> index f84fefeaab..7151ed1de3 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -2314,6 +2314,7 @@ HEADERS_LIST="
>>>
>>> INTRINSICS_LIST="
>>>     intrinsics_neon
>>> +    intrinsics_sse
>>>     intrinsics_sse2
>>> "
>>>
>>> @@ -2744,7 +2745,8 @@ armv6t2_deps="arm"
>>> armv8_deps="aarch64"
>>> neon_deps_any="aarch64 arm"
>>> intrinsics_neon_deps="neon"
>>> -intrinsics_sse2_deps="sse2"
>>> +intrinsics_sse_deps="sse"
>>> +intrinsics_sse2_deps="sse2 intrinsics_sse"
>>> vfp_deps="arm"
>>> vfpv3_deps="vfp"
>>> setend_deps="arm"
>>> @@ -6446,6 +6448,7 @@ elif enabled loongarch; then
>>> fi
>>>
>>> check_cc intrinsics_neon arm_neon.h "int16x8_t test = vdupq_n_s16(0)"
>>> +check_cc intrinsics_sse immintrin.h "__m128 test = _mm_setzero_ps()"
>>> check_cc intrinsics_sse2 emmintrin.h "__m128i test = 
> _mm_setzero_si128()"
>>>
>>> check_ldflags -Wl,--as-needed
>>> diff --git a/libavutil/x86/intreadwrite.h 
> b/libavutil/x86/intreadwrite.h
>>> index 9bbef00dba..6546eb016c 100644
>>> --- a/libavutil/x86/intreadwrite.h
>>> +++ b/libavutil/x86/intreadwrite.h
>>> @@ -22,29 +22,25 @@
>>> #define AVUTIL_X86_INTREADWRITE_H
>>>
>>> #include <stdint.h>
>>> +#if HAVE_INTRINSICS_SSE
>>> +#include <immintrin.h>
>>> +#endif
>> 
>> This change seems to have broken builds for x86 with Clang 16 or newer. 
>> (Clang 15 and lower seems to be fine.)
>> 
>> See e.g. 
>> 
> http://fate.ffmpeg.org/log.cgi?slot=i686-mingw32-clang-trunk&time=20240711035948&log=compile 
> for an example of the error. The issue is that a clang internal 
> intrinsics header contains "_mm_comige_sh(__m128h A,", i.e. a parameter 
> with the name "A" (which toolchain provided headers shouldn't use). This 
> clashes with libavcodec/huffuyv.h, which has a "#define A 3".
>> 
>> This is obviously a Clang intrinsics header bug, but we can't fix the 
>> existing Clang 16-18 releases that are out there, so I guess what we 
> can 
>> do is change our "define A" to something more elaborate. (IIRC there 
> are 
>> some similar issues with names with ncurses and/or android headers 
> too.)
>> 
>> // Martin
>
> We also do "#define A AV_OPT_FLAG_AUDIO_PARAM" in options_table.h and 
> probably other places, so changing huffyuv.h may not be enough.

That's quite possible, but those cases may be including intreadwrite.h 
before that, do it's possible it might not trigger there.

I'll see how many places need to be changed here.

I sent a fix to Clang in https://github.com/llvm/llvm-project/pull/98478.

// Martin


More information about the ffmpeg-devel mailing list