[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds fast gather detection.

James Almer jamrial at gmail.com
Fri Jul 16 18:08:12 EEST 2021


On 7/16/2021 11:46 AM, Alan Kelly wrote:
> On Fri, Jul 16, 2021 at 4:02 PM James Almer <jamrial at gmail.com> wrote:
> 
>> On 7/16/2021 10:44 AM, Alan Kelly wrote:
>>> Broadwell and later and Zen3 and later have fast gather instructions.
>>> ---
>>>    Haswell is now excluded from EXTERNAL_AVX2_FAST as discussed in the
>>>    email thread.
>>
>> I was very explicit about this not being ok. We're not disabling all ymm
>> usage for Haswell just for one or two swscale functions using gathers.
>>
>> Let's go with Lynne's latest suggestion and not change the flags at all
>> and use gathers on Haswell, same as other arches, by looking at the
>> AVX2_FAST flag.
>>
>>>    libavutil/cpu.h     |  1 +
>>>    libavutil/x86/cpu.c | 11 ++++++++++-
>>>    2 files changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/libavutil/cpu.h b/libavutil/cpu.h
>>> index c069076439..ec3073d021 100644
>>> --- a/libavutil/cpu.h
>>> +++ b/libavutil/cpu.h
>>> @@ -113,6 +113,7 @@ void av_force_cpu_count(int count);
>>>     *  av_set_cpu_flags_mask(), then this function will behave as if AVX is not
>>>     *  present.
>>>     */
>>> +
>>>    size_t av_cpu_max_align(void);
>>>
>>>    #endif /* AVUTIL_CPU_H */
>>> diff --git a/libavutil/x86/cpu.c b/libavutil/x86/cpu.c
>>> index bcd41a50a2..158e2170c4 100644
>>> --- a/libavutil/x86/cpu.c
>>> +++ b/libavutil/x86/cpu.c
>>> @@ -146,8 +146,17 @@ int ff_get_cpu_flags_x86(void)
>>>        if (max_std_level >= 7) {
>>>            cpuid(7, eax, ebx, ecx, edx);
>>>    #if HAVE_AVX2
>>> -        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020))
>>> +        if ((rval & AV_CPU_FLAG_AVX) && (ebx & 0x00000020)){
>>>                rval |= AV_CPU_FLAG_AVX2;
>>> +
>>> +            cpuid(1, eax, ebx, ecx, std_caps);
>>> +            family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
>>> +            model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
>>> +            // Haswell and earlier has slow gather
>>> +            if(family == 6 && model < 70)
>>> +                rval |= AV_CPU_FLAG_AVXSLOW;
>>> +        }
>>> +
>>>    #if HAVE_AVX512 /* F, CD, BW, DQ, VL */
>>>            if ((xcr0_lo & 0xe0) == 0xe0) { /* OPMASK/ZMM state */
>>>                if ((rval & AV_CPU_FLAG_AVX2) && (ebx & 0xd0030000) == 0xd0030000)
>>>
>>
> 
> OK, apologies for the misunderstanding. In that case part 1 of this patch
> is not required. Part two remains valid with the function protected by
> EXTERNAL_AVX2_FAST. Should part 2 be re-submitted as a standalone patch or
> is it OK as is?

It's ok as is. Thanks.
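
For reference, the family/model arithmetic in the quoted hunk can be tried in isolation. The following is only a minimal sketch: CpuSignature and decode_cpu_signature() are hypothetical names that do not exist in libavutil, and the function takes the EAX value returned by CPUID leaf 1 as a plain argument instead of executing cpuid itself.

```c
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of libavutil. */
typedef struct CpuSignature {
    unsigned family;
    unsigned model;
} CpuSignature;

/* Decode the CPUID leaf 1 EAX value with the same expressions as the
 * quoted patch: base family plus extended family, and base model plus
 * the extended model shifted into the high nibble. */
static CpuSignature decode_cpu_signature(uint32_t eax)
{
    CpuSignature sig;
    sig.family = ((eax >> 8) & 0xf) + ((eax >> 20) & 0xff);
    sig.model  = ((eax >> 4) & 0xf) + ((eax >> 12) & 0xf0);
    return sig;
}
```

As an illustration, an EAX value of 0x000306C3 (a signature reported by Haswell desktop parts, as far as I know) decodes to family 6, model 60, and 0x00A20F10 (which I believe is a Zen 3 signature) decodes to family 25, model 33.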

