[FFmpeg-devel] [PATCH 1/2] libavutil/cpu: Adds av_cpu_has_fast_gather to detect cpus with avx fast gather instruction
James Almer
jamrial at gmail.com
Mon Jun 14 15:17:38 EEST 2021
On 6/14/2021 8:53 AM, Ronald S. Bultje wrote:
> Hi Alan,
>
> On Mon, Jun 14, 2021 at 7:20 AM Alan Kelly <
> alankelly-at-google.com at ffmpeg.org> wrote:
>
>> Broadwell and later have fast gather instructions.
>> ---
>> This is so that the avx2 version of ff_hscale8to15X which uses gather
>> instructions is only selected on machines where it will actually be
>> faster.
>>
>
> We've in the past typically done this with a bit in the cpuflags return
> value. Can this be added there instead of being its own function?
>
> Also, what is the cycle count of ssse3/avx2 implementation for this
> specific function on Haswell? It would be good to note that in the
> respective patch so that we understand why the check was added.
Between 9 and 12 on Haswell, 5 to 7 on Broadwell, and about 2 to 5 on
Skylake and newer, according to Agner's pdf if I'm reading it right. It's
also slow on AMD before Zen 3.
And yes, this should if anything be a new cpu flag and not a new function.
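To illustrate the flag-based approach, here is a rough, self-contained sketch of how a caller might pick the gather-based version only when a "fast gather" bit is set. The flag names, bit values, enum and select_hscale() are all hypothetical and made up for this example; they are not the existing libavutil flags or the swscale init code.

```c
/* Hypothetical flag bits for illustration only; these are NOT the
 * real libavutil AV_CPU_FLAG_* values. */
#define CPU_FLAG_SSSE3        (1 << 0)
#define CPU_FLAG_AVX2         (1 << 1)
#define CPU_FLAG_FAST_GATHER  (1 << 2)  /* assumed new bit: gather is fast */

/* Hypothetical identifiers for the candidate implementations. */
typedef enum {
    HSCALE_C,
    HSCALE_SSSE3,
    HSCALE_AVX2_GATHER
} HscaleImpl;

/* Choose an implementation from a cpuflags-style bitmask.
 * The gather version is used only if both AVX2 and the fast-gather
 * bit are present; otherwise fall back to SSSE3, then to plain C. */
static HscaleImpl select_hscale(int cpu_flags)
{
    if ((cpu_flags & CPU_FLAG_AVX2) && (cpu_flags & CPU_FLAG_FAST_GATHER))
        return HSCALE_AVX2_GATHER;
    if (cpu_flags & CPU_FLAG_SSSE3)
        return HSCALE_SSSE3;
    return HSCALE_C;
}
```

With this layout, a Haswell-like mask (SSSE3 and AVX2 set, fast gather clear) would fall back to the SSSE3 path, while a mask with all three bits set would take the gather path.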