[FFmpeg-devel] [PATCH v2 1/2] avfilter/vf_blackdetect: add AVX2 SIMD version

Fri Jul 18 16:33:04 EEST 2025

On Fri, Jul 18, 2025 at 2:22 PM Kacper Michajlow <kasper93 at gmail.com> wrote:
>
> On Fri, 18 Jul 2025 at 14:46, Kieran Kunhya via ffmpeg-devel
> <ffmpeg-devel at ffmpeg.org> wrote:
> >
> > On Fri, Jul 18, 2025 at 1:41 PM Kacper Michajlow <kasper93 at gmail.com> wrote:
> > >
> > > On Fri, 18 Jul 2025 at 14:14, Kieran Kunhya via ffmpeg-devel
> > > <ffmpeg-devel at ffmpeg.org> wrote:
> > > >
> > > > > blackdetect8_c:                                        820.8 ( 1.00x)
> > > > > blackdetect8_avx2:                                     219.2 ( 3.74x)
> > > > > blackdetect16_c:                                       372.8 ( 1.00x)
> > > > > blackdetect16_avx2:                                    201.4 ( 1.85x)
> > > > >
> > > > > Again, sorry for being pedantic here, but it gives the wrong
> > > > > impression especially if you look at this from outside.
> > > >
> > > > Also misleading as far as I understand because GCC doesn't have
> > > > runtime detection like FFmpeg.
> > >
> > > Speaking of which... actually GCC does have runtime detection. All you
> > > have to do is mark the function with `target_clones` and the requested
> > > architectures, and it will automatically dispatch to the best version
> > > at runtime.
> > >
> > > See for more information:
> > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-target_005fclones-function-attribute
> >
> > It's not as sophisticated as our runtime detection (e.g. avx512 vs
> > avx512icl which we support).
> > Comparing C vs autovectorised code that works only on some platforms
> > with forced compilation settings is also unfair.
>
> In my original message clang build was completely default, no forced options.
>
> Handwritten avx512 also works on this specific platform. So comparing
> this to autovectorized code (that works on exactly the same platform)
> as a baseline makes sense. Furthermore autovectorized code can scale
> onto more platforms than handwritten avx512. IMHO comparing things in
> the same domain makes more sense.
>
> The point of my message was that we should have defined a baseline
> target, if it is GCC without autovectorization, so be it. But it
> should be specified and not implied in the commit description that the
> compared result is autovectorized.
>
> To be honest, I agree with you. It's misleading and unfair, so we
> shouldn't make any comparisons. This is not only limited to
> autovectorization, scalar code generation also differs. It just
> happens to give the biggest difference.
>
> Context matters, saying "C code performance" is vague. I'm not saying
> one way is better than the other, but it doesn't cost anything to
> specify it better to avoid miscommunication.

It's not fair to compare autovectorised AVX512 output, which will be
called *on any system with AVX512 support, including ones that
downclock heavily*, with FFmpeg's AVX512 (ICL) code, which is properly
checked to run only on non-downclocking systems.
Outside the land of the theoretical compiler world, this is a
practical problem. If FFmpeg used compiler runtime detection I
personally would have a significant number of systems downclock
drastically.
I don't believe compilers are smart enough to generate AVX512 code
restricted to YMM registers for that use-case.

It's substantially uglier to use compiler-specific runtime detection.
Compiler autovectorisation is inconsistent across compiler versions.
It's nothing that can be relied upon.
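
For reference, the `target_clones` mechanism mentioned earlier in the
thread looks roughly like the sketch below. This is only an
illustration, not FFmpeg code: `count_dark_pixels` and the
`MULTIVERSION` macro are hypothetical names, the `<=` comparison is an
assumption rather than what vf_blackdetect does, and whether the loop
gets vectorised at all depends on the compiler version and
optimisation level.

```c
#include <stddef.h>
#include <stdint.h>  /* on glibc this also defines __GLIBC__ */

/*
 * target_clones needs ifunc support, so this sketch only enables it for
 * GCC-compatible compilers on x86-64 with glibc. Everywhere else the
 * macro expands to nothing and a single plain C version is built.
 */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__GLIBC__)
#define MULTIVERSION __attribute__((target_clones("default", "avx2")))
#else
#define MULTIVERSION
#endif

/*
 * Hypothetical stand-in for a blackdetect-style inner loop: count the
 * samples that are at or below a threshold. With target_clones the
 * compiler emits one clone per listed target plus a resolver that
 * picks a clone when the program is loaded.
 */
MULTIVERSION
unsigned count_dark_pixels(const uint8_t *src, size_t n, uint8_t threshold)
{
    unsigned count = 0;

    for (size_t i = 0; i < n; i++)
        count += src[i] <= threshold;

    return count;
}
```

The target list is limited to what the compiler knows as an ISA or
CPU name, so there is no direct way to express a policy such as "use
AVX512 only on CPUs that do not downclock".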

Kieran