[FFmpeg-devel] [PATCH v2 1/2] avfilter/vf_blackdetect: add AVX2 SIMD version

Fri Jul 18 16:21:19 EEST 2025

On Fri, 18 Jul 2025 at 14:46, Kieran Kunhya via ffmpeg-devel
<ffmpeg-devel at ffmpeg.org> wrote:
>
> On Fri, Jul 18, 2025 at 1:41 PM Kacper Michajlow <kasper93 at gmail.com> wrote:
> >
> > On Fri, 18 Jul 2025 at 14:14, Kieran Kunhya via ffmpeg-devel
> > <ffmpeg-devel at ffmpeg.org> wrote:
> > >
> > > > blackdetect8_c:                                        820.8 ( 1.00x)
> > > > blackdetect8_avx2:                                     219.2 ( 3.74x)
> > > > blackdetect16_c:                                       372.8 ( 1.00x)
> > > > blackdetect16_avx2:                                    201.4 ( 1.85x)
> > > >
> > > > Again, sorry for being pedantic here, but it gives the wrong
> > > > impression especially if you look at this from outside.
> > >
> > > Also misleading as far as I understand because GCC doesn't have
> > > runtime detection like FFmpeg.
> >
> > Speaking of which... GCC actually does have runtime detection. All you
> > have to do is mark the function with `target_clones` and the requested
> > architectures, and it will automatically dispatch to the best version
> > at runtime.
> >
> > See for more information:
> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-target_005fclones-function-attribute
>
> It's not as sophisticated as our runtime detection (e.g. avx512 vs
> avx512icl which we support).
> Comparing C vs autovectorised code that works only on some platforms
> with forced compilation settings is also unfair.

In my original message, the clang build was completely default, with no
forced options.

Handwritten avx512 also works on this specific platform. So comparing
it to autovectorized code (which works on exactly the same platform)
as a baseline makes sense. Furthermore, autovectorized code can scale
to more platforms than handwritten avx512. IMHO, comparing things in
the same domain makes more sense.

The point of my message was that we should have a defined baseline
target. If it is GCC without autovectorization, so be it. But the
commit description should state explicitly, rather than leave implied,
whether the compared result is autovectorized.

To be honest, I agree with you. It's misleading and unfair, so we
shouldn't make any comparisons. This is not limited to
autovectorization; scalar code generation also differs.
Autovectorization just happens to give the biggest difference.

Context matters. Saying "C code performance" is vague. I'm not saying
one way is better than the other, but it doesn't cost anything to
specify it more precisely and avoid miscommunication.
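
For reference, here is a minimal sketch of the `target_clones` dispatch
mentioned earlier in the thread. It assumes GCC (or a compatible compiler)
on x86-64 with glibc ifunc support. `count_dark_pixels` and `CLONE_FOR_X86`
are hypothetical names for illustration, not FFmpeg code, and whether the
loop actually gets vectorized depends on the compiler and optimization
level.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC-compatible compiler on x86-64 with glibc (ifunc support).
 * On any other platform the attribute is dropped and a single plain C
 * version is built. */
#if defined(__GNUC__) && defined(__x86_64__) && defined(__GLIBC__)
#define CLONE_FOR_X86 \
    __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define CLONE_FOR_X86
#endif

/* Hypothetical helper, not the blackdetect implementation: count the
 * samples in one row that are at or below a threshold. With the attribute
 * active, the compiler emits one clone per listed target plus a resolver
 * that selects a clone once, at load time, based on the running CPU. */
CLONE_FOR_X86
unsigned count_dark_pixels(const uint8_t *row, size_t width,
                           uint8_t threshold)
{
    unsigned count = 0;

    for (size_t i = 0; i < width; i++)
        count += row[i] <= threshold;

    return count;
}
```

Callers just call count_dark_pixels() as usual; the dispatch is invisible
at the call site. This is also why "C" numbers depend on how the C was
built.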

- Kacper