[FFmpeg-devel] [PATCH v2 1/2] avfilter/vf_blackdetect: add AVX2 SIMD version
Kieran Kunhya
kieran618 at googlemail.com
Fri Jul 18 17:36:40 EEST 2025
On Fri, Jul 18, 2025 at 3:17 PM Kacper Michajlow <kasper93 at gmail.com> wrote:
>
> On Fri, 18 Jul 2025 at 15:33, Kieran Kunhya via ffmpeg-devel
> <ffmpeg-devel at ffmpeg.org> wrote:
> >
> > On Fri, Jul 18, 2025 at 2:22 PM Kacper Michajlow <kasper93 at gmail.com> wrote:
> > >
> > > On Fri, 18 Jul 2025 at 14:46, Kieran Kunhya via ffmpeg-devel
> > > <ffmpeg-devel at ffmpeg.org> wrote:
> > > >
> > > > On Fri, Jul 18, 2025 at 1:41 PM Kacper Michajlow <kasper93 at gmail.com> wrote:
> > > > >
> > > > > On Fri, 18 Jul 2025 at 14:14, Kieran Kunhya via ffmpeg-devel
> > > > > <ffmpeg-devel at ffmpeg.org> wrote:
> > > > > >
> > > > > > > blackdetect8_c: 820.8 ( 1.00x)
> > > > > > > blackdetect8_avx2: 219.2 ( 3.74x)
> > > > > > > blackdetect16_c: 372.8 ( 1.00x)
> > > > > > > blackdetect16_avx2: 201.4 ( 1.85x)
> > > > > > >
> > > > > > > Again, sorry for being pedantic here, but it gives the wrong
> > > > > > > impression, especially if you look at this from the outside.
> > > > > >
> > > > > > Also misleading, as far as I understand, because GCC doesn't have
> > > > > > runtime detection like FFmpeg does.
> > > > >
> > > > > Speaking of which, GCC actually does have runtime detection. All you
> > > > > have to do is mark the function with `target_clones`, listing the
> > > > > requested architectures, and it will automatically dispatch the best
> > > > > variant at runtime.
> > > > >
> > > > > For more information, see:
> > > > > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-target_005fclones-function-attribute
> > > >
> > > > It's not as sophisticated as our runtime detection (e.g. avx512 vs.
> > > > avx512icl, which we support).
> > > > Comparing C vs autovectorised code that works only on some platforms
> > > > with forced compilation settings is also unfair.
> > >
> > > In my original message, the clang build was completely default, with no forced options.
> > >
> > > Handwritten avx512 also works only on this specific platform. So comparing
> > > it to autovectorized code (which works on exactly the same platform)
> > > as a baseline makes sense. Furthermore, autovectorized code can scale
> > > to more platforms than handwritten avx512. IMHO, comparing things in
> > > the same domain makes more sense.
> > >
> > > The point of my message was that we should have a defined baseline
> > > target; if it is GCC without autovectorization, so be it. But the
> > > commit description should state explicitly whether the compared
> > > result is autovectorized, rather than leave it implied.
> > >
> > > To be honest, I agree with you. It's misleading and unfair, so we
> > > shouldn't make any comparisons. This is not only limited to
> > > autovectorization, scalar code generation also differs. It just
> > > happens to give the biggest difference.
> > >
> > > Context matters; saying "C code performance" is vague. I'm not saying
> > > one way is better than the other, but it costs nothing to specify it
> > > more precisely and avoid miscommunication.
> >
> > It's not fair to compare autovectorised AVX512 output, which will be
> > called *on any system with AVX512 support, including ones that
> > downclock heavily*, with AVX512 (ICL) code that FFmpeg checks properly
> > so that it runs only on non-downclocking systems.
>
> It's the customer's/user's decision how to compile FFmpeg for the best
> performance on their target platform. Also note that you brought up
> avx512; while I agree about the issues with it, I'm commenting on the
> AVX2 patch. I wanted to make a general comment about the performance
> metric we share; diving into avx512 issues is kind of a separate topic.

Huh, we should have the best performance for *all* users (all
compilers, all platforms) by default.
We have this now for SIMD functions; for the rest, whether autovec
can deliver that is an open question.
Kieran