[FFmpeg-devel] [PATCH] swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions

Mark Reid mindmark at gmail.com
Wed Oct 27 10:51:07 EEST 2021


On Monday, October 25, 2021, Michael Niedermayer <michael at niedermayer.cc>
wrote:

> On Sun, Oct 24, 2021 at 09:09:52PM -0700, mindmark at gmail.com wrote:
> > From: Mark Reid <mindmark at gmail.com>
> >
> > yuv2gbrp_full_X_4_512_c: 12096.6
> > yuv2gbrp_full_X_4_512_sse2: 10782.6
> > yuv2gbrp_full_X_4_512_sse4: 5143.6
> > yuv2gbrp_full_X_4_512_avx2: 3000.1
> > yuv2gbrap_full_X_4_512_c: 15463.1
> > yuv2gbrap_full_X_4_512_sse2: 14296.6
> > yuv2gbrap_full_X_4_512_sse4: 6319.1
> > yuv2gbrap_full_X_4_512_avx2: 3554.1
> > yuv2gbrp9be_full_X_4_512_c: 14281.6
> > yuv2gbrp9be_full_X_4_512_sse2: 11206.1
> > yuv2gbrp9be_full_X_4_512_sse4: 5033.6
> > yuv2gbrp9be_full_X_4_512_avx2: 3012.6
> > yuv2gbrp9le_full_X_4_512_c: 12688.6
> > yuv2gbrp9le_full_X_4_512_sse2: 10914.1
> > yuv2gbrp9le_full_X_4_512_sse4: 5144.6
> > yuv2gbrp9le_full_X_4_512_avx2: 3014.6
> > yuv2gbrp10be_full_X_4_512_c: 14257.6
> > yuv2gbrp10be_full_X_4_512_sse2: 11089.6
> > yuv2gbrp10be_full_X_4_512_sse4: 5039.1
> > yuv2gbrp10be_full_X_4_512_avx2: 3001.1
> > yuv2gbrp10le_full_X_4_512_c: 12098.6
> > yuv2gbrp10le_full_X_4_512_sse2: 10884.1
> > yuv2gbrp10le_full_X_4_512_sse4: 5138.1
> > yuv2gbrp10le_full_X_4_512_avx2: 2999.6
> > yuv2gbrap10be_full_X_4_512_c: 18549.6
> > yuv2gbrap10be_full_X_4_512_sse2: 14538.6
> > yuv2gbrap10be_full_X_4_512_sse4: 6292.6
> > yuv2gbrap10be_full_X_4_512_avx2: 3583.6
> > yuv2gbrap10le_full_X_4_512_c: 16631.1
> > yuv2gbrap10le_full_X_4_512_sse2: 14190.6
> > yuv2gbrap10le_full_X_4_512_sse4: 6348.1
> > yuv2gbrap10le_full_X_4_512_avx2: 3554.6
> > yuv2gbrp12be_full_X_4_512_c: 13555.1
> > yuv2gbrp12be_full_X_4_512_sse2: 10952.1
> > yuv2gbrp12be_full_X_4_512_sse4: 5137.6
> > yuv2gbrp12be_full_X_4_512_avx2: 3009.6
> > yuv2gbrp12le_full_X_4_512_c: 12082.6
> > yuv2gbrp12le_full_X_4_512_sse2: 10891.1
> > yuv2gbrp12le_full_X_4_512_sse4: 5184.1
> > yuv2gbrp12le_full_X_4_512_avx2: 3011.1
> > yuv2gbrap12be_full_X_4_512_c: 18689.6
> > yuv2gbrap12be_full_X_4_512_sse2: 14522.6
> > yuv2gbrap12be_full_X_4_512_sse4: 6237.6
> > yuv2gbrap12be_full_X_4_512_avx2: 3585.6
> > yuv2gbrap12le_full_X_4_512_c: 16760.6
> > yuv2gbrap12le_full_X_4_512_sse2: 14202.1
> > yuv2gbrap12le_full_X_4_512_sse4: 6252.1
> > yuv2gbrap12le_full_X_4_512_avx2: 3591.1
> > yuv2gbrp14be_full_X_4_512_c: 13555.6
> > yuv2gbrp14be_full_X_4_512_sse2: 10949.1
> > yuv2gbrp14be_full_X_4_512_sse4: 5185.1
> > yuv2gbrp14be_full_X_4_512_avx2: 3012.1
> > yuv2gbrp14le_full_X_4_512_c: 12068.1
> > yuv2gbrp14le_full_X_4_512_sse2: 10883.6
> > yuv2gbrp14le_full_X_4_512_sse4: 5145.1
> > yuv2gbrp14le_full_X_4_512_avx2: 3007.1
> > yuv2gbrp16be_full_X_4_512_c: 12383.6
> > yuv2gbrp16be_full_X_4_512_sse2: 8230.6
> > yuv2gbrp16be_full_X_4_512_sse4: 4765.6
> > yuv2gbrp16be_full_X_4_512_avx2: 2742.6
> > yuv2gbrp16le_full_X_4_512_c: 10906.1
> > yuv2gbrp16le_full_X_4_512_sse2: 28732.1
> > yuv2gbrp16le_full_X_4_512_sse4: 4709.6
> > yuv2gbrp16le_full_X_4_512_avx2: 2753.1
> > yuv2gbrap16be_full_X_4_512_c: 15472.6
> > yuv2gbrap16be_full_X_4_512_sse2: 11021.6
> > yuv2gbrap16be_full_X_4_512_sse4: 5487.6
> > yuv2gbrap16be_full_X_4_512_avx2: 3143.6
> > yuv2gbrap16le_full_X_4_512_c: 13668.6
> > yuv2gbrap16le_full_X_4_512_sse2: 10562.1
> > yuv2gbrap16le_full_X_4_512_sse4: 5506.6
> > yuv2gbrap16le_full_X_4_512_avx2: 3149.6
> > yuv2gbrpf32be_full_X_4_512_c: 15471.1
> > yuv2gbrpf32be_full_X_4_512_sse2: 8524.6
> > yuv2gbrpf32be_full_X_4_512_sse4: 4559.1
> > yuv2gbrpf32be_full_X_4_512_avx2: 2388.1
> > yuv2gbrpf32le_full_X_4_512_c: 14247.6
> > yuv2gbrpf32le_full_X_4_512_sse2: 7600.6
> > yuv2gbrpf32le_full_X_4_512_sse4: 4385.6
> > yuv2gbrpf32le_full_X_4_512_avx2: 2258.6
> > yuv2gbrapf32be_full_X_4_512_c: 18412.1
> > yuv2gbrapf32be_full_X_4_512_sse2: 11353.6
> > yuv2gbrapf32be_full_X_4_512_sse4: 5807.1
> > yuv2gbrapf32be_full_X_4_512_avx2: 2928.1
> > yuv2gbrapf32le_full_X_4_512_c: 16485.1
> > yuv2gbrapf32le_full_X_4_512_sse2: 10202.1
> > yuv2gbrapf32le_full_X_4_512_sse4: 5571.6
> > yuv2gbrapf32le_full_X_4_512_avx2: 2847.6
> >
> >
> > ---
> >  libswscale/x86/output.asm | 440 +++++++++++++++++++++++++++++++++++++-
> >  libswscale/x86/swscale.c  |  99 +++++++++
> >  tests/checkasm/Makefile   |   2 +-
> >  tests/checkasm/checkasm.c |   1 +
> >  tests/checkasm/checkasm.h |   1 +
> >  tests/checkasm/sw_gbrp.c  | 198 +++++++++++++++++
> >  tests/fate/checkasm.mak   |   1 +
> >  7 files changed, 740 insertions(+), 2 deletions(-)
> >  create mode 100644 tests/checkasm/sw_gbrp.c
>
> seems to work
> asm review left to people who worked with asm more recently than me
>
>
Thanks for taking the time to test. I was planning on doing the planar
input ones next and adding the missing unscaled floating-point rgb2rgb
functions.


> also if you or anyone wants a random idea for swscale improvements:
> we are missing a direct yuv->yuv converter for converting between different
> yuv colorspaces; atm these are handled with an rgb intermediate
>
>
Like what the vf_colormatrix filter does?
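
As a rough sketch of the idea (not swscale code, and not taken from any existing filter), the two steps of the RGB round trip can be folded into a single 3x3 matrix, so each pixel needs one multiply instead of two. The example assumes full-range, normalized values (Y in [0,1], Cb/Cr in [-0.5,0.5]) and uses the standard BT.601 and BT.709 luma coefficients. The type and function names (LumaCoeffs, yuv2yuv_matrix, and so on) are hypothetical and exist only for illustration.

```c
/* Luma coefficients (Kr, Kb); Kg is derived as 1 - Kr - Kb. */
typedef struct { double kr, kb; } LumaCoeffs;

static const LumaCoeffs BT601 = { 0.299,  0.114  };
static const LumaCoeffs BT709 = { 0.2126, 0.0722 };

/* Build the YCbCr -> RGB matrix for the given coefficients. */
void yuv2rgb_matrix(LumaCoeffs c, double m[3][3])
{
    double kg = 1.0 - c.kr - c.kb;
    m[0][0] = 1.0; m[0][1] = 0.0;
    m[0][2] = 2.0 * (1.0 - c.kr);
    m[1][0] = 1.0; m[1][1] = -2.0 * c.kb * (1.0 - c.kb) / kg;
    m[1][2] = -2.0 * c.kr * (1.0 - c.kr) / kg;
    m[2][0] = 1.0; m[2][1] = 2.0 * (1.0 - c.kb);
    m[2][2] = 0.0;
}

/* Build the RGB -> YCbCr matrix for the given coefficients. */
void rgb2yuv_matrix(LumaCoeffs c, double m[3][3])
{
    double kg = 1.0 - c.kr - c.kb;
    double sb = 2.0 * (1.0 - c.kb);   /* Cb scale */
    double sr = 2.0 * (1.0 - c.kr);   /* Cr scale */
    m[0][0] = c.kr;       m[0][1] = kg;       m[0][2] = c.kb;
    m[1][0] = -c.kr / sb; m[1][1] = -kg / sb; m[1][2] = 0.5;
    m[2][0] = 0.5;        m[2][1] = -kg / sr; m[2][2] = -c.kb / sr;
}

/* out = a * b */
void mat_mul(double a[3][3], double b[3][3], double out[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            out[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                out[i][j] += a[i][k] * b[k][j];
        }
}

/* Fold src YCbCr -> RGB -> dst YCbCr into one direct matrix. */
void yuv2yuv_matrix(LumaCoeffs src, LumaCoeffs dst, double m[3][3])
{
    double to_rgb[3][3], to_yuv[3][3];
    yuv2rgb_matrix(src, to_rgb);
    rgb2yuv_matrix(dst, to_yuv);
    mat_mul(to_yuv, to_rgb, m);
}

/* Apply a 3x3 matrix to one (Y, Cb, Cr) triple. */
void apply_matrix(double m[3][3], const double in[3], double out[3])
{
    for (int i = 0; i < 3; i++)
        out[i] = m[i][0] * in[0] + m[i][1] * in[1] + m[i][2] * in[2];
}
```

Calling yuv2yuv_matrix(BT601, BT709, m) once and then apply_matrix() per pixel would skip the RGB intermediate. The first column of the combined matrix works out to (1, 0, 0), so neutral grays keep their luma and pick up no chroma. A real converter would also have to handle limited range, bit depth and chroma subsampling, which this sketch ignores.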



> thx
>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> If you fake or manipulate statistics in a paper in physics you will never
> get a job again.
> If you fake or manipulate statistics in a paper in medicine you will get
> a job for life at the pharma industry.
>

