[FFmpeg-devel] [PATCH] swscale/x86/output.asm: add x86-optimized planar gbr yuv2anyX functions
Michael Niedermayer
michael at niedermayer.cc
Mon Oct 25 21:19:06 EEST 2021
On Sun, Oct 24, 2021 at 09:09:52PM -0700, mindmark at gmail.com wrote:
> From: Mark Reid <mindmark at gmail.com>
>
> yuv2gbrp_full_X_4_512_c: 12096.6
> yuv2gbrp_full_X_4_512_sse2: 10782.6
> yuv2gbrp_full_X_4_512_sse4: 5143.6
> yuv2gbrp_full_X_4_512_avx2: 3000.1
> yuv2gbrap_full_X_4_512_c: 15463.1
> yuv2gbrap_full_X_4_512_sse2: 14296.6
> yuv2gbrap_full_X_4_512_sse4: 6319.1
> yuv2gbrap_full_X_4_512_avx2: 3554.1
> yuv2gbrp9be_full_X_4_512_c: 14281.6
> yuv2gbrp9be_full_X_4_512_sse2: 11206.1
> yuv2gbrp9be_full_X_4_512_sse4: 5033.6
> yuv2gbrp9be_full_X_4_512_avx2: 3012.6
> yuv2gbrp9le_full_X_4_512_c: 12688.6
> yuv2gbrp9le_full_X_4_512_sse2: 10914.1
> yuv2gbrp9le_full_X_4_512_sse4: 5144.6
> yuv2gbrp9le_full_X_4_512_avx2: 3014.6
> yuv2gbrp10be_full_X_4_512_c: 14257.6
> yuv2gbrp10be_full_X_4_512_sse2: 11089.6
> yuv2gbrp10be_full_X_4_512_sse4: 5039.1
> yuv2gbrp10be_full_X_4_512_avx2: 3001.1
> yuv2gbrp10le_full_X_4_512_c: 12098.6
> yuv2gbrp10le_full_X_4_512_sse2: 10884.1
> yuv2gbrp10le_full_X_4_512_sse4: 5138.1
> yuv2gbrp10le_full_X_4_512_avx2: 2999.6
> yuv2gbrap10be_full_X_4_512_c: 18549.6
> yuv2gbrap10be_full_X_4_512_sse2: 14538.6
> yuv2gbrap10be_full_X_4_512_sse4: 6292.6
> yuv2gbrap10be_full_X_4_512_avx2: 3583.6
> yuv2gbrap10le_full_X_4_512_c: 16631.1
> yuv2gbrap10le_full_X_4_512_sse2: 14190.6
> yuv2gbrap10le_full_X_4_512_sse4: 6348.1
> yuv2gbrap10le_full_X_4_512_avx2: 3554.6
> yuv2gbrp12be_full_X_4_512_c: 13555.1
> yuv2gbrp12be_full_X_4_512_sse2: 10952.1
> yuv2gbrp12be_full_X_4_512_sse4: 5137.6
> yuv2gbrp12be_full_X_4_512_avx2: 3009.6
> yuv2gbrp12le_full_X_4_512_c: 12082.6
> yuv2gbrp12le_full_X_4_512_sse2: 10891.1
> yuv2gbrp12le_full_X_4_512_sse4: 5184.1
> yuv2gbrp12le_full_X_4_512_avx2: 3011.1
> yuv2gbrap12be_full_X_4_512_c: 18689.6
> yuv2gbrap12be_full_X_4_512_sse2: 14522.6
> yuv2gbrap12be_full_X_4_512_sse4: 6237.6
> yuv2gbrap12be_full_X_4_512_avx2: 3585.6
> yuv2gbrap12le_full_X_4_512_c: 16760.6
> yuv2gbrap12le_full_X_4_512_sse2: 14202.1
> yuv2gbrap12le_full_X_4_512_sse4: 6252.1
> yuv2gbrap12le_full_X_4_512_avx2: 3591.1
> yuv2gbrp14be_full_X_4_512_c: 13555.6
> yuv2gbrp14be_full_X_4_512_sse2: 10949.1
> yuv2gbrp14be_full_X_4_512_sse4: 5185.1
> yuv2gbrp14be_full_X_4_512_avx2: 3012.1
> yuv2gbrp14le_full_X_4_512_c: 12068.1
> yuv2gbrp14le_full_X_4_512_sse2: 10883.6
> yuv2gbrp14le_full_X_4_512_sse4: 5145.1
> yuv2gbrp14le_full_X_4_512_avx2: 3007.1
> yuv2gbrp16be_full_X_4_512_c: 12383.6
> yuv2gbrp16be_full_X_4_512_sse2: 8230.6
> yuv2gbrp16be_full_X_4_512_sse4: 4765.6
> yuv2gbrp16be_full_X_4_512_avx2: 2742.6
> yuv2gbrp16le_full_X_4_512_c: 10906.1
> yuv2gbrp16le_full_X_4_512_sse2: 28732.1
> yuv2gbrp16le_full_X_4_512_sse4: 4709.6
> yuv2gbrp16le_full_X_4_512_avx2: 2753.1
> yuv2gbrap16be_full_X_4_512_c: 15472.6
> yuv2gbrap16be_full_X_4_512_sse2: 11021.6
> yuv2gbrap16be_full_X_4_512_sse4: 5487.6
> yuv2gbrap16be_full_X_4_512_avx2: 3143.6
> yuv2gbrap16le_full_X_4_512_c: 13668.6
> yuv2gbrap16le_full_X_4_512_sse2: 10562.1
> yuv2gbrap16le_full_X_4_512_sse4: 5506.6
> yuv2gbrap16le_full_X_4_512_avx2: 3149.6
> yuv2gbrpf32be_full_X_4_512_c: 15471.1
> yuv2gbrpf32be_full_X_4_512_sse2: 8524.6
> yuv2gbrpf32be_full_X_4_512_sse4: 4559.1
> yuv2gbrpf32be_full_X_4_512_avx2: 2388.1
> yuv2gbrpf32le_full_X_4_512_c: 14247.6
> yuv2gbrpf32le_full_X_4_512_sse2: 7600.6
> yuv2gbrpf32le_full_X_4_512_sse4: 4385.6
> yuv2gbrpf32le_full_X_4_512_avx2: 2258.6
> yuv2gbrapf32be_full_X_4_512_c: 18412.1
> yuv2gbrapf32be_full_X_4_512_sse2: 11353.6
> yuv2gbrapf32be_full_X_4_512_sse4: 5807.1
> yuv2gbrapf32be_full_X_4_512_avx2: 2928.1
> yuv2gbrapf32le_full_X_4_512_c: 16485.1
> yuv2gbrapf32le_full_X_4_512_sse2: 10202.1
> yuv2gbrapf32le_full_X_4_512_sse4: 5571.6
> yuv2gbrapf32le_full_X_4_512_avx2: 2847.6
>
>
> ---
> libswscale/x86/output.asm | 440 +++++++++++++++++++++++++++++++++++++-
> libswscale/x86/swscale.c | 99 +++++++++
> tests/checkasm/Makefile | 2 +-
> tests/checkasm/checkasm.c | 1 +
> tests/checkasm/checkasm.h | 1 +
> tests/checkasm/sw_gbrp.c | 198 +++++++++++++++++
> tests/fate/checkasm.mak | 1 +
> 7 files changed, 740 insertions(+), 2 deletions(-)
> create mode 100644 tests/checkasm/sw_gbrp.c
seems to work
asm review left to people who worked with asm more recently than me
also if you or anyone wants a random idea for swscale improvements:
we are missing a direct yuv->yuv converter for converting between different
yuv colorspaces, at the moment these are handled with an rgb intermediate
thx
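
To make the yuv->yuv idea concrete, here is a rough sketch of how the rgb intermediate can be folded away, under strong assumptions: source and destination differ only in their luma coefficients (Kr, Kb), share primaries and transfer function, and samples are normalized full-range floats (Y in [0,1], Cb/Cr in [-0.5,0.5]). Range scaling, bit depth and chroma siting are ignored. The function names are hypothetical and are not swscale API; the coefficients in the usage note are the usual BT.601 (Kr=0.299, Kb=0.114) and BT.709 (Kr=0.2126, Kb=0.0722) values.

```c
#include <stdio.h>

/* RGB -> YCbCr matrix for given luma coefficients (hypothetical helper). */
static void rgb2yuv_matrix(double kr, double kb, double m[3][3])
{
    double kg = 1.0 - kr - kb;

    /* Y = Kr*R + Kg*G + Kb*B */
    m[0][0] = kr;  m[0][1] = kg;  m[0][2] = kb;
    /* Cb = (B - Y) / (2 * (1 - Kb)) */
    m[1][0] = -kr / (2.0 * (1.0 - kb));
    m[1][1] = -kg / (2.0 * (1.0 - kb));
    m[1][2] = 0.5;
    /* Cr = (R - Y) / (2 * (1 - Kr)) */
    m[2][0] = 0.5;
    m[2][1] = -kg / (2.0 * (1.0 - kr));
    m[2][2] = -kb / (2.0 * (1.0 - kr));
}

/* YCbCr -> RGB matrix for given luma coefficients (hypothetical helper). */
static void yuv2rgb_matrix(double kr, double kb, double m[3][3])
{
    double kg = 1.0 - kr - kb;

    /* R = Y + 2*(1 - Kr)*Cr */
    m[0][0] = 1.0;  m[0][1] = 0.0;  m[0][2] = 2.0 * (1.0 - kr);
    /* G follows from Y = Kr*R + Kg*G + Kb*B */
    m[1][0] = 1.0;
    m[1][1] = -2.0 * kb * (1.0 - kb) / kg;
    m[1][2] = -2.0 * kr * (1.0 - kr) / kg;
    /* B = Y + 2*(1 - Kb)*Cb */
    m[2][0] = 1.0;  m[2][1] = 2.0 * (1.0 - kb);  m[2][2] = 0.0;
}

/* Direct matrix: dst_yuv = out * src_yuv, i.e. rgb2yuv(dst) * yuv2rgb(src). */
static void yuv2yuv_matrix(double src_kr, double src_kb,
                           double dst_kr, double dst_kb,
                           double out[3][3])
{
    double to_rgb[3][3], to_yuv[3][3];
    int i, j, k;

    yuv2rgb_matrix(src_kr, src_kb, to_rgb);
    rgb2yuv_matrix(dst_kr, dst_kb, to_yuv);

    for (i = 0; i < 3; i++) {
        for (j = 0; j < 3; j++) {
            out[i][j] = 0.0;
            for (k = 0; k < 3; k++)
                out[i][j] += to_yuv[i][k] * to_rgb[k][j];
        }
    }
}

/* Print the combined matrix, one row per line. */
static void print_matrix(double m[3][3])
{
    int i;

    for (i = 0; i < 3; i++)
        printf("% .6f % .6f % .6f\n", m[i][0], m[i][1], m[i][2]);
}
```

Calling yuv2yuv_matrix(0.299, 0.114, 0.2126, 0.0722, m) followed by print_matrix(m) would give a BT.601 -> BT.709 matrix under the assumptions above. The first column comes out as (1, 0, 0), since grey stays grey, so per pixel this is one 3x3 multiply instead of two. A real converter would also have to handle limited range, integer precision and differing primaries or transfer functions, which a single matrix on non-linear samples does not cover.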
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicine you will get
a job for life at the pharma industry.