[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
Anton Khirnov
anton at khirnov.net
Fri Dec 4 15:00:15 EET 2020
Quoting Alan Kelly (2020-11-19 09:41:56)
> ---
> All of Henrik's suggestions have been implemented. Additionally,
> m3 and m6 are permuted in avx2 before storing to ensure bit by bit
> identical results in avx2.
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c | 75 +++--------------------
> libswscale/x86/yuv2yuvX.asm | 118 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 129 insertions(+), 65 deletions(-)
> create mode 100644 libswscale/x86/yuv2yuvX.asm
Is this function tested by FATE?
I did some brief testing and apparently it gets called during
fate-filter-shuffleplanes-dup-luma, but the results do not change even
if I comment out the whole function.
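A rough sketch of the kind of check that would catch this: run the optimized
routine and a scalar reference on the same input and require bit-identical
output, so that stubbing the function out makes the test fail. Everything
below (vscale_ref, vscale_unrolled, vscale_noop, outputs_match, the 12-bit
coefficients and the >> 19 shift) is hypothetical and only loosely modelled
on a vertical scaler. It is not the swscale code or the checkasm API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WIDTH 64
#define TAPS  4

typedef void (*vscale_fn)(const int16_t *filter, int taps,
                          const int16_t *const *src, uint8_t *dst,
                          int w, const uint8_t *dither);

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar reference: weighted sum of `taps` input lines per output pixel. */
static void vscale_ref(const int16_t *filter, int taps,
                       const int16_t *const *src, uint8_t *dst,
                       int w, const uint8_t *dither)
{
    for (int i = 0; i < w; i++) {
        int val = dither[i & 7] << 12;
        for (int j = 0; j < taps; j++)
            val += src[j][i] * filter[j];
        dst[i] = clip_u8(val >> 19);
    }
}

/* Stand-in for an optimized version: same arithmetic, unrolled by two. */
static void vscale_unrolled(const int16_t *filter, int taps,
                            const int16_t *const *src, uint8_t *dst,
                            int w, const uint8_t *dither)
{
    int i = 0;
    for (; i + 1 < w; i += 2) {
        int a = dither[i & 7] << 12;
        int b = dither[(i + 1) & 7] << 12;
        for (int j = 0; j < taps; j++) {
            a += src[j][i]     * filter[j];
            b += src[j][i + 1] * filter[j];
        }
        dst[i]     = clip_u8(a >> 19);
        dst[i + 1] = clip_u8(b >> 19);
    }
    if (i < w) /* odd width: let the reference handle the last pixel */
        vscale_ref(filter, taps, src, dst, w, dither), (void)i;
}

/* Simulates "commenting out the whole function": writes nothing. */
static void vscale_noop(const int16_t *filter, int taps,
                        const int16_t *const *src, uint8_t *dst,
                        int w, const uint8_t *dither)
{
    (void)filter; (void)taps; (void)src; (void)dst; (void)w; (void)dither;
}

/* Returns 1 if `candidate` is bit-identical to the reference, else 0. */
static int outputs_match(vscale_fn candidate)
{
    static const int16_t filter[TAPS] = { 1024, 1024, 1024, 1024 };
    static const uint8_t dither[8]    = { 0, 32, 16, 48, 8, 40, 24, 56 };
    static int16_t lines[TAPS][WIDTH];
    const int16_t *rows[TAPS];
    uint8_t ref[WIDTH], out[WIDTH];

    assert(candidate != NULL);

    /* Deterministic, non-constant input so the output varies per pixel. */
    for (int j = 0; j < TAPS; j++) {
        for (int i = 0; i < WIDTH; i++)
            lines[j][i] = (int16_t)((i * 509 + j * 1021) % 32768);
        rows[j] = lines[j];
    }

    /* Same sentinel in both buffers, so an untouched buffer is detectable. */
    memset(ref, 0xAA, sizeof(ref));
    memset(out, 0xAA, sizeof(out));

    vscale_ref(filter, TAPS, rows, ref, WIDTH, dither);
    candidate(filter, TAPS, rows, out, WIDTH, dither);

    return memcmp(ref, out, WIDTH) == 0;
}
```

The useful property is the second case: outputs_match(vscale_unrolled) should
succeed, while outputs_match(vscale_noop) should fail. A test whose result
does not change when the function is disabled is not covering it.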
Also, it seems like you are adding an AVX2 version of the function, but
I don't see it being used.
--
Anton Khirnov