[FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup. AVX2 version is ready and tested, although local tests show a significant speed-up in this function using AVX2, swscale code slows down overall, probably due to CPU frequency scaling.
Michael Niedermayer
michael at niedermayer.cc
Sat Oct 24 15:20:08 EEST 2020
On Fri, Oct 23, 2020 at 03:34:18PM +0200, Alan Kelly wrote:
> Fixed. The wrong step size was used, causing a write past the end of
> the buffer. yuv2yuvX_mmxext is now called if there are any remaining
> pixels.
>
> There is currently no checkasm for these functions. Is this required for
> submission?
>
> (Apologies for the double mail, I used git send-email but it didn't
> respond to the correct thread)
> ---
> libswscale/x86/Makefile | 1 +
> libswscale/x86/swscale.c | 75 ++++----------------------
> libswscale/x86/yuv2yuvX.asm | 105 ++++++++++++++++++++++++++++++++++++
> 3 files changed, 116 insertions(+), 65 deletions(-)
> create mode 100644 libswscale/x86/yuv2yuvX.asm
error: corrupt patch at line 18
[...]
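
For readers following the fix described in the quoted message, the general
pattern (a main loop that advances by a fixed step, plus a fallback for any
remaining pixels) can be sketched in C roughly as below. This is only an
illustration under stated assumptions: STEP, process_block, process_tail and
process_row are hypothetical names, the per-pixel operation is a placeholder,
and none of this is the actual swscale or yasm code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block width: pixels handled per main-loop iteration. */
#define STEP 16

/* Stand-in for an unrolled SIMD kernel: always writes exactly STEP pixels. */
void process_block(uint8_t *dst, const uint8_t *src)
{
    for (size_t i = 0; i < STEP; i++)
        dst[i] = (uint8_t)(src[i] >> 1);   /* placeholder operation */
}

/* Stand-in for a fallback path that can handle any pixel count. */
void process_tail(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(src[i] >> 1);   /* same placeholder operation */
}

/* Process `width` pixels without writing past dst[width - 1]. */
void process_row(uint8_t *dst, const uint8_t *src, size_t width)
{
    /* Largest multiple of STEP that fits in width. */
    size_t full = width - width % STEP;

    /* The loop increment must match the kernel's block width; a mismatch
     * is one way a write can run past the end of the buffer. */
    for (size_t x = 0; x < full; x += STEP)
        process_block(dst + x, src + x);

    /* Hand any remaining pixels to the fallback. */
    if (full < width)
        process_tail(dst + full, src + full, width - full);
}
```

With width = 37 and STEP = 16, the main loop would cover pixels 0..31 and the
fallback pixels 32..36, so nothing beyond index 36 is written.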
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
The worst form of inequality is to try to make unequal things equal.
-- Aristotle