[FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms
James Darnley
jdarnley at obe.tv
Thu Jul 19 17:52:46 EEST 2018
I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file encoded
with the relevant transform. The summary is below.
Haar
    C:    119fps
    SSE2: 204fps
    AVX:  206fps
    AVX2: 221fps

5_3
    C:    94fps
    SSE2: 118fps
    AVX2: 121fps

9_7
    C:    84fps
    SSE2: 111fps
    AVX2: 115fps
Is the AVX version worth it for Haar? Is the AVX2 version worth it for the
latter two? I added those later, which is why they are separate patches. I will
squash them before pushing if I keep them.
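
To put the questions above in numbers, here is a minimal sketch that turns the
fps figures from the summary into speedups over C and into the gain of each
instruction set over the one listed before it. The dictionary layout and the
helper name relative_gains are only illustrative, and the figures are
whole-decoder frame rates rather than per-function timings, so they understate
the gain of the individual functions.

```python
# Decode speeds in fps, copied from the summary above.
fps = {
    "Haar": {"C": 119, "SSE2": 204, "AVX": 206, "AVX2": 221},
    "5_3": {"C": 94, "SSE2": 118, "AVX2": 121},
    "9_7": {"C": 84, "SSE2": 111, "AVX2": 115},
}


def relative_gains(results):
    """Map each entry to (speedup over C, percent gain over the previous entry)."""
    gains = {}
    previous = None
    for isa, value in results.items():
        speedup = value / results["C"]
        # The first entry (C) has no predecessor, so its step gain is zero.
        step = 0.0 if previous is None else (value / previous - 1.0) * 100.0
        gains[isa] = (speedup, step)
        previous = value
    return gains


for transform, results in fps.items():
    print(transform)
    for isa, (speedup, step) in relative_gains(results).items():
        print(f"  {isa:5s} {speedup:.2f}x over C, {step:+.1f}% over previous")
```

By this measure AVX adds about 1% over SSE2 for Haar, and AVX2 adds about 2.5%
(5_3) and 3.6% (9_7) over SSE2 for the other two transforms.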
James Darnley (6):
  diracdec: add 10-bit Haar SIMD functions
  diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
  diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass
    function
  diracdec: avx2 legall
  diracdec: avx2 dd97
  diracdec: increase rodata alignment for avx2
 libavcodec/dirac_dwt.c                |   7 +-
 libavcodec/dirac_dwt.h                |   1 +
 libavcodec/x86/Makefile               |   6 +-
 libavcodec/x86/dirac_dwt_10bit.asm    | 209 +++++++++++++++++++++++++
 libavcodec/x86/dirac_dwt_init_10bit.c | 210 ++++++++++++++++++++++++++
 5 files changed, 430 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
 create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c
--
2.17.1