[FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms
Rostislav Pehlivanov
atomnuker at gmail.com
Thu Jul 19 18:23:26 EEST 2018
On 19 July 2018 at 15:52, James Darnley <jdarnley at obe.tv> wrote:
> I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file
> encoded with the relevant transform. The summary is below.
>
> Haar
> C: 119fps
> SSE2: 204fps
> AVX: 206fps
> AVX2: 221fps
>
> 5_3
> C: 94fps
> SSE2: 118fps
> AVX2: 121fps
>
> 9_7
> C: 84fps
> SSE2: 111fps
> AVX2: 115fps
>
> Is the AVX worth it in Haar? Is the AVX2 worth it in the latter two? I
> added those later which is why they are separate patches. I will squash
> them before pushing if I keep them.
>
> James Darnley (6):
> diracdec: add 10-bit Haar SIMD functions
> diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
> diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass function
> diracdec: avx2 legall
> diracdec: avx2 dd97
> diracdec: increase rodata alignment for avx2
>
> libavcodec/dirac_dwt.c | 7 +-
> libavcodec/dirac_dwt.h | 1 +
> libavcodec/x86/Makefile | 6 +-
> libavcodec/x86/dirac_dwt_10bit.asm | 209 +++++++++++++++++++++++++
> libavcodec/x86/dirac_dwt_init_10bit.c | 210 ++++++++++++++++++++++++++
> 5 files changed, 430 insertions(+), 3 deletions(-)
> create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
> create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c
>
> --
> 2.17.1
>
Could you provide timings for the transform as a whole, measured with
START_TIMER/STOP_TIMER, rather than overall decoding speed?
Coefficient sizes, and therefore Golomb unpacking speed, change with the
transform, so decoding could be bottlenecked before the inverse transform
even runs.
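
To illustrate the kind of isolated measurement meant here, below is a rough
standalone sketch in C. It is not the diracdec code and does not use the
libavutil timer macros: haar_inverse_row() is a simplified, hypothetical
stand-in for one inverse Haar lifting pass, time_transform() is a made-up
helper, and clock() stands in for the cycle counter. The point is only that
the timed region wraps the transform call and nothing else, so entropy
decoding cost cannot leak into the number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one inverse Haar lifting pass over interleaved
 * low/high coefficient pairs. Simplified for illustration; this is not the
 * FFmpeg implementation. */
static void haar_inverse_row(const int32_t *src, int32_t *dst, size_t width)
{
    assert(width % 2 == 0);
    for (size_t i = 0; i < width; i += 2) {
        int32_t even = src[i] - ((src[i + 1] + 1) >> 1); /* undo the update step */
        int32_t odd  = src[i + 1] + even;                /* undo the predict step */
        dst[i]     = even;
        dst[i + 1] = odd;
    }
}

/* Hypothetical helper: average CPU seconds per call over `runs` calls.
 * Only the transform sits inside the timed region. */
static double time_transform(const int32_t *src, int32_t *dst,
                             size_t width, int runs)
{
    assert(runs > 0);
    clock_t start = clock();
    for (int r = 0; r < runs; r++)
        haar_inverse_row(src, dst, width);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / runs;
}
```

Calling time_transform() once with the C function and once with each SIMD
version, on the same input buffer, would give per-transform numbers that are
independent of how long the coefficients took to unpack. An optimising
compiler may need the output buffer to be consumed afterwards so the loop is
not discarded.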