[FFmpeg-devel] [PATCH v3 0/3] swscale: add AVX2 version of yuv2nv12cX
Nelson Gomez
negomez at linux.microsoft.com
Sun Apr 26 05:37:00 EEST 2020
From: Nelson Gomez <nelson.gomez at microsoft.com>
v3:
- Fixed x86_32 compilation
v2: [2]
- Addressed comments James left on iteration 1
- Cleaned up how the dither is read to avoid using stack space
v1: [1]
[1] http://ffmpeg.org/pipermail/ffmpeg-devel/2020-April/261313.html
[2] http://ffmpeg.org/pipermail/ffmpeg-devel/2020-April/261346.html
This patchset speeds up yuv2nv12cX_c on x86 (Intel/AMD) CPUs by adding an
AVX2 implementation of it. To support this, the yuv2interleavedX_fn typedef
now takes two additional parameters, chrDither8 and dstFormat, instead of a
pointer to the entire SwsContext.
Output is bit-identical to the C implementation.
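For reference, the scalar loop that the AVX2 code has to match bit-for-bit looks roughly like the sketch below. It is a simplified standalone version modeled on yuv2nv12cX_c in libswscale/output.c, not code from this patchset; nv12_interleave_sketch, clip_uint8 and the swap_uv flag (standing in for the dstFormat check that separates NV21/NV42 from NV12/NV24) are hypothetical names. It also illustrates why the kernel only needs the dither table and the destination format rather than the whole SwsContext.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to [0, 255], like FFmpeg's av_clip_uint8(). */
static uint8_t clip_uint8(int x)
{
    if (x < 0)   return 0;
    if (x > 255) return 255;
    return (uint8_t)x;
}

/*
 * Hypothetical sketch of the vertical chroma filter + NV12 interleave.
 * swap_uv != 0 stands in for "dstFormat stores V before U" (NV21/NV42).
 * chr_dither is an 8-entry ordered-dither table (cf. chrDither8).
 */
static void nv12_interleave_sketch(int swap_uv, const uint8_t *chr_dither,
                                   const int16_t *chr_filter, int chr_filter_size,
                                   const int16_t **chr_u_src,
                                   const int16_t **chr_v_src,
                                   uint8_t *dest, int chr_dst_w)
{
    assert(chr_filter_size > 0);
    for (int i = 0; i < chr_dst_w; i++) {
        /* Seed the accumulators with dither; V reads the table offset by 3. */
        int u = chr_dither[i & 7] << 12;
        int v = chr_dither[(i + 3) & 7] << 12;

        /* Multiply-accumulate over the vertical filter taps. */
        for (int j = 0; j < chr_filter_size; j++) {
            u += chr_u_src[j][i] * chr_filter[j];
            v += chr_v_src[j][i] * chr_filter[j];
        }

        /* Shift back to 8 bits, clamp, and store as interleaved UV (or VU). */
        dest[2 * i]     = clip_uint8((swap_uv ? v : u) >> 19);
        dest[2 * i + 1] = clip_uint8((swap_uv ? u : v) >> 19);
    }
}
```

With a single tap of 4096 (1 << 12), each output byte is the input sample shifted right by 7, so an input of 12800 yields 100; a non-zero dither entry can nudge a value that sits just below a rounding boundary up by one.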
The patchset was validated on an Intel Xeon W-2133, an Intel Core i7-8650U,
and an AMD Ryzen 1700. It passes FATE; the new code is exercised by the
fate-filter-pixdesc-nv{12,21,24,42} tests.
Benchmarks were measured on the W-2133 with the following flags:
-benchmark -i /dev/shm/benchmark.mp4 -pix_fmt nv42 -f null -
The benchmark input is a yuv420p file:
http://linux.microsoft.com/~negomez/ffmpeg/yuv420p-benchmark.mp4
Results:
* Single-threaded conversion: +95% fps
-cpuflags -avx2 -threads 1:
frame= 9959 fps=114 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=3.79x
bench: utime=87.648s stime=0.060s rtime=87.709s
bench: maxrss=35020kB
-cpuflags all -threads 1:
frame= 9959 fps=222 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=7.39x
bench: utime=44.900s stime=0.040s rtime=44.941s
bench: maxrss=33048kB
* Multi-threaded conversion: +197% fps
-cpuflags -avx2:
frame= 9959 fps=159 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=5.3x
bench: utime=90.381s stime=0.430s rtime=62.663s
bench: maxrss=77420kB
-cpuflags all:
frame= 9959 fps=473 q=-0.0 Lsize=N/A time=00:05:32.29 bitrate=N/A speed=15.8x
bench: utime=48.625s stime=0.459s rtime=21.058s
bench: maxrss=78500kB
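The headline percentages follow from the fps figures as (fast / base - 1) x 100. The small helper below is hypothetical and only checks that arithmetic; it is not part of the patch or of FFmpeg.

```c
#include <assert.h>

/* Relative throughput gain, in percent, of fast_fps over base_fps. */
static double percent_gain(double base_fps, double fast_fps)
{
    assert(base_fps > 0.0);
    return (fast_fps / base_fps - 1.0) * 100.0;
}

/* Round a non-negative percentage to the nearest whole number. */
static int round_percent(double pct)
{
    return (int)(pct + 0.5);
}
```

For example, round_percent(percent_gain(114, 222)) gives 95 and round_percent(percent_gain(159, 473)) gives 197, matching the single- and multi-threaded figures. The wall-clock rtime values give a similar picture (roughly 95% and 198%).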
Nelson Gomez (3):
swscale: make yuv2interleavedX more asm-friendly
swscale/x86/output: add AVX2 version of yuv2nv12cX
swscale: cosmetic fixes
libswscale/output.c | 19 ++---
libswscale/swscale_internal.h | 6 +-
libswscale/vscale.c | 2 +-
libswscale/x86/output.asm | 126 +++++++++++++++++++++++++++++++++-
libswscale/x86/swscale.c | 28 ++++++++
5 files changed, 168 insertions(+), 13 deletions(-)
--
2.25.1