[FFmpeg-devel] [PATCH v3] Improved the performance of 1 decode + N filter graphs and adaptive bitrate.

Mon Jan 21 10:19:38 EET 2019

> -----Original Message-----
> From: ffmpeg-devel [mailto:ffmpeg-devel-bounces at ffmpeg.org] On Behalf Of
> Michael Niedermayer
> Sent: Thursday, January 17, 2019 8:30 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Cc: Nicolas George <george at nsup.org>
> Subject: Re: [FFmpeg-devel] [PATCH v3] Improved the performance of 1
> decode + N filter graphs and adaptive bitrate.
> 
> On Wed, Jan 16, 2019 at 04:17:07PM -0500, Shaofei Wang wrote:
> > Add a new option, "-abr_pipeline".
> > It enables multiple filter graphs to run concurrently, which brings an
> > improvement of about 4%~20% in some 1:N scenarios with CPU or GPU acceleration.
> >
> > Below are some test cases and comparison as reference.
> > (Hardware platform: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz)
> > (Software: Intel iHD driver - 16.9.00100, CentOS 7)
> >
> > For 1:N transcode by GPU acceleration with vaapi:
> > ./ffmpeg -vaapi_device /dev/dri/renderD128 -hwaccel vaapi \
> >     -hwaccel_output_format vaapi \
> >     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale_vaapi=1280:720" -c:v h264_vaapi -f null /dev/null \
> >     -vf "scale_vaapi=720:480" -c:v h264_vaapi -f null /dev/null \
> >     -abr_pipeline
> >
> >     test results:
> >                 2 encoders  5 encoders  10 encoders
> >     Improved    6.1%        6.9%        5.5%
> >
> > For 1:N transcode by GPU acceleration with QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> >     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale_qsv=1280:720:format=nv12" -c:v h264_qsv -f null /dev/null \
> >     -vf "scale_qsv=720:480:format=nv12" -c:v h264_qsv -f null /dev/null
> >
> >     test results:
> >                 2 encoders  5 encoders  10 encoders
> >     Improved    6%          4%          15%
> >
> > For Intel GPU acceleration case, 1 decode to N scaling, by QSV:
> > ./ffmpeg -hwaccel qsv -c:v h264_qsv \
> >     -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale_qsv=1280:720:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null \
> >     -vf "scale_qsv=720:480:format=nv12,hwdownload" -pix_fmt nv12 -f null /dev/null
> >
> >     test results:
> >                 2 scale     5 scale     10 scale
> >     Improved    12%         21%         21%
> >
> > For CPU only 1 decode to N scaling:
> > ./ffmpeg -i ~/Videos/1920x1080p_30.00_x264_qp28.h264 \
> >     -vf "scale=1280:720" -pix_fmt nv12 -f null /dev/null \
> >     -vf "scale=720:480" -pix_fmt nv12 -f null /dev/null \
> >     -abr_pipeline
> >
> >     test results:
> >                 2 scale     5 scale     10 scale
> >     Improved    25%         107%        148%
> >
> > Signed-off-by: Wang, Shaofei <shaofei.wang at intel.com>
> > Reviewed-by: Zhao, Jun <jun.zhao at intel.com>
> > ---
> >  fftools/ffmpeg.c        | 228 ++++++++++++++++++++++++++++++++++++++++++++----
> >  fftools/ffmpeg.h        |  15 ++++
> >  fftools/ffmpeg_filter.c |   4 +
> >  fftools/ffmpeg_opt.c    |   6 +-
> >  4 files changed, 237 insertions(+), 16 deletions(-)
> 
> Looking at this I see a lot of duplicated code and a lot of ifdefs.
Since I didn't want to change the function interface of reap_filters(), I added a separate
non-looping reap function.
In the next patch I will base it on reap_filters() to avoid the duplicated lines.
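
The refactoring described above could be sketched roughly as follows: the per-stream work
moves into one helper, and the looping function becomes a thin wrapper that keeps its
original signature. This is only an illustration under stated assumptions. The names Sink,
reap_one() and reap_all() are hypothetical and are not taken from the patch or from
fftools/ffmpeg.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an output stream; not FFmpeg's OutputStream. */
typedef struct Sink {
    int pending;   /* frames waiting to be drained */
    int drained;   /* frames drained so far */
} Sink;

/* Non-looping variant: drain one sink and return the number of frames taken. */
static int reap_one(Sink *s)
{
    int n;

    assert(s != NULL);
    n = s->pending;
    s->drained += n;
    s->pending = 0;
    return n;
}

/* Looping variant: same interface as before, now built on reap_one(),
 * so the draining logic exists in exactly one place. */
static int reap_all(Sink *sinks, int nb_sinks)
{
    int total = 0;

    for (int i = 0; i < nb_sinks; i++)
        total += reap_one(&sinks[i]);
    return total;
}
```

A per-graph thread would call reap_one() on its own sink, while the existing serial path
keeps calling reap_all() unchanged.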

> Preferably one codepath when possible, and best results by default, with no need to
> manually enable the fast path.
If users don't need an option to enable or disable the fast path, I'll remove it. Before
doing so, here are the reasons I kept it:
1. It lets users decide whether to use the fast path depending on their use case;
otherwise we would need to implement 'strategies' that decide on the user's behalf when to
enable or disable it.
2. It makes it easy to compare results with and without the fast path to see which is best.
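
One way to reconcile a user-visible switch with a single code path is to keep the per-graph
work in one function and let a runtime flag choose only how it is dispatched. The sketch
below is a hedged illustration using POSIX threads, not the patch's implementation. Graph,
run_graph(), send_frame_to_graphs() and the 'parallel' flag are hypothetical names, and
creating a thread per frame is done only for brevity.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_GRAPHS 16

/* Hypothetical stand-in for one filter graph; not FFmpeg's FilterGraph. */
typedef struct Graph {
    int frame;    /* id of the decoded frame handed to this graph */
    int scale;    /* placeholder for the graph's own work */
    int result;   /* output written by the graph */
} Graph;

/* The per-graph work is identical in serial and parallel mode. */
static void *run_graph(void *arg)
{
    Graph *g = arg;

    g->result = g->frame * g->scale;
    return NULL;
}

/* Feed one decoded frame to every graph. 'parallel' plays the role of the
 * command-line option. Returns 0 on success, -1 if a thread could not start. */
static int send_frame_to_graphs(Graph *graphs, int nb_graphs, int frame, int parallel)
{
    pthread_t tid[MAX_GRAPHS];
    int started = 0, ret = 0;

    assert(nb_graphs >= 0 && nb_graphs <= MAX_GRAPHS);
    for (int i = 0; i < nb_graphs; i++)
        graphs[i].frame = frame;

    if (!parallel) {
        /* Serial dispatch: graphs run one after another. */
        for (int i = 0; i < nb_graphs; i++)
            run_graph(&graphs[i]);
        return 0;
    }

    /* Parallel dispatch: one thread per graph, all joined before returning. */
    for (int i = 0; i < nb_graphs; i++) {
        if (pthread_create(&tid[i], NULL, run_graph, &graphs[i]) != 0) {
            ret = -1;
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return ret;
}
```

Because both modes produce the same results, the two can be timed against each other on
the same input, which is the comparison point 2 refers to. On some toolchains the code
must be built with -pthread.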

Thanks