[FFmpeg-devel] [PATCH] avcodec/nvenc: Add support for H.265 encoding
Ali KIZIL
alikizil at gmail.com
Thu Mar 26 22:41:46 CET 2015
Philip Langdale <philipl <at> overt.org> writes:
>
> On 2015-03-26 04:30, Ali KIZIL wrote:
> >
> > It works fine now, Phil. One more comment:
> >
> > I have a GTX 980. It can encode up to 30-33 fps for a 4K 60fps raw YUV
> > input file using the nvenc_h265 encoder in FFmpeg. At first, it looked
> > to me like a performance limit of the card. However, I then split the
> > video into two halves with the crop filter:
> >
> > /opt/ffmpeghw/bin/ffmpeg -video_size 3840x2160 -framerate 50 -i /Projects/YUV/soccer.yuv -vcodec nvenc_h265 -an -filter:v "crop=in_w:in_h/2:0:0" -r 50 -g 50 -preset hp -f hevc top.hevc
> >
> > /opt/ffmpeghw/bin/ffmpeg -video_size 3840x2160 -framerate 50 -i /Projects/YUV/soccer.yuv -vcodec nvenc_h265 -an -filter:v "crop=in_w:in_h/2:0:in_h/2" -r 50 -g 50 -preset hp -f hevc bottom.hevc
> >
> > When I run them at the same time, both encode at 50 fps. I tried to
> > join the output files with padding, but FFmpeg needs to re-encode,
> > which makes no sense.
> >
> > Do you have any comment or idea on how to use the full performance of
> > the card from a single ffmpeg nvenc_h265 instance?
> >
> > Additional note: GTX cards can support at most 2 simultaneous HEVC
> > encodes (as a limitation).
>
> I honestly don't know. The hardware performance may not scale linearly
> with frame size, so you might see a disproportionate slowdown past a
> certain size, perhaps reflecting the need to use multiple buffers, etc.
>
> Do you see any evidence that you're CPU bound? That might happen if our
> buffer management is too inefficient, but I'd be surprised.
>
> --phil
>
CPU is fine. The server has 2 x Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz,
with MemTotal: 49413456 kB and MemFree: 32030320 kB, so memory is not an
issue either. Here is the top output during the run:
top - 23:39:18 up 1 day, 21 min, 2 users, load average: 0.08, 0.03, 0.05
Tasks: 371 total, 3 running, 368 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu8 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu9 : 0.0 us, 0.3 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu10 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu11 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu12 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu13 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 29.1 us, 20.3 sy, 0.0 ni, 50.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu25 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu27 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu28 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu29 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu30 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu31 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 49413456 total, 19607392 used, 29806064 free, 106188 buffers
KiB Swap: 50282492 total, 0 used, 50282492 free. 16826488 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9563 root 20 0 70.432g 2.003g 1.948g R 49.2 4.3 0:08.02 ffmpeg
735 root 20 0 0 0 0 S 0.3 0.0 5:49.83 blackmagic
9600 root 20 0 22240 1844 1112 R 0.3 0.0 0:00.02 top
1 root 20 0 33696 2960 1472 S 0.0 0.0 0:08.37 init
The FFmpeg output is:
ffmpeg version N-71096-g2139e58 Copyright (c) 2000-2015 the FFmpeg developers
built with gcc 4.8 (Ubuntu 4.8.2-19ubuntu1)
configuration: --prefix=/opt/ffmpeghw --extra-cflags=-I/opt/ffmpeghw/include --extra-ldflags=-L/opt/ffmpeghw/lib --bindir=/opt/ffmpeghw/bin --extra-libs=-ldl --enable-libx264 --enable-libx265 --enable-libvpx --enable-libfdk-aac --enable-nonfree --enable-gpl --enable-nvenc
libavutil 54. 20.101 / 54. 20.101
libavcodec 56. 30.100 / 56. 30.100
libavformat 56. 26.101 / 56. 26.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 13.101 / 5. 13.101
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 1.100 / 1. 1.100
libpostproc 53. 3.100 / 53. 3.100
[rawvideo @ 0x38051a0] Estimating duration from bitrate, this may be inaccurate
Input #0, rawvideo, from '/Projects/YUV/soccer.yuv':
Duration: 00:01:53.74, start: 0.000000, bitrate: 681695 kb/s
Stream #0:0: Video: rawvideo, 1 reference frame (I420 / 0x30323449), yuv420p, 3840x2160, 681672 kb/s, 50 tbr, 50 tbn, 50 tbc
[graph 0 input from stream 0:0 @ 0x38050e0] w:3840 h:2160 pixfmt:yuv420p tb:1/50 fr:50/1 sar:0/1 sws_param:flags=2
[auto-inserted scaler 0 @ 0x37f1160] w:iw h:ih flags:'0x4' interl:0
[format @ 0x37fa9c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'Parsed_null_0' and the filter 'format'
[auto-inserted scaler 0 @ 0x37f1160] w:3840 h:2160 fmt:yuv420p sar:0/1 -> w:3840 h:2160 fmt:nv12 sar:0/1 flags:0x4
[nvenc_h265 @ 0x3807900] 1 CUDA capable devices found
[nvenc_h265 @ 0x3807900] [ GPU #0 - < GeForce GTX 980 > has Compute SM 5.2, NVENC Available ]
[nvenc_h265 @ 0x3807900] Nvenc initialized successfully
SOME ADDITIONAL CODE FOR DEBUGGING
ctx->init_encode_params.version = -804976880
ctx->init_encode_params.encodeWidth = 3840
ctx->init_encode_params.encodeHeight = 2160
ctx->init_encode_params.darWidth = 3840
ctx->init_encode_params.darHeight = 2160
ctx->init_encode_params.frameRateNum = 50
ctx->init_encode_params.frameRateDen = 1
ctx->init_encode_params.enableEncodeAsync = 0
ctx->init_encode_params.enablePTD = 1
ctx->init_encode_params.reportSliceOffsets = 0
ctx->init_encode_params.enableSubFrameWrite = 0
ctx->init_encode_params.enableExternalMEHints = 0
ctx->init_encode_params.privDataSize = 0
ctx->init_encode_params.enableExternalMEHints = 0
ctx->init_encode_params.maxEncodeWidth = 3840
ctx->init_encode_params.maxEncodeHeight = 2160
ctx->init_encode_params.gopLength = 12
ctx->init_encode_params.frameIntervalP = 1
ctx->init_encode_params.monoChromeEncoding = 0
ctx->init_encode_params.frameFieldMode = 1
ctx->init_encode_params.mvPrecision = 3
encodeConfig.level = 0
encodeConfig.tier = 0
encodeConfig.minCUSize = 2
encodeConfig.maxCUSize = 3
encodeConfig.useConstrainedIntraPred = 0
encodeConfig.disableDeblockAcrossSliceBoundary = 0
encodeConfig.outputBufferingPeriodSEI = 0
encodeConfig.outputPictureTimingSEI = 0
encodeConfig.outputAUD = 0
encodeConfig.enableLTR = 0
encodeConfig.disableSPSPPS = 0
encodeConfig.repeatSPSPPS = 1
encodeConfig.enableIntraRefresh = 0
encodeConfig.idrPeriod = 12
encodeConfig.intraRefreshPeriod = 0
encodeConfig.intraRefreshCnt = 0
encodeConfig.maxNumRefFramesInDPB = 1
encodeConfig.ltrNumFrames = 0
encodeConfig.vpsId = 0
encodeConfig.spsId = 0
encodeConfig.ppsId = 0
encodeConfig.sliceMode = 0
encodeConfig.sliceModeData = 0
encodeConfig.maxTemporalLayersMinus1 = 0
rc_param.constQP = 28
rc_param.averageBitRate = 0
rc_param.maxBitRate = 0
rc_param.vbvBufferSize = 0
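The first line of the dump shows version = -804976880. As far as I know, the NVENC parameter structs declare version as a uint32_t with several bit fields packed into it, so printing it with %d turns a value whose top bit is set into a negative number. A minimal sketch of recovering the raw bits and printing such fields in hex instead; version_bits and format_hex_field are hypothetical helpers for illustration, not FFmpeg or NVENC code.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: a value that was printed with %d can be
 * reinterpreted as uint32_t to recover the original 32 bits. */
uint32_t version_bits(int32_t printed)
{
    return (uint32_t)printed;
}

/* Hypothetical debug helper: write "name = 0x%08x" into buf.
 * Returns the snprintf result, i.e. the number of characters that
 * would have been written (excluding the terminating NUL). */
int format_hex_field(char *buf, size_t len, const char *name, uint32_t value)
{
    return snprintf(buf, len, "%s = 0x%08" PRIx32, name, value);
}
```

Here version_bits(-804976880) gives 0xd0050710, and format_hex_field(buf, sizeof buf, "version", 0xd0050710) writes "version = 0xd0050710", which is easier to compare against the version macros in the SDK header.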
[mpegts @ 0x3806820] muxrate VBR, pcr every 5 pkts, sdt every 200, pat/pmt every 40 pkts
Output #0, mpegts, to 'out.ts':
Metadata:
encoder : Lavf56.26.101
Stream #0:0: Video: hevc (nvenc_h265), 1 reference frame, nv12, 3840x2160, q=-1--1, 50 fps, 90k tbn, 50 tbc
Metadata:
encoder : Lavc56.30.100 nvenc_h265
Stream mapping:
Stream #0:0 -> #0:0 (rawvideo (native) -> hevc (nvenc_h265))
Press [q] to stop, [?] for help
frame= 765 fps= 29 q=0.0 Lsize= 137360kB time=00:00:15.30 bitrate=73545.9kbits/s
video:127337kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 7.871264%
Input file #0 (/Projects/YUV/soccer.yuv):
Input stream #0:0 (video): 765 packets read (9517824000 bytes); 765 frames decoded;
Total: 765 packets (9517824000 bytes) demuxed
Output file #0 (out.ts):
Output stream #0:0 (video): 765 frames encoded; 765 packets muxed (130392951 bytes);
Total: 765 packets (130392951 bytes) muxed
[nvenc_h265 @ 0x3807900] Nvenc unloaded
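The byte counts in this log are consistent with 8-bit 4:2:0 input: one 3840x2160 yuv420p frame is 3840 x 2160 x 1.5 = 12,441,600 bytes, and 765 frames give the 9,517,824,000 bytes reported. A small sketch of the same arithmetic, extended to the raw input rate the demuxer has to sustain at a given frame rate (the function names are illustrative only):

```c
#include <stdint.h>

/* Bytes in one 8-bit 4:2:0 planar frame (yuv420p/I420 or NV12):
 * a full-resolution luma plane plus chroma at a quarter of the area,
 * i.e. width * height * 3 / 2. */
uint64_t yuv420_frame_bytes(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 3 / 2;
}

/* Raw input bytes that must be read per second at the given frame rate. */
uint64_t input_bytes_per_second(uint32_t width, uint32_t height, uint32_t fps)
{
    return yuv420_frame_bytes(width, height) * fps;
}
```

At 29 fps the single full-frame encode reads about 361 MB/s of raw video. Each of the two crop instances still reads full 3840x2160 frames from soccer.yuv before cropping, so at 50 fps each they read about 1.24 GB/s together. That the split run is faster while reading more input suggests input I/O is not what limits the single encode, although this arithmetic alone does not rule out other causes.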
I think you are right: performance does not scale linearly with frame rate
and frame size. In a few days I will be able to test with a higher-end GM2xx
card, and I will let you know.