[FFmpeg-devel] have some major changes for nvenc support
Agatha Hu
ahu at nvidia.com
Thu Nov 5 09:23:04 CET 2015
Hi,
Recently NVIDIA did some work on improving NVENC performance. It
includes a lot of changes, so I have attached the patch instead of
sending it inline. Here are the explanations:
1) The first main change is adding an nvresize filter (1:N, one input,
multiple outputs) to do hardware resizing, because during our internal
1:N encoding tests we found that swscale became the bottleneck. So we
use a CUDA kernel instead.
2) We use the AVFrame::opaque field to store a custom ffnvinfo structure
to avoid expensive CPU<->GPU transfers. Without it, the workflow would
be: CPU AVFrame input --> copy to GPU --> CUDA resizing --> copy back to
CPU AVFrame --> copy to GPU again --> NVENC encoding. Now it becomes:
CPU AVFrame input --> copy to GPU --> CUDA resizing --> NVENC encoding.
Our strategy is to check whether AVFrame::opaque is non-NULL AND its
first 128 bits (16 bytes) match a particular GUID. If so,
AVFrame::opaque is a valid ffnvinfo structure, and we read the GPU
address directly from it instead of copying data from the AVFrame.
The nvresize filter has a readback option. If it is set to 0, the
resized result is not copied back to the CPU; this is mainly for when
the filter is connected directly to an NVENC encoder. If it is set to 1,
the resized result is still copied back to the AVFrame so that the
output stays compatible with other components.
3) Because we now use CUDA addresses, the input buffer becomes CUDA
external memory. We replaced NvEncCreateInputBuffer with
cuMemAllocPitch + NvEncRegisterInputBuffer, and
NvEncLock/UnlockInputBuffer with NvEncMap/UnmapInputBuffer.
4) Using CUDA input also exposed some driver bugs; for example, NVENC
generates corrupted chroma plane data if the buffer format is YUV420P.
A driver with the fix will be released soon, but for backwards
compatibility we decided to convert YUV420P to NV12 explicitly with a
CUDA kernel in nvenc.c. Even with the fixed driver, a YUV420P->NV12
conversion kernel still runs; the only difference is that it is
provided with the driver, whereas here we do it within nvenc.c.
For the same reason, YUV444P support is temporarily removed, since there
is a bug with CUDA input in that format. Once the fix is released, we
should enable it again. We chose to provide the backwards-compatible
path for YUV420P because it is much more popular than YUV444P.
5) Finally, we moved most of the CUDA typedefs, functions and helpers
into cudautils.h/c.
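To illustrate what the YUV420P->NV12 conversion in point 4 does, here
is a minimal CPU sketch in C. It is not the CUDA kernel from the patch;
the function name yuv420p_to_nv12 and its parameters are hypothetical.
NV12 keeps the full-resolution luma plane unchanged and stores the two
quarter-size chroma planes as one plane of interleaved U/V byte pairs,
U first. A GPU version could do the same work with one thread per
output element.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU equivalent of a YUV420P -> NV12 conversion.
 * Pitches are in bytes and may be larger than the visible width. */
static void yuv420p_to_nv12(const uint8_t *src_y, int src_y_pitch,
                            const uint8_t *src_u, int src_u_pitch,
                            const uint8_t *src_v, int src_v_pitch,
                            uint8_t *dst_y, int dst_y_pitch,
                            uint8_t *dst_uv, int dst_uv_pitch,
                            int width, int height)
{
    /* Luma: copied row by row because source and destination
     * pitches can differ (e.g. a cuMemAllocPitch allocation). */
    for (int y = 0; y < height; y++)
        memcpy(dst_y + (size_t)y * dst_y_pitch,
               src_y + (size_t)y * src_y_pitch, (size_t)width);

    /* Chroma: 4:2:0 planes are half width and half height,
     * rounded up for odd dimensions. */
    int cw = (width + 1) / 2;
    int ch = (height + 1) / 2;
    for (int y = 0; y < ch; y++) {
        const uint8_t *u = src_u + (size_t)y * src_u_pitch;
        const uint8_t *v = src_v + (size_t)y * src_v_pitch;
        uint8_t *uv = dst_uv + (size_t)y * dst_uv_pitch;
        for (int x = 0; x < cw; x++) {
            uv[2 * x]     = u[x]; /* Cb sample first */
            uv[2 * x + 1] = v[x]; /* then Cr sample */
        }
    }
}
```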
A typical use case is:
ffmpeg -y -i $1 $2 $3 -filter_complex \
nvresize=5:s=hd1080\|hd720\|hd480\|wvga\|cif:readback=0[out0][out1][out2][out3][out4] \
-map [out0] -an -vcodec nvenc_h264 -preset slow -profile:v main \
-async 1 -b:v 200M -bufsize 200M -maxrate 200M -refs 1 -bf 2 $1_1080p.mp4 \
-map [out1] -an -vcodec nvenc_h264 -preset slow -profile:v main \
-async 1 -b:v 100M -bufsize 100M -maxrate 100M -refs 1 -bf 2 $1_720p.mp4 \
-map [out2] -an -vcodec nvenc_h264 -preset slow -profile:v main \
-async 1 -b:v 50M -bufsize 50M -maxrate 50M -refs 1 -bf 2 $1_480p.mp4 \
-map [out3] -an -vcodec nvenc_h264 -preset slow -profile:v main \
-async 1 -b:v 25M -bufsize 25M -maxrate 25M -refs 1 -bf 2 $1_wvga.mp4 \
-map [out4] -an -vcodec nvenc_h264 -preset slow -profile:v main \
-async 1 -b:v 10M -bufsize 10M -maxrate 10M -refs 1 -bf 2 $1_cif.mp4
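The readback=0 setting above relies on the encoder recognizing frames
whose AVFrame::opaque carries GPU data, as described in point 2. The
following C sketch shows that check under stated assumptions: the struct
layout, the names hypothetical_ffnvinfo and get_nvinfo, and the GUID
bytes are placeholders, not the definitions from the patch. A GUID
occupies 16 bytes, so the comparison covers the first 16 bytes of the
buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only; the real ffnvinfo
 * structure is defined in the patch. Only the leading GUID matters
 * for the type check. */
typedef struct hypothetical_ffnvinfo {
    uint8_t  guid[16];  /* marker identifying the structure */
    uint64_t dptr[3];   /* stand-in for CUdeviceptr plane addresses */
    int      pitch[3];  /* bytes per row of each GPU plane */
} hypothetical_ffnvinfo;

/* Placeholder marker bytes (ASCII "nvinfo-guid-demo"); the patch
 * defines its own GUID value. */
static const uint8_t HYPOTHETICAL_NVINFO_GUID[16] = {
    'n', 'v', 'i', 'n', 'f', 'o', '-', 'g',
    'u', 'i', 'd', '-', 'd', 'e', 'm', 'o'
};

/* Returns the GPU-side frame description if `opaque` carries one,
 * or NULL, in which case the caller uploads the AVFrame data from
 * system memory as usual. */
static const hypothetical_ffnvinfo *get_nvinfo(const void *opaque)
{
    if (opaque == NULL)
        return NULL;
    /* Assumes any non-NULL opaque points to at least 16 readable bytes. */
    if (memcmp(opaque, HYPOTHETICAL_NVINFO_GUID,
               sizeof(HYPOTHETICAL_NVINFO_GUID)) != 0)
        return NULL;
    return (const hypothetical_ffnvinfo *)opaque;
}
```

Note that this check reads 16 bytes through any non-NULL opaque
pointer, so it is only safe if every component that sets
AVFrame::opaque points it at a buffer of at least that size.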
Thanks
Agatha Hu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-combined-cuda-resize-yuv420-fix-remove-yuv444-add-AQ_v6.0.patch
Type: text/x-patch
Size: 108910 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151105/2f9dde51/attachment.bin>