[FFmpeg-devel] [PATCH] avfilter: add vf_overlay_cuda

Alex 3.14pi at ukr.net
Wed Apr 1 16:43:30 EEST 2020


Hi! Is it working? I have tried everything but constantly get an error from overlay_cuda:


ffmpeg -y -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel cuvid -c:v h264_cuvid -resize 1920x1080 -i 720p.mp4 -i watermark.png -filter_complex "[1:v]format=nv12,hwupload[img];[0:v][img]overlay_cuda=x=50:y=800[out]" -map [out] -c:v h264_nvenc -b:v 6M -an -preset fast  -y out_nvenc_overlay.mp4
...
ffmpeg version git-2020-04-01-afa5e38
...
[h264_cuvid @ 000001dd1b356d00] CUVID capabilities for h264_cuvid:
[h264_cuvid @ 000001dd1b356d00] 8 bit: supported: 1, min_width: 48, max_width: 4096, min_height: 16, max_height: 4096
[h264_cuvid @ 000001dd1b356d00] 10 bit: supported: 0, min_width: 0, max_width: 0, min_height: 0, max_height: 0
[h264_cuvid @ 000001dd1b356d00] 12 bit: supported: 0, min_width: 0, max_width: 0, min_height: 0, max_height: 0
Stream mapping:
  Stream #0:0 (h264_cuvid) -> overlay_cuda:main
  Stream #1:0 (png) -> format
  overlay_cuda -> Stream #0:0 (h264_nvenc)
Press [q] to stop, [?] for help
[h264_cuvid @ 000001dd1b356d00] Formats: Original: cuda | HW: cuda | SW: nv12
[graph 0 input from stream 1:0 @ 000001dd2e84a100] w:1894 h:302 pixfmt:rgba tb:1/25 fr:25/1 sar:11811/11811
[graph 0 input from stream 0:0 @ 000001dd2e84ae00] w:1920 h:1080 pixfmt:cuda tb:1/24000 fr:24000/1001 sar:1/1
[auto_scaler_0 @ 000001dd2ebf4cc0] w:iw h:ih flags:'bilinear' interl:0
[Parsed_format_0 @ 000001dd2e849780] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 1:0' and the filter 'Parsed_format_0'
[auto_scaler_0 @ 000001dd2ebf4cc0] w:1894 h:302 fmt:rgba sar:11811/11811 -> w:1894 h:302 fmt:nv12 sar:1/1 flags:0x2
[overlay_cuda @ 000001dd2ebc87c0] cu->cuModuleLoadData(&ctx->cu_module, vf_overlay_cuda_ptx) failed -> CUDA_ERROR_INVALID_IMAGE: device kernel image is invalid
[Parsed_overlay_cuda_2 @ 000001dd2e84b6c0] Failed to configure output pad on Parsed_overlay_cuda_2
Error reinitializing filters!
Failed to inject frame into filter network: Generic error in an external library
Error while processing the decoded data for stream #0:0
...



--- Original message ---
From: "Yaroslav Pogrebnyak" <yyyaroslav at gmail.com>
Date: 18 March 2020, 09:29:15

This patch adds the 'vf_overlay_cuda' filter.
It draws one picture on top of another on a CUDA GPU.
For the end user, it is similar to 'vf_overlay_opencl' and other overlay filters.
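
To illustrate what "drawing one picture on top of another" means per pixel, here is a minimal CPU-side sketch in Python. It is not the kernel from vf_overlay_cuda.cu (that code is not shown in this message); it is generic alpha compositing on a single 8-bit plane, and the function name `overlay_plane` and its arguments are hypothetical.

```python
def overlay_plane(main, overlay, x, y, alpha=None):
    """Return a copy of `main` with `overlay` drawn at offset (x, y).

    `main`, `overlay` and `alpha` are lists of rows of 8-bit values (0..255).
    If `alpha` is None the overlay is treated as fully opaque.
    Illustrative only; this is an assumption, not the patch's CUDA kernel.
    """
    out = [row[:] for row in main]
    for oy, orow in enumerate(overlay):
        my = y + oy
        if my < 0 or my >= len(out):
            continue  # overlay row falls outside the main picture
        for ox, value in enumerate(orow):
            mx = x + ox
            if mx < 0 or mx >= len(out[my]):
                continue  # overlay column falls outside the main picture
            a = 255 if alpha is None else alpha[oy][ox]
            # Rounded integer blend: a=255 keeps the overlay, a=0 keeps main.
            out[my][mx] = (value * a + out[my][mx] * (255 - a) + 127) // 255
    return out
```

For example, `overlay_plane([[0, 0], [0, 0]], [[255]], 1, 1)` replaces only the bottom-right sample, and parts of the overlay that fall outside the main picture are ignored.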

This filter would be especially useful for building video processing pipelines that execute fully on the CUDA GPU. For example, the following pipeline would be possible: decode -> scale -> overlay -> encode, without copying frames between CPU and GPU in between.

Technical details.

Supported sw input formats are NV12 and YUV420P for the main input, and NV12, YUV420P and YUVA420P for the overlay input.
Main and overlay sw formats should match (i.e., overlaying YUVA420P on NV12 is not implemented).
All pixel format conversions must be done with the 'format' or 'scale_npp' filters before 'overlay_cuda'.
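
The format rules above can be summarized as a small check. This is a minimal Python sketch, not code from the patch: the function name `overlay_formats_supported` and the lowercase format strings are hypothetical. It also assumes that "match" ignores the alpha plane, so YUVA420P pairs with YUV420P, which is how the alpha example further down is set up.

```python
# Format sets as listed in the description above.
MAIN_FORMATS = {"nv12", "yuv420p"}
OVERLAY_FORMATS = {"nv12", "yuv420p", "yuva420p"}


def overlay_formats_supported(main_fmt, overlay_fmt):
    """Return True if the (main, overlay) sw format pair fits the stated rules."""
    if main_fmt not in MAIN_FORMATS or overlay_fmt not in OVERLAY_FORMATS:
        return False
    # Assumption: the alpha plane is ignored when comparing layouts,
    # so yuva420p is treated as yuv420p for matching purposes.
    base = "yuv420p" if overlay_fmt == "yuva420p" else overlay_fmt
    return base == main_fmt


print(overlay_formats_supported("nv12", "nv12"))         # both NV12
print(overlay_formats_supported("nv12", "yuva420p"))     # the "not implemented" case
print(overlay_formats_supported("yuv420p", "yuva420p"))  # alpha overlay on YUV420P
```

Any pair rejected by this check would need a 'format' or 'scale_npp' conversion on one of the inputs first.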

I had to slightly modify 'hwcontext_cuda.c' to allow overlays with an alpha channel:
 - Allow AV_PIX_FMT_YUVA420P, to enable hwuploading frames with an alpha channel to the GPU.
 - Do not shift the height of the 4th plane (alpha) when uploading to the GPU.
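
The second point can be illustrated with a short Python sketch of the per-plane height calculation. This is not the code from 'hwcontext_cuda.c': the function name `plane_heights` is hypothetical, and the plain right shift and the default chroma shift of 1 (4:2:0 subsampling) are assumptions made for illustration.

```python
from typing import List


def plane_heights(height: int, num_planes: int, chroma_shift: int = 1) -> List[int]:
    """Return the number of rows in each plane of a frame with `height` rows.

    Plane 0 (luma) and plane 3 (alpha) are full height; the planes in
    between carry chroma and are reduced by `chroma_shift`.
    """
    heights = []
    for i in range(num_planes):
        # Shifting plane 3 as well would upload only half of the alpha rows.
        shift = 0 if i in (0, 3) else chroma_shift
        heights.append(height >> shift)
    return heights


print(plane_heights(1080, 4))  # YUVA420P: Y, U, V, A
print(plane_heights(1080, 2))  # NV12: Y, interleaved UV
```

For a 1080-row YUVA420P frame this gives 1080 rows for luma and alpha and 540 rows for each chroma plane.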

Examples.

- Overlay picture on top of video (main: YUVJ420P->NV12, overlay: NV12)
$ ffmpeg -y -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel cuvid \
  -c:v h264_cuvid -i main.mp4 \
  -i ~/overlay.jpg \
  -filter_complex "[1:v]format=nv12, hwupload[overlay], [0:v][overlay]overlay_cuda=x=0:y=0:shortest=false" \
  -an -c:v h264_nvenc -b:v 5M output.mp4

- Overlay one video on top of another (main: NV12, overlay: NV12)
$ ffmpeg -y \
  -hwaccel cuvid -c:v h264_cuvid -i main.mp4 \
  -hwaccel cuvid -c:v h264_cuvid -i overlay.mp4 \
  -filter_complex "[1:v]scale_npp=512:-1[o], [0:v][o]overlay_cuda=x=100:y=100:shortest=true" \
  -an -c:v h264_nvenc -b:v 5M output.mp4

- Overlay picture with alpha channel on top of video (main: NV12->YUV420P, overlay: RGBA->YUVA420P)
$ ffmpeg -y \
  -init_hw_device cuda=cuda -filter_hw_device cuda -hwaccel cuvid \
  -c:v h264_cuvid -i ~/main.mp4 \
  -i ~/overlay.png \
  -filter_complex "[1:v]format=yuva420p, hwupload[o], [0:v]scale_npp=format=yuv420p[m], [m][o]overlay_cuda=x=0:y=0:shortest=false" \
  -an -c:v h264_nvenc -b:v 5M output.mp4

Patch attached.

P.S. This is my first patch, so I would be grateful for any feedback on whether I'm doing things correctly.
Thanks!


Signed-off-by: Yaroslav Pogrebnyak <yyyaroslav at gmail.com>
---
 configure                      |   2 +
 libavfilter/Makefile           |   1 +
 libavfilter/allfilters.c       |   1 +
 libavfilter/vf_overlay_cuda.c  | 451 +++++++++++++++++++++++++++++++++
 libavfilter/vf_overlay_cuda.cu |  54 ++++
 libavutil/hwcontext_cuda.c     |   3 +-
 6 files changed, 511 insertions(+), 1 deletion(-)
 create mode 100644 libavfilter/vf_overlay_cuda.c
 create mode 100644 libavfilter/vf_overlay_cuda.cu

