[FFmpeg-devel] [PATCH] avcodec/nvdec: support resizing while decoding

Fri Sep 20 02:16:16 EEST 2024

Hi!

This is my first contribution to the project, so please excuse any bad
etiquette; I tried to read all the FAQs before posting. I'd love to start
by thanking everyone for such an amazing framework you've built!

Anyway, here's my proposed patch to support video resizing when using the
NVDEC hwaccel to decode HEVC video (I could look into a similar patch for
H.264, AV1, etc. if this looks useful). There's a bit more context and
explanation in the commit description in the patch, but please let me know
if the use case isn't clear.

I tested locally, and all four of these scenarios work as expected:
 * Using the hevc codec with the nvdec hwaccel, leaving avctx->width and
   avctx->height unset. On a 1920x1080 input video, I get 1920x1080 cuda
   frames out.
 * Using the hevc codec with the nvdec hwaccel, setting avctx->width and
   avctx->height to some arbitrary value (e.g. 640x360). On the same input
   video, I get 640x360 cuda frames out.
 * Using the hevc codec without hwaccel, leaving avctx->width and
   avctx->height unset. I get 1920x1080 yuvj420p frames (on the CPU) out.
 * Using the hevc codec without hwaccel, setting avctx->width and
   avctx->height to some arbitrary value (e.g. 640x360). The values get
   ignored (as in FFmpeg master) and I again get 1920x1080 yuvj420p frames
   out.
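
To make the expected sizes in the hwaccel scenarios concrete, here is a rough
standalone sketch of the size-selection rule. It does not use FFmpeg at all:
the struct dims type and the pick_target_dims() and round_up_even() helpers
are hypothetical names made up for illustration, and the sketch deliberately
ignores conformance-window cropping. The only parts taken from the patch are
the "fall back to the stream size when width/height are unset" check and the
(x + 1) & ~1 rounding used for the frame pool.

```c
#include <assert.h>

/* Hypothetical stand-in for the two fields involved; not an FFmpeg type. */
struct dims {
    int width;
    int height;
};

/*
 * Keep the caller-requested size when both values are positive,
 * otherwise fall back to the size signalled by the stream.
 * Cropping offsets are intentionally left out of this sketch.
 */
static struct dims pick_target_dims(struct dims requested, struct dims stream)
{
    if (requested.width <= 0 || requested.height <= 0)
        return stream;
    return requested;
}

/* Round a non-negative dimension up to the next even value. */
static int round_up_even(int v)
{
    assert(v >= 0);
    return (v + 1) & ~1;
}
```

With a 1920x1080 stream, an unset request (0x0) yields 1920x1080, while a
640x360 request is kept as is. The rounding only matters for odd sizes: a
width of 361 would be allocated as 362.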

I'm not very familiar with hevcdec.c, so I'm not sure whether this could
accidentally break something else. Looking forward to hearing your thoughts!


From 850afda5f6479064c75a4b905f12e48f97b6d551 Mon Sep 17 00:00:00 2001
From: Carlos Ruiz <carlos.r.domin at gmail.com>
Date: Thu, 19 Sep 2024 14:00:05 +0200
Subject: [PATCH] avcodec/nvdec: support resizing while decoding

Nvidia chips support accelerated resizing while decoding video. The *_cuvid
codecs (cuviddec.c) already support resizing and cropping, but have two big
downsides:
  1) they have a minimum latency of two packets (even with the LOW_DELAY
     flag enabled)
  2) AV_CODEC_FLAG_COPY_OPAQUE is not respected (opaque and opaque_ref
     aren't transferred from packets to frames)

Parsing the video with a non-accelerated codec (hevcdec.c) instead avoids
both downsides above. This commit brings resizing capabilities to the
*_nvdec hwaccel, similar to what *_cuvid does, to combine the best of both
worlds (proper parsing + accelerated decoding and resizing).
---
 libavcodec/hevc/hevcdec.c |  8 ++++++--
 libavcodec/nvdec.c        | 21 +++++++++++++++++----
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/libavcodec/hevc/hevcdec.c b/libavcodec/hevc/hevcdec.c
index d915d74d22..d63fc5875f 100644
--- a/libavcodec/hevc/hevcdec.c
+++ b/libavcodec/hevc/hevcdec.c
@@ -351,8 +351,12 @@ static void export_stream_params(HEVCContext *s, const HEVCSPS *sps)
     avctx->pix_fmt             = sps->pix_fmt;
     avctx->coded_width         = sps->width;
     avctx->coded_height        = sps->height;
-    avctx->width               = sps->width  - ow->left_offset - ow->right_offset;
-    avctx->height              = sps->height - ow->top_offset  - ow->bottom_offset;
+    if (avctx->width <= 0 || avctx->height <= 0) {
+        avctx->width           = sps->width;
+        avctx->height          = sps->height;
+    }
+    avctx->width               = avctx->width - ow->left_offset - ow->right_offset;
+    avctx->height              = avctx->height - ow->top_offset  - ow->bottom_offset;
     avctx->has_b_frames        = sps->temporal_layer[sps->max_sub_layers - 1].num_reorder_pics;
     avctx->profile             = sps->ptl.general_ptl.profile_idc;
     avctx->level               = sps->ptl.general_ptl.level_idc;
diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
index 932544564a..86143de74c 100644
--- a/libavcodec/nvdec.c
+++ b/libavcodec/nvdec.c
@@ -324,6 +324,18 @@ static int nvdec_init_hwframes(AVCodecContext *avctx, AVBufferRef **out_frames_r
     return 0;
 }

+static int get_buffer2(AVCodecContext *avctx, AVFrame *frame, int flags) {
+    /*
+     * HEVC codec includes FF_CODEC_CAP_EXPORTS_CROPPING in its caps_internal, so by default frames will be set
+     * to width=avctx->coded_width and height=avctx->coded_height. Now that we support resizing as part of decoding,
+     * overwrite the frame dimensions with display values rather than coded.
+     */
+    int ret = avcodec_default_get_buffer2(avctx, frame, flags);
+    frame->width = avctx->width;
+    frame->height = avctx->height;
+    return ret;
+}
+
 int ff_nvdec_decode_init(AVCodecContext *avctx)
 {
     NVDECContext *ctx = avctx->internal->hwaccel_priv_data;
@@ -393,8 +405,9 @@ int ff_nvdec_decode_init(AVCodecContext *avctx)

     params.ulWidth             = avctx->coded_width;
     params.ulHeight            = avctx->coded_height;
-    params.ulTargetWidth       = avctx->coded_width;
-    params.ulTargetHeight      = avctx->coded_height;
+    avctx->get_buffer2         = get_buffer2;
+    params.ulTargetWidth       = avctx->width;
+    params.ulTargetHeight      = avctx->height;
     params.bitDepthMinus8      = sw_desc->comp[0].depth - 8;
     params.OutputFormat        = output_format;
     params.CodecType           = cuvid_codec_type;
@@ -719,8 +732,8 @@ int ff_nvdec_frame_params(AVCodecContext *avctx,
     chroma_444 = supports_444 && cuvid_chroma_format == cudaVideoChromaFormat_444;

     frames_ctx->format            = AV_PIX_FMT_CUDA;
-    frames_ctx->width             = (avctx->coded_width + 1) & ~1;
-    frames_ctx->height            = (avctx->coded_height + 1) & ~1;
+    frames_ctx->width             = (avctx->width + 1) & ~1;
+    frames_ctx->height            = (avctx->height + 1) & ~1;
     /*
      * We add two extra frames to the pool to account for deinterlacing filters
      * holding onto their frames.
--
2.43.0