[FFmpeg-devel] [PATCH v2 1/2] avfilter: add scale_d3d11 filter
Hendrik Leppkes
h.leppkes at gmail.com
Tue Jun 24 10:56:30 EEST 2025
On Tue, Jun 24, 2025 at 9:47 AM Dash Santosh Sathyanarayanan
<dash.sathyanarayanan at multicorewareinc.com> wrote:
>
> This commit introduces a new hardware-accelerated video filter, scale_d3d11,
> which performs scaling and format conversion using Direct3D 11. The filter enables
> efficient GPU-based scaling and pixel format conversion (p010 to nv12), reducing
> CPU overhead and latency in video pipelines.
>
> > +    if (frames_ctx->sw_format == AV_PIX_FMT_NV12) {
> > +        frames_hwctx->BindFlags |= D3D11_BIND_VIDEO_ENCODER;
> > +    }
> >
> > The filter should be designed universally rather than expecting to be connected
> > to something specific at its output. Whether or not the bind_encoder bind flag
> > should be set needs to be determined during negotiation of the output
> > connection.
>
> The above change is in dxva2.c: when the decoder output surface is passed directly
> to a supported encoder, it requires the BIND_VIDEO_ENCODER flag to be set. This does
> not affect any intermediate steps. A surface with the encoder bind flag set can also
> be used as input for any supported filter (confirmed with the scale_d3d11 filter).
>
This part is still not acceptable. You blindly set a flag based on
what you need. If special flags on the context are needed, they should
be negotiated between the different components, guided by the user's
use case and input, and not hardcoded based on an arbitrary condition
of a pixel format.
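For illustration only (not part of the patch): the public API already lets
the caller make this decision itself, by allocating the frames context in
its get_format callback and adjusting AVD3D11VAFramesContext.BindFlags
before initializing it. The callback name and error handling below are
hypothetical; avcodec_get_hw_frames_parameters() and av_hwframe_ctx_init()
are the existing entry points.

#include <libavcodec/avcodec.h>
#include <libavutil/hwcontext.h>
#include <libavutil/hwcontext_d3d11va.h>

/* Hypothetical application-side get_format callback: the application, which
 * knows the decoded surfaces will feed a D3D11 encoder, opts into the extra
 * bind flag itself instead of dxva2.c guessing from the pixel format. */
static enum AVPixelFormat get_d3d11_format(AVCodecContext *avctx,
                                           const enum AVPixelFormat *fmts)
{
    for (const enum AVPixelFormat *p = fmts; *p != AV_PIX_FMT_NONE; p++) {
        if (*p != AV_PIX_FMT_D3D11)
            continue;

        AVBufferRef *frames_ref = NULL;
        /* Let the decoder fill in its minimum frame-pool requirements. */
        if (avcodec_get_hw_frames_parameters(avctx, avctx->hw_device_ctx,
                                             AV_PIX_FMT_D3D11, &frames_ref) < 0)
            break;

        AVHWFramesContext      *frames = (AVHWFramesContext *)frames_ref->data;
        AVD3D11VAFramesContext *hwctx  = frames->hwctx;

        /* Use-case driven: this application intends to pass the surfaces
         * straight to an encoder, so it requests the extra bind flag here. */
        hwctx->BindFlags |= D3D11_BIND_VIDEO_ENCODER;

        if (av_hwframe_ctx_init(frames_ref) < 0) {
            av_buffer_unref(&frames_ref);
            break;
        }
        avctx->hw_frames_ctx = frames_ref;
        return AV_PIX_FMT_D3D11;
    }
    return AV_PIX_FMT_NONE;
}

With this approach the decision to add D3D11_BIND_VIDEO_ENCODER stays with
the application, which knows whether an encoder sits downstream, instead of
being inferred from sw_format inside dxva2.c.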
> diff --git a/libavutil/hwcontext_d3d11va.c b/libavutil/hwcontext_d3d11va.c
> index 1a047ce57b..7e122d607f 100644
> --- a/libavutil/hwcontext_d3d11va.c
> +++ b/libavutil/hwcontext_d3d11va.c
> @@ -340,19 +340,30 @@ static int d3d11va_get_buffer(AVHWFramesContext *ctx, AVFrame *frame)
>  {
>      AVD3D11FrameDescriptor *desc;
>
> -    frame->buf[0] = av_buffer_pool_get(ctx->pool);
> -    if (!frame->buf[0])
> -        return AVERROR(ENOMEM);
> -
> -    desc = (AVD3D11FrameDescriptor *)frame->buf[0]->data;
> +    /**
> +     * Loop until a buffer becomes available from the pool.
> +     * In a full hardware pipeline, all buffers may be temporarily in use by
> +     * other modules (encoder/filter/decoder). Rather than immediately failing
> +     * with ENOMEM, we wait for a buffer to be released back to the pool, which
> +     * maintains pipeline flow and prevents unnecessary allocation failures
> +     * during normal operation.
> +     */
> +    do {
> +        frame->buf[0] = av_buffer_pool_get(ctx->pool);
> +        if (frame->buf[0]) {
> +            desc = (AVD3D11FrameDescriptor *)frame->buf[0]->data;
> +            frame->data[0] = (uint8_t *)desc->texture;
> +            frame->data[1] = (uint8_t *)desc->index;
> +            frame->format = AV_PIX_FMT_D3D11;
> +            frame->width = ctx->width;
> +            frame->height = ctx->height;
> +            return 0;
> +        }
>
> -    frame->data[0] = (uint8_t *)desc->texture;
> -    frame->data[1] = (uint8_t *)desc->index;
> -    frame->format = AV_PIX_FMT_D3D11;
> -    frame->width = ctx->width;
> -    frame->height = ctx->height;
> +        av_usleep(500);
> +    } while (1);
>
> -    return 0;
> +    return AVERROR(ENOMEM);
>  }
>
A potentially infinite loop in get_buffer is an absolute no-go. There
is no way to control this behavior, and the surface may never become
available if the caller is blocked in a loop here. You assume a
certain threading model which may not exist.
This sort of behavior should be handled by the caller, not here (and
no, a maximum wait time or anything like that does not resolve this
concern; move it to another layer where the caller can control this).
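For illustration only (not part of the patch): if the pool-exhaustion
handling is moved up to the caller, it could look roughly like the sketch
below. The helper name, timeout parameter and abort flag are hypothetical;
av_hwframe_get_buffer(), av_gettime_relative() and av_usleep() are the
existing libavutil calls.

#include <libavutil/error.h>
#include <libavutil/frame.h>
#include <libavutil/hwcontext.h>
#include <libavutil/time.h>

/* Hypothetical caller-side policy: retry av_hwframe_get_buffer() with a
 * bounded, abortable wait instead of blocking inside d3d11va_get_buffer().
 * The timeout and the abort condition are owned by the caller, not by
 * hwcontext_d3d11va.c. */
static int get_hw_buffer_with_retry(AVBufferRef *hw_frames_ref, AVFrame *frame,
                                    int64_t timeout_us,
                                    const volatile int *abort_flag)
{
    int64_t deadline = av_gettime_relative() + timeout_us;

    for (;;) {
        int ret = av_hwframe_get_buffer(hw_frames_ref, frame, 0);
        if (ret != AVERROR(ENOMEM))
            return ret;                 /* success, or a real error */
        if (*abort_flag || av_gettime_relative() >= deadline)
            return AVERROR(ENOMEM);     /* give up: pool stayed exhausted */
        /* Presumably another thread (encoder/filter) will return a surface
         * to the pool; yield briefly and try again. */
        av_usleep(500);
    }
}

Here the wait is bounded and abortable by the component that actually owns
the pipeline, i.e. the layer that knows whether another thread can still
return a surface.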
- Hendrik