[FFmpeg-devel] [PATCH] lavu: add AVVideoHint API

Mon Jul 24 02:27:27 EEST 2023

On date Tuesday 2023-07-18 00:19:40 +0200, Stefano Sabatini wrote:
> On date Monday 2023-07-10 12:51:25 +0000, Carotti, Elias wrote:
> > On Mon, 2023-07-10 at 08:13 +0000, Carotti, Elias wrote:
> > > 
> > 
> > > AVVideoHint is a bad name for something like this.
> > > Could you borrow some wording from graphics and call it
> > > AVVideoDamagedHint or maybe AVVideoChangedAreaHint or a combination
> > > of both?
> > > I'd prefer the former, damage is standard language in graphics
> > > circles about what has changed since the last frame.
> > > 
> > > Hi,
> > > I have no strong objections on this. Personally I also like the
> > > AVVideoDamagedHint name best, my only concern is that it is strictly
> > > related to the current use/implementation 
> > > (it's true right now that's the only kind of hint) while it may turn
> > > out to be a bad naming decision should other forms of hinting for the
> > > encoder be added in the future.
> > > That said, I am fine with the change too.
> 
> AVVideoDamagedHint would be too specific (what if we want to add
> "hinting" for different things?).
> 
> > > Elias
> > > 
> > 
> > I added a type to the AVVideoRect struct. This should solve the naming
> > issue above while preserving the possibility to extend this to
> > different hinting types.
> > These are the only changes to Anton's version.
> > Best
> > Elias
> > 
> >  
> > 
> > 
> > 
> > 
> > 
> > NICE SRL, viale Monte Grappa 3/5, 20124 Milano, Italia, Registro delle Imprese di Milano Monza Brianza Lodi REA n. 2096882, Capitale Sociale: 10.329,14 EUR i.v., Cod. Fisc. e P.IVA 01133050052, Societa con Socio Unico
> > 
> > 
> 
> > From 8ef4f97410a6b78df048b71d9921a763da6255b3 Mon Sep 17 00:00:00 2001
> > From: Elias Carotti <eliascrt _at_ amazon _dot_ it>
> > Date: Mon, 10 Jul 2023 14:34:53 +0200
> > Subject: [PATCH] Add side data type to provide hint to the video encoders
> >  about unchanged portions of each frame.
> > 
> > Signed-off-by: Anton Khirnov <anton at khirnov.net>
> > ---
> >  doc/APIchanges         |   3 ++
> >  libavutil/Makefile     |   2 +
> >  libavutil/frame.h      |  10 ++++
> >  libavutil/version.h    |   2 +-
> >  libavutil/video_hint.c |  82 +++++++++++++++++++++++++++++
> >  libavutil/video_hint.h | 117 +++++++++++++++++++++++++++++++++++++++++
> >  6 files changed, 215 insertions(+), 1 deletion(-)
> >  create mode 100644 libavutil/video_hint.c
> >  create mode 100644 libavutil/video_hint.h
> > 
> > diff --git a/doc/APIchanges b/doc/APIchanges
> > index 27f835cfce..0cda51fdee 100644
> > --- a/doc/APIchanges
> > +++ b/doc/APIchanges
> > @@ -2,6 +2,9 @@ The last version increases of all libraries were on 2023-02-09
> >  
> >  API changes, most recent first:
> >  
> > +2023-07-xx - xxxxxxxxxx - lavu 58.15.100 - video_hint.h
> > +  Add AVVideoHint API.
> > +
> >  2023-07-05 - xxxxxxxxxx - lavu 58.14.100 - random_seed.h
> >    Add av_random_bytes()
> >  
> > diff --git a/libavutil/Makefile b/libavutil/Makefile
> > index bd9c6f9e32..7828c94dc5 100644
> > --- a/libavutil/Makefile
> > +++ b/libavutil/Makefile
> > @@ -91,6 +91,7 @@ HEADERS = adler32.h                                                     \
> >            tea.h                                                         \
> >            tx.h                                                          \
> >            film_grain_params.h                                           \
> > +          video_hint.h
> >  
> >  ARCH_HEADERS = bswap.h                                                  \
> >                 intmath.h                                                \
> > @@ -181,6 +182,7 @@ OBJS = adler32.o                                                        \
> >         uuid.o                                                           \
> >         version.o                                                        \
> >         video_enc_params.o                                               \
> > +       video_hint.o                                                     \
> >         film_grain_params.o                                              \
> >  
> >  
> > diff --git a/libavutil/frame.h b/libavutil/frame.h
> > index a491315f25..c0c1b23db7 100644
> > --- a/libavutil/frame.h
> > +++ b/libavutil/frame.h
> > @@ -214,6 +214,16 @@ enum AVFrameSideDataType {
> >       * Ambient viewing environment metadata, as defined by H.274.
> >       */
> >      AV_FRAME_DATA_AMBIENT_VIEWING_ENVIRONMENT,
> > +
> > +    /**
> > +     * Provide encoder-specific hinting information about changed/unchanged
> > +     * portions of a frame.  It can be used to pass information about which
> > +     * macroblocks can be skipped because they didn't change from the
> > +     * corresponding ones in the previous frame. This could be useful for
> > +     * applications which know this information in advance to speed up
> > +     * encoding.
> > +     */
> > +    AV_FRAME_DATA_VIDEO_HINT,
> >  };
> >  
> >  enum AVActiveFormatDescription {
> > diff --git a/libavutil/version.h b/libavutil/version.h
> > index 24af520e08..9e798b0e3f 100644
> > --- a/libavutil/version.h
> > +++ b/libavutil/version.h
> > @@ -79,7 +79,7 @@
> >   */
> >  
> >  #define LIBAVUTIL_VERSION_MAJOR  58
> > -#define LIBAVUTIL_VERSION_MINOR  14
> > +#define LIBAVUTIL_VERSION_MINOR  15
> >  #define LIBAVUTIL_VERSION_MICRO 100
> >  
> >  #define LIBAVUTIL_VERSION_INT   AV_VERSION_INT(LIBAVUTIL_VERSION_MAJOR, \
> > diff --git a/libavutil/video_hint.c b/libavutil/video_hint.c
> > new file mode 100644
> > index 0000000000..5730ed6cfb
> > --- /dev/null
> > +++ b/libavutil/video_hint.c
> > @@ -0,0 +1,82 @@
> > +/*
> > + * Copyright 2023 Elias Carotti <eliascrt at amazon dot it>
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include <string.h>
> > +
> > +#include "avstring.h"
> > +#include "frame.h"
> > +#include "macros.h"
> > +#include "mem.h"
> > +#include "video_hint.h"
> > +
> > +AVVideoHint *av_video_hint_alloc(size_t nb_rects,
> > +                                 size_t *out_size)
> > +{
> > +    struct TestStruct {
> > +        AVVideoHint    hint;
> > +        AVVideoRect    rect;
> > +    };
> > +    const size_t rect_offset = offsetof(struct TestStruct, rect);
> > +    size_t size = rect_offset;
> > +    AVVideoHint *hint;
> > +
> > +    *out_size = 0;
> > +    if (nb_rects > (SIZE_MAX - size) / sizeof(AVVideoRect))
> > +        return NULL;
> > +    size += sizeof(AVVideoRect) * nb_rects;
> > +
> > +    hint = av_mallocz(size);
> > +    if (!hint)
> > +        return NULL;
> > +
> > +    hint->nb_rects    = nb_rects;
> > +    hint->rect_offset = rect_offset;
> > +    hint->rect_size   = sizeof(AVVideoRect);
> > +
> > +    *out_size = size;
> > +
> > +    return hint;
> > +}
> > +
> > +AVVideoHint *av_video_hint_create_side_data(AVFrame *frame,
> > +                                            size_t nb_rects)
> > +{
> > +    AVVideoHint *hint;
> > +    AVBufferRef *buf;
> > +    size_t size = 0;
> > +
> > +    hint = av_video_hint_alloc(nb_rects, &size);
> > +    if (!hint)
> > +        return NULL;
> > +
> > +    buf = av_buffer_create((uint8_t *)hint, size, NULL, NULL, 0);
> > +    if (!buf) {
> > +        av_freep(&hint);
> > +        return NULL;
> > +    }
> > +
> > +    if (!av_frame_new_side_data_from_buf(frame, AV_FRAME_DATA_VIDEO_HINT, buf)) {
> > +        av_buffer_unref(&buf);
> > +        return NULL;
> > +    }
> > +
> > +    return hint;
> > +}
> > +
> > diff --git a/libavutil/video_hint.h b/libavutil/video_hint.h
> > new file mode 100644
> > index 0000000000..14bfe20aa6
> > --- /dev/null
> > +++ b/libavutil/video_hint.h
> > @@ -0,0 +1,117 @@
> > +/**
> > + * Copyright 2023 Elias Carotti <eliascrt at amazon dot it>
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#ifndef AVUTIL_VIDEO_HINT_H
> > +#define AVUTIL_VIDEO_HINT_H
> > +
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/frame.h"
> > +
> 
> > +/**
> > +  * The type of the hinting information provided by an AVVideoRect.
> > +  * Currently only damage information, i.e., changed/unchanged portions of the frame.
> > +  **/
> 
> > +typedef enum AVVideoRectType {
> > +    AV_VIDEO_RECT_DAMAGE,
> 
> AV_VIDEO_RECT_TYPE_DAMAGE for consistency
> 
> > +} AVVideoRectType;
> 
> > +
> > +typedef struct AVVideoRect {
> > +    uint32_t x, y;
> > +    uint32_t width, height;
> > +    AVVideoRectType type;
> > +} AVVideoRect;
> > +
> > +typedef enum AVVideoHintType {
> > +    /* rectangled delimit the constant areas (unchanged), default is changed */
> 
> rectangles?
> 
> > +    AV_VIDEO_HINT_CONSTANT,
> 
> AV_VIDEO_HINT_TYPE_...
> 
> > +
> > +    /* rectangled delimit the constant areas (changed), default is not changed */
> 
> rectangles?
> 
> > +    AV_VIDEO_HINT_CHANGED,
> > +} AVVideoHintType;
> >
> > +typedef struct AVVideoHint {
> > +    /**
> > +     * Number of AVVideoRect present.
> > +     *
> > +     * May be 0, in which case no per-rectangle information is present. In this
> > +     * case the values of rect_offset / rect_size are unspecified and should
> > +     * not be accessed.
> > +     */
> > +    size_t nb_rects;
> > +
> > +    /**
> > +     * Offset in bytes from the beginning of this structure at which the array
> > +     * of AVVideoRect starts.
> > +     */
> > +    size_t rect_offset;
> > +
> > +    /**
> > +     * Size in bytes of AVVideoRect.
> > +     */
> > +    size_t rect_size;
> > +
> 
> > +    AVVideoHintType type;
> 
> sorry to nitpick, but if we have this information in the single rect
> should not we use that instead?
> 
> Basically I see two options:
> 1. generic VideoHint, storing single hint type and rects
> 
> for example we might have type=changed, and the rects storing only the
> changed rects
> 
> If we want to extend this then we need to add a new type. The problem
> might be if we want to store more than one VideoHint in the side data
> (it it possible to store more than one AVVideoHint, each one with a
> different type, as side data?). Suppose for example we want to add a
> new hint type, e.g. ROI, then we can add a new AVHideoHint with
> type=roi - but I don't know if we can have several frame hint side
> data (each one with different individual types).
> 
> 2. generic VideoHint, storing rects each one containing the hint type
> 
> for example we might have a list of rects, each one with
> type=changed|constant|roi. This would allow one to store different
> kind of hints in the same AVVideoHint structure, but at the same time
> would enable inconsistent data (for example we might have changed and
> unchanged rects in the same list of rects)
> 
> In each case, storing the type in the generic AVVideHint and in each
> rect might lead to inconsistent states (e.g. you might have generic
> type=damage, and have type=roi and type=changed in the rects), in this
> case having the type in both the general structure and in each rects
> seems redundant and leads to possible data inconsistencies.
> 
> ...
> 

> I would rather go to solution 1, but I'm not sure if having more than
> one hint in the side data (with different subtypes) is possible and
> wanted. If this is the case, probably the best solution would be to
> store more than one AVVideoHint (each one with an individual type and
> list of rects) in the single side data object.

Elaborating on this.

1. Ideally we might want to extend the av_frame_get_side_data API to
be able to store more than one entry per side data, by adding a unique
label to the side data, but this requires probably more effort and
extends the scope even further.

2. Alternatively, we might extend the hint API like this:

AVVideoHint *av_video_hint_create_side_data(AVFrame *frame,
                                            size_t nb_rects);

AVVideoHint *av_video_hint_extend_side_data(AVFrame *frame,
                                            size_t nb_rects);

by marking the continuation on the previous hint (but this would
complicate deserialization as you need then to iterate to get all the
side data).

Or we could make a variadic function like this:
AVVideoHint *av_video_hint_create_side_data(AVFrame *frame, size_t nb_rects, ...);

Alternatively, we might store in AVVideoHint more than one hint, but
this somehow conflict with the general design.

...

As a low effort approach, I would keep the initial design of a single
hint type per side-data (dropping the possibility of having more than
one hint type in the same side data), which should be fixed
generically by implementing 1.