[FFmpeg-devel] [PATCH V6 4/6] lavu: add side data AV_FRAME_DATA_BOUNDING_BOXES
Guo, Yejun
yejun.guo at intel.com
Thu Apr 1 14:31:31 EEST 2021
> -----Original Message-----
> From: ffmpeg-devel <ffmpeg-devel-bounces at ffmpeg.org> On Behalf Of Anton
> Khirnov
> Sent: 2021年4月1日 16:13
> To: FFmpeg development discussions and patches <ffmpeg-devel at ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH V6 4/6] lavu: add side data
> AV_FRAME_DATA_BOUNDING_BOXES
>
> Quoting Guo, Yejun (2021-03-26 09:09:29)
> > Signed-off-by: Guo, Yejun <yejun.guo at intel.com>
> > ---
> > doc/APIchanges | 2 ++
> > libavutil/Makefile | 1 +
> > libavutil/boundingbox.h | 79
> +++++++++++++++++++++++++++++++++++++++++
> > libavutil/frame.c | 1 +
> > libavutil/frame.h | 7 ++++
> > 5 files changed, 90 insertions(+)
> > +
> > +#ifndef AVUTIL_BOUNDINGBOX_H
> > +#define AVUTIL_BOUNDINGBOX_H
> > +
> > +#include "libavutil/rational.h"
> > +
> > +typedef struct AVBoundingBox {
> > + /**
> > + * Distance in pixels from the top edge of the frame to top
> > + * and bottom, and from the left edge of the frame to left and
> > + * right, defining the bounding box.
> > + */
> > + int top;
> > + int left;
> > + int bottom;
> > + int right;
>
> Is this allowed to be negative? Can/should this be size_t?
There was a long discussion about size_t/int/uint32_t when I added
struct AVRegionOfInterest in frame.h, and finally size_t is not chosen.
>
> > +
> > +#define AV_BBOX_LABEL_NAME_MAX_SIZE 32
> > +
> > + /**
> > + * Detect result with confidence
> > + */
> > + char detect_label[AV_BBOX_LABEL_NAME_MAX_SIZE];
> > + AVRational detect_confidence;
> > +
> > + /**
> > + * At most 4 classifications based on the detected bounding box.
> > + * For example, we can get max 4 different attributes with 4 different
> > + * DNN models on one bounding box.
> > + * classify_count is zero if no classification.
> > + */
> > +#define AV_NUM_BBOX_CLASSIFY 4
> > + uint32_t classify_count;
>
> size_t?
>
> > + char
> classify_labels[AV_NUM_BBOX_CLASSIFY][AV_BBOX_LABEL_NAME_MAX_SIZE];
> > + AVRational classify_confidences[AV_NUM_BBOX_CLASSIFY];
> > +} AVBoundingBox;
> > +
> > +typedef struct AVBoundingBoxHeader {
>
> This structure is not extensible in an ABI-compatible way.
yes, we can add a version number (enum AVBBoxHeaderVersion, see below) at the
beginning of the struct, and the user needs to set it with AV_BBOX_HEADER_CURRENT.
It is not elegant, is this what we really want?
enum AVBBoxHeaderVersion {
AV_BBOX_HEADER_NONE = 0,
AV_BBOX_HEADER_V1,
AV_BBOX_HEADER_CURRENT = AV_BBOX_HEADER_V1,
};
typedef struct AVBoundingBoxHeader {
enum AVBBoxHeaderVersion version;
union {
AVBoundingBoxHeaderV1 v1;
} header;
} AVBoundingBoxHeader;
We have carefully thought/discussed about the struct for months, and there's
little chance for later update, imho.
>
> > + /**
> > + * Information about how the bounding box is generated.
> > + * for example, the DNN model name.
> > + */
> > + char source[128];
> > +
> > + /**
> > + * The size of frame when it is detected.
> > + */
> > + int frame_width;
> > + int frame_height;
>
> Why?
so we know if the value of top/left/bottom/right in AVBoundingBox
can be used directly in other later filters.
for example, the filter chain is: dnn_detect -> scale -> dnn_classify (in plan)
In filter dnn_classify, it needs to check the frame size to know if the values
in bounding boxes (the detection result) are still valid.
>
> > +
> > + /**
> > + * Number of bounding boxes.
> > + */
> > + uint32_t nb_bbox;
>
> size_t?
>
> --
> Anton Khirnov
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email ffmpeg-devel-request at ffmpeg.org
> with subject "unsubscribe".
More information about the ffmpeg-devel
mailing list