[FFmpeg-devel] [PATCH 1/4] avutil: add generic side data for video coding info
Timothée
timothee.informatique at regaud-chapuy.fr
Sun Jul 20 21:24:42 EEST 2025
On 18/07/2025 17:48, Michael Niedermayer wrote:
> Hi
>
> On Fri, Jul 18, 2025 at 12:30:52PM +0200, Timothée Regaud wrote:
>> From: Timothee Regaud<timothee.informatique at regaud-chapuy.fr>
>>
>> Adds the generic data structures to libavutil. The design is recursive to support other codecs, even though the implementation is only for H.264 for now.
>>
>> Signed-off-by: Timothee Regaud<timothee.informatique at regaud-chapuy.fr>
>> ---
>> libavutil/Makefile | 1 +
>> libavutil/frame.h | 7 ++
>> libavutil/side_data.c | 1 +
>> libavutil/video_coding_info.h | 163 ++++++++++++++++++++++++++++++++++
>> 4 files changed, 172 insertions(+)
>> create mode 100644 libavutil/video_coding_info.h
>>
>> diff --git a/libavutil/Makefile b/libavutil/Makefile
>> index 94a56bb72f..44e51ab7ae 100644
>> --- a/libavutil/Makefile
>> +++ b/libavutil/Makefile
>> @@ -93,6 +93,7 @@ HEADERS = adler32.h \
>> tree.h \
>> twofish.h \
>> uuid.h \
>> + video_coding_info.h \
>> version.h \
>> video_enc_params.h \
>> xtea.h \
>> diff --git a/libavutil/frame.h b/libavutil/frame.h
>> index c50cd263d9..f4404472a0 100644
>> --- a/libavutil/frame.h
>> +++ b/libavutil/frame.h
>> @@ -254,6 +254,13 @@ enum AVFrameSideDataType {
>> * libavutil/tdrdi.h.
>> */
>> AV_FRAME_DATA_3D_REFERENCE_DISPLAYS,
>> +
>> + /**
>> + * Detailed block-level coding information. The data is an AVVideoCodingInfo
>> + * structure. This is exported by video decoders and can be used by filters
>> + * for analysis and visualization.
>> + */
>> + AV_FRAME_DATA_VIDEO_CODING_INFO,
>> };
>>
>> enum AVActiveFormatDescription {
>> diff --git a/libavutil/side_data.c b/libavutil/side_data.c
>> index fa2a2c2a13..b938ef6f52 100644
>> --- a/libavutil/side_data.c
>> +++ b/libavutil/side_data.c
>> @@ -56,6 +56,7 @@ static const AVSideDataDescriptor sd_props[] = {
>> [AV_FRAME_DATA_SEI_UNREGISTERED] = { "H.26[45] User Data Unregistered SEI message", AV_SIDE_DATA_PROP_MULTI },
>> [AV_FRAME_DATA_VIDEO_HINT] = { "Encoding video hint", AV_SIDE_DATA_PROP_SIZE_DEPENDENT },
>> [AV_FRAME_DATA_3D_REFERENCE_DISPLAYS] = { "3D Reference Displays Information", AV_SIDE_DATA_PROP_GLOBAL },
>> + [AV_FRAME_DATA_VIDEO_CODING_INFO] = { "Video Coding Info", AV_SIDE_DATA_PROP_SIZE_DEPENDENT },
>> };
>>
>> const AVSideDataDescriptor *av_frame_side_data_desc(enum AVFrameSideDataType type)
>> diff --git a/libavutil/video_coding_info.h b/libavutil/video_coding_info.h
>> new file mode 100644
>> index 0000000000..17e9345892
>> --- /dev/null
>> +++ b/libavutil/video_coding_info.h
>> @@ -0,0 +1,163 @@
>> +/*
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#ifndef AVUTIL_VIDEO_CODING_INFO_H
>> +#define AVUTIL_VIDEO_CODING_INFO_H
>> +
>> +#include <stdint.h>
>> +#include <stddef.h>
>> +
>> +/**
>> + * @file
>> + * @ingroup lavu_frame
>> + * Structures for describing block-level video coding information.
>> + */
>> +
>> +/**
>> + * @defgroup lavu_video_coding_info Video Coding Info
>> + * @ingroup lavu_frame
>> + *
>> + * @{
>> + * Structures for describing block-level video coding information, to be
>> + * attached to an AVFrame as side data.
>> + *
>> + * All pointer-like members in these structures are offsets relative to the
>> + * start of the AVVideoCodingInfo struct to ensure the side data is
>> + * self-contained and relocatable. This is critical as the underlying buffer
>> + * may be moved in memory.
>> + */
>> +
>> +/**
>> + * Structure to hold inter-prediction information for a block.
>> + */
>> +typedef struct AVBlockInterInfo {
>> + /**
>> + * Offsets to motion vectors for list 0 and list 1, relative to the
>> + * start of the AVVideoCodingInfo struct.
>> + * The data for each list is an array of [x, y] pairs of int16_t.
>> + * The number of vectors is given by num_mv.
>> + * An offset of 0 indicates this data is not present.
>> + */
>> + size_t mv_offset[2];
> int16 is not enough, with growing picture sizes and growing precision of
> motion vectors
You are right. I didn't anticipate high-resolution videos. I will change
the vector components to int32_t in the v2 patch.
> also the MV precision is needed somewhere somehow or they could not be
> visualized by generic code
That's true. I will add a field such as `uint8_t mv_precision_log2;` so
that generic code can scale the vectors.
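A minimal sketch of how generic code might read the vectors once both changes are in, assuming int32_t [x, y] pairs and a `mv_precision_log2` field where a value of 2 means quarter-pel units. The struct and the `sketch_*` helpers are hypothetical and are not part of the patch or of FFmpeg:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v2 layout; not the struct from the patch. */
typedef struct SketchInterInfo {
    size_t  mv_offset[2];      /* byte offsets from the start of the side data, 0 = absent */
    uint8_t num_mv[2];         /* number of [x, y] pairs per list */
    uint8_t mv_precision_log2; /* assumed meaning: 2 = quarter-pel units */
} SketchInterInfo;

/* Copy vector `idx` of list `list` into mv[]. `base` points at the start of
 * the side data buffer. Returns 0 on success, -1 if absent or out of range. */
static int sketch_get_mv(const uint8_t *base, const SketchInterInfo *ii,
                         int list, int idx, int32_t mv[2])
{
    assert(list == 0 || list == 1);
    if (!ii->mv_offset[list] || idx < 0 || idx >= ii->num_mv[list])
        return -1;
    /* memcpy avoids alignment assumptions about the offset. */
    memcpy(mv, base + ii->mv_offset[list] + (size_t)idx * 2 * sizeof(int32_t),
           2 * sizeof(int32_t));
    return 0;
}

/* Convert one vector component from fractional units to pixels. */
static double sketch_mv_to_pixels(int32_t v, uint8_t precision_log2)
{
    assert(precision_log2 < 31);
    return (double)v / (double)(1 << precision_log2);
}
```

With `mv_precision_log2 = 2`, a stored component of 6 would correspond to a displacement of 1.5 pixels, so a visualization filter would not need to know which codec produced the data.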
>> +
>> + /**
>> + * Offsets to reference indices for list 0 and list 1, relative to the
>> + * start of the AVVideoCodingInfo struct.
>> + * The data is an array of int8_t. A value of -1 indicates the reference
>> + * is not used for a specific partition.
>> + * An offset of 0 indicates this data is not present.
>> + */
>> + size_t ref_idx_offset[2];
>> + /**
>> + * Number of motion vectors for list 0 and list 1.
>> + */
>> + uint8_t num_mv[2];
>> +} AVBlockInterInfo;
> weighted bi pred needs the weights too
>
> and for more than 1 MV, the question becomes what the other vectors
> are, bipred ?, affine MC ?, ...
It was intended for L0 and L1 vectors as used in H.264, but I see now
that this doesn't apply to every codec.
> Also if you want to be really generic you need to allow blocks
> that don't span across the luma and chroma planes but allow
> different block structures (and motion vectors) per plane
>
> I am not sure how generic we want to be and how useful that is.
>
> But it seems you want your patch to be quite generic?
>
> I think it's more important to allow this to be extensible
> than supporting everything we can think of.
>
> That is maybe store the size of the struct also somewhere so
> that elements can be added to their end without breaking
> anything. At least for the main block structure
I will add a size field to the main block structure.
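A rough sketch of what the size field buys us, assuming a header that stores the number of blocks, the offset of the first block and the per-block size, similar to the `nb_blocks`, `blocks_offset` and `block_size` fields of AVVideoEncParams. All `Sketch*` names are hypothetical. A reader built against an older, shorter block struct steps through the array using the stored size and simply ignores any fields appended later:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header at the start of the side data buffer. */
typedef struct SketchInfoHeader {
    uint32_t nb_blocks;
    size_t   blocks_offset; /* from the start of the buffer */
    size_t   block_size;    /* size of one block as written by the producer */
} SketchInfoHeader;

/* The block fields this (older) reader knows about. */
typedef struct SketchBlockV1 {
    int16_t x, y;
    uint8_t w, h;
} SketchBlockV1;

/* Copy the known prefix of block `idx` into *out.
 * Returns 0 on success, -1 if the index or the layout is invalid. */
static int sketch_get_block(const uint8_t *base, size_t total_size,
                            uint32_t idx, SketchBlockV1 *out)
{
    SketchInfoHeader hdr;
    size_t pos;

    assert(total_size >= sizeof(hdr));
    memcpy(&hdr, base, sizeof(hdr));
    if (idx >= hdr.nb_blocks || hdr.block_size < sizeof(*out))
        return -1;
    /* Step by the producer's block size, not by sizeof(SketchBlockV1). */
    pos = hdr.blocks_offset + (size_t)idx * hdr.block_size;
    if (pos > total_size || total_size - pos < hdr.block_size)
        return -1;
    memcpy(out, base + pos, sizeof(*out));
    return 0;
}
```

The important part is that the stride comes from the buffer itself, so new members can be appended to the block struct without breaking existing consumers.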
> I mean a future codec might allow non rectangular blocks but we
> don't want to think about that today.
>
> Maybe it's best to keep this as simple as possible but extensible
Yes, my goal is for the patch to be as generic as possible, but it is
challenging since I have mostly worked on H.264 and do not know every
codec. Apparently, I've missed a few details.
I will add the following for weighted prediction:

    typedef struct AVBlockWeightInfo {
        int16_t luma_weight[2];      // For L0 and L1
        int16_t luma_offset[2];      // For L0 and L1
        int16_t chroma_weight[2][2]; // For L0/L1 and Cb/Cr
        int16_t chroma_offset[2][2];
    } AVBlockWeightInfo;

And add `size_t weight_info_offset;` to AVBlockInterInfo.
This does not apply to every codec, but it will work for some, like
H.264. We can always extend it in the future if other codecs need more
parameters, which the new size field will allow.
>> +
>> +/**
>> + * Structure to hold intra-prediction information for a block.
>> + */
>> +typedef struct AVBlockIntraInfo {
>> + /**
>> + * Offset to an array of intra prediction modes, relative to the
>> + * start of the AVVideoCodingInfo struct.
>> + * The number of modes is given by num_pred_modes.
>> + */
>> + size_t pred_mode_offset;
>> +
>> + /**
>> + * Number of intra prediction modes.
>> + */
>> + uint8_t num_pred_modes;
>> +
>> + /**
>> + * Chroma intra prediction mode.
>> + */
>> + uint8_t chroma_pred_mode;
>> +} AVBlockIntraInfo;
> classifying the prediction in directional, DC, and non directional and
> for directional the direction. Could be useful.
I will look into that for v2.
> Otherwise the prediction mode number requires codec specific knowledge
> to interpret
Yes, that's why I added `uint32_t codec_specific_type` in
AVVideoCodingInfoBlock.
Thanks,
Timothée