[FFmpeg-devel] [PATCH 1/4] avutil: add generic side data for video coding info

Sun Jul 20 21:24:46 EEST 2025

On 18/07/2025 19:42, Lynne wrote:
> On 18/07/2025 19:30, Timothée Regaud wrote:
>> From: Timothee Regaud <timothee.informatique at regaud-chapuy.fr>
>>
>> Adds the generic data structures to libavutil. The design is 
>> recursive to support other codecs, even though the implementation is 
>> only for H.264 for now.
>>
>> Signed-off-by: Timothee Regaud <timothee.informatique at regaud-chapuy.fr>
>> ---
>>   libavutil/Makefile            |   1 +
>>   libavutil/frame.h             |   7 ++
>>   libavutil/side_data.c         |   1 +
>>   libavutil/video_coding_info.h | 163 ++++++++++++++++++++++++++++++++++
>>   4 files changed, 172 insertions(+)
>>   create mode 100644 libavutil/video_coding_info.h
>>
>> diff --git a/libavutil/Makefile b/libavutil/Makefile
>> index 94a56bb72f..44e51ab7ae 100644
>> --- a/libavutil/Makefile
>> +++ b/libavutil/Makefile
>> @@ -93,6 +93,7 @@ HEADERS = 
>> adler32.h                                                     \
>> tree.h                                                        \
>> twofish.h                                                     \
>> uuid.h                                                        \
>> + video_coding_info.h                                           \
>> version.h                                                     \
>> video_enc_params.h                                            \
>> xtea.h                                                        \
>> diff --git a/libavutil/frame.h b/libavutil/frame.h
>> index c50cd263d9..f4404472a0 100644
>> --- a/libavutil/frame.h
>> +++ b/libavutil/frame.h
>> @@ -254,6 +254,13 @@ enum AVFrameSideDataType {
>>        * libavutil/tdrdi.h.
>>        */
>>       AV_FRAME_DATA_3D_REFERENCE_DISPLAYS,
>> +
>> +    /**
>> +     * Detailed block-level coding information. The data is an AVVideoCodingInfo
>> +     * structure. This is exported by video decoders and can be used by filters
>> +     * for analysis and visualization.
>> +     */
>> +    AV_FRAME_DATA_VIDEO_CODING_INFO,
>>   };
>>     enum AVActiveFormatDescription {
>> diff --git a/libavutil/side_data.c b/libavutil/side_data.c
>> index fa2a2c2a13..b938ef6f52 100644
>> --- a/libavutil/side_data.c
>> +++ b/libavutil/side_data.c
>> @@ -56,6 +56,7 @@ static const AVSideDataDescriptor sd_props[] = {
>>       [AV_FRAME_DATA_SEI_UNREGISTERED]            = { "H.26[45] User Data Unregistered SEI message",  AV_SIDE_DATA_PROP_MULTI },
>>       [AV_FRAME_DATA_VIDEO_HINT]                  = { "Encoding video hint", AV_SIDE_DATA_PROP_SIZE_DEPENDENT },
>>       [AV_FRAME_DATA_3D_REFERENCE_DISPLAYS]       = { "3D Reference Displays Information", AV_SIDE_DATA_PROP_GLOBAL },
>> +    [AV_FRAME_DATA_VIDEO_CODING_INFO]           = { "Video Coding Info", AV_SIDE_DATA_PROP_SIZE_DEPENDENT },
>>   };
>>     const AVSideDataDescriptor *av_frame_side_data_desc(enum AVFrameSideDataType type)
>> diff --git a/libavutil/video_coding_info.h b/libavutil/video_coding_info.h
>> new file mode 100644
>> index 0000000000..17e9345892
>> --- /dev/null
>> +++ b/libavutil/video_coding_info.h
>> @@ -0,0 +1,163 @@
>> +/*
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#ifndef AVUTIL_VIDEO_CODING_INFO_H
>> +#define AVUTIL_VIDEO_CODING_INFO_H
>> +
>> +#include <stdint.h>
>> +#include <stddef.h>
>> +
>> +/**
>> + * @file
>> + * @ingroup lavu_frame
>> + * Structures for describing block-level video coding information.
>> + */
>> +
>> +/**
>> + * @defgroup lavu_video_coding_info Video Coding Info
>> + * @ingroup lavu_frame
>> + *
>> + * @{
>> + * Structures for describing block-level video coding information, to be
>> + * attached to an AVFrame as side data.
>> + *
>> + * All pointer-like members in these structures are offsets relative to the
>> + * start of the AVVideoCodingInfo struct to ensure the side data is
>> + * self-contained and relocatable. This is critical as the underlying buffer
>> + * may be moved in memory.
>> + */
>> +
>> +/**
>> + * Structure to hold inter-prediction information for a block.
>> + */
>> +typedef struct AVBlockInterInfo {
>> +    /**
>> +     * Offsets to motion vectors for list 0 and list 1, relative to the
>> +     * start of the AVVideoCodingInfo struct.
>> +     * The data for each list is an array of [x, y] pairs of int16_t.
>> +     * The number of vectors is given by num_mv.
>> +     * An offset of 0 indicates this data is not present.
>> +     */
>> +    size_t mv_offset[2];
>> +
>> +    /**
>> +     * Offsets to reference indices for list 0 and list 1, relative to the
>> +     * start of the AVVideoCodingInfo struct.
>> +     * The data is an array of int8_t. A value of -1 indicates the reference
>> +     * is not used for a specific partition.
>> +     * An offset of 0 indicates this data is not present.
>> +     */
>> +    size_t ref_idx_offset[2];
>> +    /**
>> +     * Number of motion vectors for list 0 and list 1.
>> +     */
>> +    uint8_t num_mv[2];
>> +} AVBlockInterInfo;
>> +
>> +/**
>> + * Structure to hold intra-prediction information for a block.
>> + */
>> +typedef struct AVBlockIntraInfo {
>> +    /**
>> +     * Offset to an array of intra prediction modes, relative to the
>> +     * start of the AVVideoCodingInfo struct.
>> +     * The number of modes is given by num_pred_modes.
>> +     */
>> +    size_t pred_mode_offset;
>> +
>> +    /**
>> +     * Number of intra prediction modes.
>> +     */
>> +    uint8_t num_pred_modes;
>> +
>> +    /**
>> +     * Chroma intra prediction mode.
>> +     */
>> +    uint8_t chroma_pred_mode;
>> +} AVBlockIntraInfo;
>> +
>> +/**
>> + * Main structure for a single coding block.
>> + * This structure can be recursive for codecs that use tree-based partitioning.
>> + */
>> +typedef struct AVVideoCodingInfoBlock {
>> +    /**
>> +     * Position (x, y) and size (w, h) of the block, in pixels,
>> +     * relative to the top-left corner of the frame.
>> +     */
>> +    int16_t x, y;
>> +    uint8_t w, h;
>> +
>> +    /**
>> +     * Flag indicating if the block is intra-coded.
>> +     * 1 if intra, 0 if inter.
>> +     */
>> +    uint8_t is_intra;
>> +
>> +    /**
>> +     * The original, codec-specific type of this block or macroblock.
>> +     * This allows a filter to have codec-specific logic for interpreting
>> +     * the generic prediction information based on the source codec.
>> +     * For example, for H.264, this would store the MB type flags (MB_TYPE_*).
>> +     */
>> +    uint32_t codec_specific_type;
>> +
>> +    union {
>> +        AVBlockIntraInfo intra;
>> +        AVBlockInterInfo inter;
>> +    };
>> +
>> +    /**
>> +     * Number of child blocks this block is partitioned into.
>> +     * If 0, this is a leaf node in the partition tree.
>> +     */
>> +    uint8_t num_children;
>> +
>> +    /**
>> +     * Offset to an array of child AVVideoCodingInfoBlock structures, relative
>> +     * to the start of the AVVideoCodingInfo struct.
>> +     * This allows for recursive representation of coding structures.
>> +     * An offset of 0 indicates there are no children.
>> +     */
>> +    size_t children_offset;
>> +} AVVideoCodingInfoBlock;
>> +
>> +/**
>> + * Top-level structure to be attached to an AVFrame as side data.
>> + * It contains an array of the highest-level coding blocks (e.g., CTUs or MBs).
>> + */
>> +typedef struct AVVideoCodingInfo {
>> +    /**
>> +     * Number of top-level blocks in the frame.
>> +     */
>> +    uint32_t nb_blocks;
>> +
>> +    /**
>> +     * Offset to an array of top-level blocks, relative to the start of the
>> +     * AVVideoCodingInfo struct.
>> +     * The actual data for these blocks, and any child blocks or sub-data,
>> +     * is stored contiguously in the AVBufferRef attached to the side data.
>> +     */
>> +    size_t blocks_offset;
>> +} AVVideoCodingInfo;
>> +
>> +/**
>> + * @}
>> + */
>> +
>> +#endif /* AVUTIL_VIDEO_CODING_INFO_H */
>
> Absolutely not.
> Use and extend libavutil/video_enc_params.h instead.
Adding these structs to `libavutil/video_enc_params.h` seemed
counter-intuitive to me, as that header is for encoder-centric
parameters, while my structs describe decoded data. However, I agree it
is better for API consistency, so I will move them there in v2.
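
For context on the layout that header already uses: AVVideoEncParams
stores its per-block array behind a byte offset (blocks_offset) plus a
per-element stride (block_size), which is meant to let the block struct
grow without breaking callers. Below is a minimal sketch of that
pattern, assuming trimmed-down stand-in types. DemoInfo, DemoBlock,
demo_info_alloc() and demo_info_block() are hypothetical names used for
illustration only, not libavutil API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-block record; not a libavutil type. */
typedef struct DemoBlock {
    int16_t x, y;
    uint8_t w, h;
    uint8_t is_intra;
} DemoBlock;

/* Hypothetical stand-in for the side data header; not a libavutil type. */
typedef struct DemoInfo {
    uint32_t nb_blocks;
    size_t   blocks_offset; /* bytes from the start of DemoInfo to block 0 */
    size_t   block_size;    /* stride in bytes between consecutive blocks */
} DemoInfo;

/* Allocate the header and the block array as one zeroed, contiguous buffer. */
static DemoInfo *demo_info_alloc(uint32_t nb_blocks)
{
    const size_t header = sizeof(DemoInfo);
    DemoInfo *info;

    /* Reject block counts whose total size would overflow size_t. */
    if (nb_blocks > (SIZE_MAX - header) / sizeof(DemoBlock))
        return NULL;

    info = calloc(1, header + (size_t)nb_blocks * sizeof(DemoBlock));
    if (!info)
        return NULL;

    info->nb_blocks     = nb_blocks;
    info->blocks_offset = header;
    info->block_size    = sizeof(DemoBlock);
    return info;
}

/* Index through the stored offset and stride, never through sizeof(DemoBlock),
 * so a caller built against an older, smaller DemoBlock would still step
 * over the array correctly. */
static DemoBlock *demo_info_block(DemoInfo *info, uint32_t idx)
{
    assert(idx < info->nb_blocks);
    return (DemoBlock *)((uint8_t *)info + info->blocks_offset +
                         (size_t)idx * info->block_size);
}
```

A caller would allocate with demo_info_alloc(n), fill each entry through
demo_info_block(info, i), and release the whole thing with a single
free().
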
> And if at all possible, don't implement an inspection tool in ffmpeg 
> *just because you want to*. Parsing a bitstream and displaying it is 
> not a very complicated thing, but exposing an API very much is a very 
> complicated thing.

I completely understand your concern about exposing a new API just for a
single inspection tool. My implementation in vf_codecview is intended as
a proof of concept to demonstrate the utility of this exported data.
The data has other potential uses, such as analysis, debugging new
codecs, and academic research. But I agree that my codecview
implementation is not that useful on its own. It would therefore be
better to have a dedicated flag, similar to the existing
-flags2 +export_mvs. I will add a new flag, -flags2 +export_coding_info,
so the data can be used by other tools.
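
To illustrate why the patch uses offsets rather than pointers, here is a
minimal sketch of a consumer walking such a block tree. The Demo* types
are trimmed-down, hypothetical stand-ins for the structs in this patch,
and the size field is an assumption added only so the sketch can
bounds-check offsets; it is not part of the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified block node; not the struct from the patch. */
typedef struct DemoNode {
    int16_t x, y;
    uint8_t w, h;
    uint8_t num_children;    /* 0 for a leaf */
    size_t  children_offset; /* bytes from the start of DemoTree, 0 if none */
} DemoNode;

/* Hypothetical top-level header; 'size' is an assumption for bounds checks. */
typedef struct DemoTree {
    uint32_t nb_blocks;
    size_t   blocks_offset;
    size_t   size;           /* total buffer size in bytes */
} DemoTree;

/* Resolve an offset to a node array, or NULL if it is misaligned or the
 * array would not fit inside the buffer. */
static const DemoNode *demo_nodes_at(const DemoTree *t, size_t off, size_t count)
{
    if (off < sizeof(*t) || off > t->size || off % _Alignof(DemoNode) ||
        count > (t->size - off) / sizeof(DemoNode))
        return NULL;
    return (const DemoNode *)((const uint8_t *)t + off);
}

/* Count leaf blocks recursively; the depth cap (an arbitrary value) guards
 * against cycles in malformed data. */
static unsigned demo_count_leaves(const DemoTree *t, size_t off,
                                  size_t count, int depth)
{
    const DemoNode *n = demo_nodes_at(t, off, count);
    unsigned leaves = 0;

    if (!n || depth > 8)
        return 0;
    for (size_t i = 0; i < count; i++) {
        if (!n[i].num_children)
            leaves++;
        else
            leaves += demo_count_leaves(t, n[i].children_offset,
                                        n[i].num_children, depth + 1);
    }
    return leaves;
}

/* Build one 16x16 block split into four 8x8 children, all in one buffer. */
static DemoTree *demo_build(void)
{
    const size_t size = sizeof(DemoTree) + 5 * sizeof(DemoNode);
    DemoTree *t = calloc(1, size);
    DemoNode *n;

    if (!t)
        return NULL;
    t->nb_blocks     = 1;
    t->blocks_offset = sizeof(DemoTree);
    t->size          = size;
    assert(t->blocks_offset % _Alignof(DemoNode) == 0);

    n = (DemoNode *)((uint8_t *)t + t->blocks_offset);
    n[0] = (DemoNode){ .w = 16, .h = 16, .num_children = 4,
                       .children_offset = t->blocks_offset + sizeof(DemoNode) };
    for (int i = 0; i < 4; i++)
        n[1 + i] = (DemoNode){ .x = (int16_t)((i & 1) * 8),
                               .y = (int16_t)((i >> 1) * 8),
                               .w = 8, .h = 8 };
    return t;
}
```

Because every reference is an offset from the start of the buffer, a
memcpy() of the whole buffer followed by freeing the original should
leave the copy fully walkable: demo_count_leaves() reports 4 leaves on
both the tree from demo_build() and its copy.
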

Thanks,

Timothée