[FFmpeg-devel] new packed pixel formats (machine vision)

Diederick C. Niehorster dcnieho at gmail.com
Tue Oct 8 10:31:24 EEST 2024


Hi All,

I am using FFmpeg in a library I am developing (not yet released) to
interface with machine vision cameras; it allows storing the streams
with really nice performance directly to, e.g., x264/mp4 (thanks
FFmpeg!). For this, I am looking to support some new pixel formats as
input formats in swscale.
A first step for this would be to describe these formats in an
AVPixFmtDescriptor. Example machine vision pixel formats are:
Mono10p: 10-bit luma only, 4 pixels packed into 5 bytes (i.e., no padding)
BayerRG10: 10-bit color components in a Bayer pattern, 1 component per
2 bytes (10 bits of data + 6 bits of padding)
BayerRG10p: 10-bit color components in a Bayer pattern, 4 color
components packed into 5 bytes (really the same packing as Mono10p)
There are also 12-bit variants of these.

See https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-monochrome-pixel-formats.html
for a diagram of Mono10p, and
https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-raw-bayer-pixel-formats.html
for diagrams of the packed and unpacked Bayer formats.
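
For concreteness, here is a minimal unpacking sketch for Mono10p (and
BayerRG10p, which packs the same way), assuming the LSB-first packing
shown in the linked diagram; the function name and the packing order
are my assumptions, not anything that exists in FFmpeg:

#include <stdint.h>

/* Unpack one group of 4 10-bit samples from 5 packed bytes,
 * assuming LSB-first packing (lowest bits of sample 0 in byte 0). */
static void unpack_10p_group(const uint8_t src[5], uint16_t dst[4])
{
    dst[0] =  src[0]       | ((src[1] & 0x03) << 8);
    dst[1] = (src[1] >> 2) | ((src[2] & 0x0F) << 6);
    dst[2] = (src[2] >> 4) | ((src[3] & 0x3F) << 4);
    dst[3] = (src[3] >> 6) | ( src[4]         << 2);
}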

I am wondering how to map these to AVPixFmtDescriptors.
BayerRG10 is a problem:
    [AV_PIX_FMT_BAYER_RGGB10] = {
        .name = "bayer_rggb10",
        .nb_components = 3,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            /* plane, step, offset, shift, depth */
            { 0, 2, 0, 0, ? },  /* R */
            { 0, 2, 0, 0, ? },  /* G */
            { 0, 2, 0, 0, ? },  /* B */
        },
        .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER,
    },
What should be put for the component depths? For 8-bit Bayer, 2,4,2 is
used, and for 16-bit, 4,8,4. I guess this signifies that for RGGB, 1/4
of the samples are red, 1/4 are blue, and 2/4 are green. By that logic
the values would have to be 2.5,5,2.5 for a 10-bit Bayer format, which
is not possible (12-bit would work, with 3,6,3). How could I handle this?
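
To make the layout of the unpacked BayerRG10 explicit: as I understand
it, each sample sits in a 2-byte slot, so reading one would look
roughly like this (a sketch only; I am assuming little-endian storage
with the data in the low 10 bits, and the exact arrangement is
vendor-dependent):

#include "libavutil/intreadwrite.h"

/* Read one unpacked 10-bit Bayer sample from its 2-byte slot,
 * assuming little-endian storage with data in the low 10 bits. */
static inline unsigned read_bayer10_sample(const uint8_t *p)
{
    return AV_RL16(p) & 0x3FF;
}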

For
Mono10p (which would become something like gray10p) and
BayerRG10p (which would become something like bayer_rggb10p),
my first question is: should these be described as bitstream formats,
given that their pixel values are not byte-aligned?
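
If the answer is yes, my understanding is that with
AV_PIX_FMT_FLAG_BITSTREAM set, step and offset are given in bits
rather than bytes, so a gray10p descriptor might look roughly like the
sketch below (AV_PIX_FMT_GRAY10P and all values are just my guess,
nothing that exists today):

    /* Hypothetical sketch only: with AV_PIX_FMT_FLAG_BITSTREAM,
     * step and offset are expressed in bits rather than bytes. */
    [AV_PIX_FMT_GRAY10P] = {
        .name = "gray10p",
        .nb_components = 1,
        .log2_chroma_w = 0,
        .log2_chroma_h = 0,
        .comp = {
            { 0, 10, 0, 0, 10 },   /* plane, step (bits), offset (bits), shift, depth */
        },
        .flags = AV_PIX_FMT_FLAG_BITSTREAM,
    },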

Lastly, if I figure this out, is this something that might be
considered for inclusion in FFmpeg, or are there policies or strong
opinions against adding these machine vision formats?

All the best,
Dee

