[FFmpeg-devel] RFC: new packed pixel formats (machine vision)

Tue Oct 22 09:50:28 EEST 2024

Ping.

Is the scheme below viable, or are there concerns to address at this
initial design stage?

Thanks and all the best,
Dee

On Tue, Oct 15, 2024 at 8:55 AM Diederick C. Niehorster
<dcnieho at gmail.com> wrote:
>
> Hi All,
>
> I want to pick up a discussion I started last week
> (https://ffmpeg.org/pipermail/ffmpeg-devel/2024-October/334585.html)
> in a new thread, with the relevant information nicely organized. This
> is about adding pixel formats common in machine vision to ffmpeg
> (though I understand some formats may also be used by cinema cameras),
> and supporting them as input formats in swscale so that it becomes
> easy to use ffmpeg for machine vision purposes. I already have such
> software and it will be open-sourced in good time, but right now it
> uses a proprietary conversion layer from Basler that I need to replace
> (e.g. with this proposal).
>
> Example formats are 10 and 12 bit Bayer formats. The 10 bit ones
> cannot currently be represented in an AVPixFmtDescriptor, because the
> effective bit depth of the red and blue channels is 2.5 bits while
> component depths must be integers. Other examples are 10 bit gray
> formats where multiple values are packed over multiple bytes without
> padding (e.g. four 10-bit pixels packed into 5 bytes, so not aligned
> to 16 or 32 bits).
>
> See https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-monochrome-pixel-formats.html
> for a diagram of the Mono10p and
> https://www.1stvision.com/cameras/IDS/IDS-manuals/en/basics-raw-bayer-pixel-formats.html
> for diagrams of the packed and not packed bayer formats.
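
To make the packed gray layout concrete, here is a minimal sketch of
unpacking one 5-byte group into four 10-bit samples. The function name
is hypothetical (it is not an existing libavutil or swscale function),
and the bit order is an assumption: I take samples to be stored least
significant bit first with no padding, which is how I read the Mono10p
diagram linked above. Other vendors' packings may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unpack one 5-byte group into four 10-bit samples.
 * ASSUMPTION: samples are packed LSB first without padding, i.e. sample 0
 * occupies bits 0..9 of the 40-bit group, sample 1 bits 10..19, etc. */
static void unpack_gray10p_group(const uint8_t *src, uint16_t *dst)
{
    assert(src != NULL && dst != NULL);
    dst[0] = (uint16_t)( src[0]       | ((src[1] & 0x03) << 8));
    dst[1] = (uint16_t)((src[1] >> 2) | ((src[2] & 0x0F) << 6));
    dst[2] = (uint16_t)((src[2] >> 4) | ((src[3] & 0x3F) << 4));
    dst[3] = (uint16_t)((src[3] >> 6) |  (src[4]         << 2));
}
```

A line of width w would then be handled as w/4 such groups. How a
trailing partial group is padded is not covered by this sketch.
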
>
> Here is a proposal for how these new formats could be encoded into
> AVPixFmtDescriptor, so that they can then be used in ffmpeg/swscale.
> I have taken care that none of the existing pixel formats or any code
> dealing with them would be affected, although new code would be needed
> to handle these new formats (av_read_image_line2, av_write_image_line2
> and functions printing info about AVPixFmtDescriptors, plus swscale of
> course; I commit to doing a full audit to ensure nothing else is missed).
>
> First, two new flags are needed (usages are shown below in the example
> new pixel formats). I propose:
> - AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL, which indicates that each
> component depth (an int) holds a 16 bit numerator and a 16 bit
> denominator packed into that int. That should be able to store any
> depth that could ever occur and, importantly, allows for the
> fractional bit depths needed for the bayer formats.
> - AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED, which indicates formats that are
> bit-wise packed in a way that is not aligned on 1, 2 or 4 bytes (e.g.
> 4 10-bit values in 5 bytes). This flag is needed because
> AV_PIX_FMT_FLAG_BITSTREAM formats are aligned to 8 or 32 bits, and
> this kind of unaligned packing needs special handling (see below).
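
As an illustration of the rational depth encoding, a minimal sketch of
packing and unpacking the numerator and denominator. The helper names
are hypothetical and do not exist in libavutil; the layout (numerator
in the upper 16 bits, denominator in the lower 16 bits) is the one the
proposed flag describes. I limit the numerator to 15 bits here so the
packed int stays positive, which is my own assumption.

```c
#include <assert.h>

/* Hypothetical helper: pack a depth of num/den bits into a single int
 * as (num << 16) | den. */
static int depth_pack(int num, int den)
{
    /* ASSUMPTION: keep the result a positive int. */
    assert(num > 0 && num <= 0x7FFF);
    assert(den > 0 && den <= 0xFFFF);
    return (num << 16) | den;
}

/* Hypothetical helpers: recover numerator and denominator again. */
static int depth_num(int packed) { return packed >> 16; }
static int depth_den(int packed) { return packed & 0xFFFF; }
```

With these, depth_pack(10, 4) gives 655364 (2.5 bits) and
depth_pack(10, 2) gives 655362 (5 bits).
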
>
> Using these flags, here are some example new pixel formats:
>     [AV_PIX_FMT_BAYER_RGGB10] = {
>         .name = "bayer_rggb10",
>         .nb_components = 3,
>         .log2_chroma_w = 0,
>         .log2_chroma_h = 0,
>         .comp = {
>             { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 ((10<<16) + 4) */
>             { 0, 2, 0, 0, 655362 },  /* 5: 10/2 */
>             { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
>         },
>         .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
>                  AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL,
>     },
>     [AV_PIX_FMT_BAYER_RGGB12] = {
>         .name = "bayer_rggb12",
>         .nb_components = 3,
>         .log2_chroma_w = 0,
>         .log2_chroma_h = 0,
>         .comp = {
>             { 0, 2, 0, 0, 3 },
>             { 0, 2, 0, 0, 6 },
>             { 0, 2, 0, 0, 3 },
>         },
>         .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER,
>     },
>     [AV_PIX_FMT_GRAY10P] = {
>         .name = "gray10p",
>         .nb_components = 1,
>         .log2_chroma_w = 0,
>         .log2_chroma_h = 0,
>         .comp = {
>             { 0, 2, 0, 0, 10 },       /* Y */
>         },
>         .flags = AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
>     },
>     [AV_PIX_FMT_BAYER_RGGB10P] = {
>         .name = "bayer_rggb10p",
>         .nb_components = 3,
>         .log2_chroma_w = 0,
>         .log2_chroma_h = 0,
>         .comp = {
>             { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 ((10<<16) + 4) */
>             { 0, 2, 0, 0, 655362 },  /* 5: 10/2 */
>             { 0, 2, 0, 0, 655364 },  /* 2.5: 10/4 */
>         },
>         .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
>                  AV_PIX_FMT_FLAG_DEPTH_INT16_RATIONAL |
>                  AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
>     },
>     [AV_PIX_FMT_BAYER_RGGB12P] = {
>         .name = "bayer_rggb12p",
>         .nb_components = 3,
>         .log2_chroma_w = 0,
>         .log2_chroma_h = 0,
>         .comp = {
>             { 0, 2, 0, 0, 3 },
>             { 0, 2, 0, 0, 6 },
>             { 0, 2, 0, 0, 3 },
>         },
>         .flags = AV_PIX_FMT_FLAG_RGB | AV_PIX_FMT_FLAG_BAYER |
>                  AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED,
>     },
>
> When an AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED format is encountered, one
> needs to find out how many bytes are used to store how many samples (by
> "sample" I mean one color channel value or one gray scale value).
> This information can be derived from the AVPixFmtDescriptor as
> follows:
> - gray10p: sum(component_bit_depths)=10; the least common multiple of
> 10 and 8 is 40, so 40/10=4 samples are packed into 40/8=5 bytes.
> - bayer_rggb10p: sum(component_bit_depths)=2.5+5+2.5=10; the least
> common multiple of 10 and 8 is 40, so 40/10=4 samples are packed into
> 40/8=5 bytes.
> - bayer_rggb12p: sum(component_bit_depths)=3+6+3=12; the least common
> multiple of 12 and 8 is 24, so 24/12=2 samples are packed into 24/8=3
> bytes.
> Presence of the AV_PIX_FMT_FLAG_BITPACKED_UNALIGNED flag indicates
> that such computations are needed and leaves it flexible how many
> samples are packed into how many bytes.
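
The computation above can be sketched as follows. The function names
are hypothetical, and the sketch assumes the summed component depth
(bits per sample) has already been obtained from the descriptor as an
integer, as it is in all three examples.

```c
#include <assert.h>

/* Greatest common divisor via Euclid's algorithm. */
static int gcd_int(int a, int b)
{
    while (b) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: given the summed component depth in bits, find the
 * smallest group that ends on a byte boundary and report how many samples
 * and how many bytes it contains. */
static void packed_group_size(int bits_per_sample, int *samples, int *bytes)
{
    assert(bits_per_sample > 0 && samples && bytes);
    int lcm = bits_per_sample / gcd_int(bits_per_sample, 8) * 8;
    *samples = lcm / bits_per_sample;
    *bytes   = lcm / 8;
}
```

For bits_per_sample = 10 this yields 4 samples in 5 bytes, and for 12
it yields 2 samples in 3 bytes, matching the figures above.
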
>
> I have not thought about whether this would also allow turning v210
> (v210enc/dec, AV_CODEC_ID_V210) into a pixel format and deprecating
> the encoder/decoder (presumably it's a good thing to remove this
> special handling), or whether this scheme then runs into a limitation.
> bitpacked_enc (AV_CODEC_ID_BITPACKED) should also be examined. I leave
> this for a later stage, after comments on the above proposal.
>
> Looking forward to hearing what you/the list think!
>
> All the best,
> Dee