[FFmpeg-devel] [PATCH 4/6] avformat/mov: parse ISO-14496-12 ChannelLayout

Sat Feb 25 06:31:17 EET 2023

On Fri, 2023-02-24 at 15:49 +0200, Jan Ekström wrote:
> On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack at foxmail.com> wrote:
> > 
> > From: Zhao Zhili <zhilizhao at tencent.com>
> > 
> > Signed-off-by: Zhao Zhili <zhilizhao at tencent.com>
> 
> Hah, I actually happened to recently start coding uncompressed audio
> support in mp4 myself, but what this commit is handling is what
> basically killed my version off since the channel layout box is
> required.
> 
> If you're interested you can check my take over at
> https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .

Sorry, I didn't notice your work on this. I have cherry-picked the
first two patches from your branch into v2. Is that OK with you?

The channel layout support is tedious. Some of the layouts aren't
supported yet, and some of the details are unclear. Please help review
and improve this part.

> 
> Will comment on some things.
> 
> > ---
> >  libavformat/mov.c      |  79 +++++++++++-
> >  libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
> >  libavformat/mov_chan.h |  26 ++++
> >  3 files changed, 369 insertions(+), 1 deletion(-)
> > 
> > diff --git a/libavformat/mov.c b/libavformat/mov.c
> > index b125343f84..1db869aa2e 100644
> > --- a/libavformat/mov.c
> > +++ b/libavformat/mov.c
> > @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> >      return 0;
> >  }
> > 
> > +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> > +{
> > +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
> > +    int stream_structure;
> > +    int ret = 0;
> > +    AVStream *st;
> > +
> > +    if (c->fc->nb_streams < 1)
> > +        return 0;
> > +    st = c->fc->streams[c->fc->nb_streams-1];
> > +
> > +    /* skip version and flags */
> > +    avio_skip(pb, 4);
> 
> We should really not do this any more. Various FullBoxes have multiple
> versions or depend on the flags. See how I have added FullBox things
> recently, although I would prefer us to have a generic macro/function
> setup for this where you then get the version and flags as arguments
> or whatever in the future.

I have added a version and flags check in patch v2; it only supports
version 0. You are welcome to add version 1 support :)

I agree with the idea of cleaning up the handling of version and flags
to make it future-proof.
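
For reference, a rough sketch of what a shared FullBox helper could look
like. It reads from a plain byte buffer instead of an AVIOContext so that
it stands alone; FullBoxHeader and parse_fullbox_header are names made up
for this example and are not existing FFmpeg API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of FFmpeg. */
typedef struct FullBoxHeader {
    uint8_t  version;
    uint32_t flags;   /* only the low 24 bits are used */
} FullBoxHeader;

/* A FullBox payload starts with 8 bits of version followed by 24 bits of
 * flags, big-endian. Returns the number of bytes consumed (4), or -1 if
 * the buffer is too short. */
static int parse_fullbox_header(const uint8_t *buf, size_t size,
                                FullBoxHeader *hdr)
{
    assert(buf && hdr);
    if (size < 4)
        return -1;
    hdr->version = buf[0];
    hdr->flags   = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    return 4;
}
```

In mov.c the equivalent is avio_r8() followed by avio_rb24(); the caller
can then reject an unexpected version or flags value instead of skipping
the four bytes blindly.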

> 
> For this specific box, there are now versions 0 and 1 defined since
> circa 2018-2019 or so (visible at least in 14496-12 2022)
> 
> Since ISO/IEC has changed the rules for free specifications (against
> the wishes of various spec authors) and all that jazz, this is how
> it's defined in what I have on hand:
> 
> 12.2.4  Channel layout
> 
> 12.2.4.1  Definition
> 
> Box Types:  'chnl'
> Container: Audio sample entry
> Mandatory: No
> Quantity: Zero or one
> 
> This box may appear in an audio sample entry to document the
> assignment of channels in the audio
> stream. It is recommended to use this box to convey the base channel
> count for the DownMixInstructions
> box and other DRC-related boxes specified in ISO/IEC 23003-4.
> The channel layout can be all or part of a standard layout (from an
> enumerated list), or a custom layout
> (which also allows a track to contribute part of an overall layout).
> A stream may contain channels, objects, neither, or both. A stream
> that is neither channel nor object
> structured can implicitly be rendered in a variety of ways.
> 
> 12.2.4.2  Syntax
> 
> aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
>    if (version==0) {
>       unsigned int(8) stream_structure;
>       if (stream_structure & channelStructured) {
>          unsigned int(8) definedLayout;
>           if (definedLayout==0) {
>             for (i = 1 ; i <= layout_channel_count ; i++) {
>                //  layout_channel_count comes from the sample entry
>                unsigned int(8) speaker_position;
>                if (speaker_position == 126) {   // explicit position
>                   signed int (16) azimuth;
>                   signed int (8)  elevation;
>                }
>             }
>          } else {
>             unsigned int(64)   omittedChannelsMap;
>                   // a ‘1’ bit indicates ‘not in this track’
>          }
>       }
>       if (stream_structure & objectStructured) {
>          unsigned int(8) object_count;
>       }
>    } else {
>       unsigned int(4) stream_structure;
>       unsigned int(4) format_ordering;
>       unsigned int(8) baseChannelCount;
>       if (stream_structure & channelStructured) {
>          unsigned int(8) definedLayout;
>          if (definedLayout==0) {
>             unsigned int(8) layout_channel_count;
>             for (i = 1 ; i <= layout_channel_count ; i++) {
>                unsigned int(8) speaker_position;
>                if (speaker_position == 126) {   // explicit position
>                   signed int (16) azimuth;
>                   signed int (8)  elevation;
>                }
>             }
>          } else {
>             int(4) reserved = 0;
>             unsigned int(3) channel_order_definition;
>             unsigned int(1) omitted_channels_present;
>             if (omitted_channels_present == 1) {
>                unsigned int(64)   omittedChannelsMap;
>                      // a ‘1’ bit indicates ‘not in this track’
>             }
>          }
>       }
>       if (stream_structure & objectStructured) {
>                      // object_count is derived from baseChannelCount
>       }
>    }
> }
> 
> 12.2.4.3  Semantics
> 
> version is an integer that specifies the version of this box (0 or 1).
> When authoring, version 1 should be
>         preferred over version 0. Version 1 conveys the channel
> ordering, which is not always the case for
>         version 0. Version 1 should be used to convey the base channel
> count for DRC.
> 
> stream_structure is a field of flags that define whether the stream
> has channel or object structure (or
>                  both, or neither); the following flags are defined,
> all other values are reserved:
>    1  the stream carries channels
>    2  the stream carries objects
> 
> format_ordering indicates the order of formats in the stream starting
> from the lowest channel index
>                 (see Table). Each format shall only use contiguous
> channel indices.
>    format_ordering Order
>    0               unknown
>    1               Channels, possibly followed by Objects
>    2               Objects, possibly followed by Channels
>    Remaining values are reserved
> 
> definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.
> 
> speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If
> an explicit position is used,
>                  then the azimuth and elevation are as defined as for
> speakers in ISO/IEC 23091-3. The channel
>                  order corresponds to the order of speaker positions.
> 
> azimuth is a signed value in degrees, as defined for
> LoudspeakerAzimuth in ISO/IEC 23091-3.
> 
> elevation is a signed value, in degrees, as defined for
> LoudspeakerElevation in ISO/IEC 23091-3.
> 
> channel_order_definition indicates where the ordering of the audio
> channels for the definedLayout
>                          are specified (see Table).
> 
>    channel_order_definition Channel order specification
>    0                        as listed for the ChannelConfigurations in
> ISO/IEC 23091-3
>    1                        Default order of audio codec specification
>    2                        Channel ordering #2 of audio codec specification
>    3                        Channel ordering #3 of audio codec specification
>    4                        Channel ordering #4 of audio codec specification
>    Remaining values are reserved
> 
> omitted_channels_present is a flag that indicates if it is set to 1
> that the omittedChannelsMap is present.
> 
> omittedChannelsMap is a bit-map of omitted channels; the bits in the
> channel map are numbered from least-significant to most-significant,
> and correspond in that ordering with the order of the channels for the
> configuration as documented in ISO/IEC 23091-3 ChannelConfiguration.
> 1-bits in the channel map mean that a channel is absent. A zero value
> of the map therefore always means that the given standard layout is
> fully present. The default value is 0.
> 
> layout_channel_count is the count of channels for the channel layout.
> The default value is 0 if stream_structure indicates that no channel
> structure is present. Otherwise, the value is the number of channels
> of the defined layout, if present, otherwise it is the value from the
> sample entry.
> 
> object_count is the count of channels that contain audio objects. The
> default value is 0. For version 1 and if the objectStructured flag is
> set, the value is computed as baseChannelCount minus the channel count
> of the channel structure.
> 
> baseChannelCount represents the combined channel count of the channel
> layout and the object count.
>                  The value must match the base channel count for DRC
> (see ISO/IEC 23003-4).
> 
> 
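
To check my reading of the version 0 syntax quoted above, here is a small
standalone sketch that walks the payload after the version/flags word. It
works on a byte buffer rather than an AVIOContext; ChnlV0, parse_chnl_v0
and the 64-channel limit are illustrative names and choices for this
example only, and the channel count is passed in by the caller because
version 0 takes it from the sample entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHNL_MAX_CHANNELS 64   /* illustrative limit for this sketch */

typedef struct ChnlV0 {        /* hypothetical container for parsed fields */
    uint8_t  stream_structure;
    uint8_t  defined_layout;
    uint8_t  speaker_position[CHNL_MAX_CHANNELS];
    int16_t  azimuth[CHNL_MAX_CHANNELS];
    int8_t   elevation[CHNL_MAX_CHANNELS];
    uint64_t omitted_channels_map;
    uint8_t  object_count;
} ChnlV0;

/* Returns the number of bytes consumed, or -1 on truncated/invalid input. */
static int parse_chnl_v0(const uint8_t *buf, size_t size, int channel_count,
                         ChnlV0 *out)
{
    size_t pos = 0;

    assert(buf && out);
    if (channel_count < 0 || channel_count > CHNL_MAX_CHANNELS || size < 1)
        return -1;
    memset(out, 0, sizeof(*out));

    out->stream_structure = buf[pos++];
    if (out->stream_structure & 1) {              /* channelStructured */
        if (size - pos < 1)
            return -1;
        out->defined_layout = buf[pos++];
        if (out->defined_layout == 0) {
            for (int i = 0; i < channel_count; i++) {
                if (size - pos < 1)
                    return -1;
                out->speaker_position[i] = buf[pos++];
                if (out->speaker_position[i] == 126) {   /* explicit position */
                    if (size - pos < 3)
                        return -1;
                    /* big-endian signed 16-bit, then signed 8-bit */
                    out->azimuth[i]   = (int16_t)(((uint16_t)buf[pos] << 8) | buf[pos + 1]);
                    out->elevation[i] = (int8_t)buf[pos + 2];
                    pos += 3;
                }
            }
        } else {
            if (size - pos < 8)
                return -1;
            for (int i = 0; i < 8; i++)           /* big-endian 64-bit map */
                out->omitted_channels_map = (out->omitted_channels_map << 8) | buf[pos++];
        }
    }
    if (out->stream_structure & 2) {              /* objectStructured */
        if (size - pos < 1)
            return -1;
        out->object_count = buf[pos++];
    }
    return (int)pos;
}
```

Since the map bits are numbered from the least-significant bit, a caller
can test whether channel i of the defined layout is omitted with
(omitted_channels_map >> i) & 1.
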
> > +
> > +    stream_structure = avio_r8(pb);
> > +
> > +    // stream carries channels
> > +    if (stream_structure & 1) {
> > +        int layout = avio_r8(pb);
> > +
> > +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
> > +        if (!layout) {
> > +            uint8_t positions[64] = {};
> > +            int enable = 1;
> > +
> > +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
> > +                int speaker_pos = avio_r8(pb);
> > +
> > +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
> > +                if (speaker_pos == 126) { // explicit position
> > +                    int16_t azimuth = avio_rb16(pb);
> > +                    int8_t elevation = avio_r8(pb);
> > +
> > +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
> > +                           azimuth, elevation);
> > +                    // Don't support explicit position
> > +                    enable = 0;
> > +                } else if (i < FF_ARRAY_ELEMS(positions)) {
> > +                    positions[i] = speaker_pos;
> > +                } else {
> > +                    // number of channel out of our supported range
> > +                    enable = 0;
> > +                }
> > +            }
> > +
> > +            if (enable) {
> > +                ret = ff_mov_get_layout_from_channel_positions(positions,
> > +                        st->codecpar->ch_layout.nb_channels,
> > +                        &st->codecpar->ch_layout);
> > +                if (ret) {
> > +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
> > +                    ret = 0;
> > +                }
> > +            }
> > +        } else {
> > +            uint64_t omitted_channel_map = avio_rb64(pb);
> > +
> > +            if (omitted_channel_map) {
> > +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
> > +                                      omitted_channel_map);
> > +                return AVERROR_PATCHWELCOME;
> > +            }
> > +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
> > +        }
> > +    }
> > +
> > +    // stream carries objects
> > +    if (stream_structure & 2) {
> > +        int obj_count = avio_r8(pb);
> > +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
> > +    }
> > +
> > +    avio_seek(pb, end, SEEK_SET);
> > +    return ret;
> > +}
> > +
> >  static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> >  {
> >      AVStream *st;
> > @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
> >  { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
> >  { MKTAG('w','f','e','x'), mov_read_wfex },
> >  { MKTAG('c','m','o','v'), mov_read_cmov },
> > -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
> > +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
> > +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
> >  { MKTAG('d','v','c','1'), mov_read_dvc1 },
> >  { MKTAG('s','g','p','d'), mov_read_sgpd },
> >  { MKTAG('s','b','g','p'), mov_read_sbgp },
> > diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
> > index f66bf0df7f..10ebcdc08f 100644
> > --- a/libavformat/mov_chan.c
> > +++ b/libavformat/mov_chan.c
> > @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
> > 
> >      return 0;
> >  }
> > +
> > +/* ISO/IEC 23001-8, 8.2 */
> > +static const AVChannelLayout iso_channel_configuration[] = {
> > +    // 0: any setup
> > +    {},
> > +
> 
> I think the better naming for this would be CICP channel configuration
> since the specification is called "common independent coding points"
> (for video this is shared with ITU-T H.273 which is free).
> 
> Also do note that a whole bunch of these are not in the channel order
> that FFmpeg wants after stereo :<
> 
> Thankfully with manual mapping FFmpeg native channel layouts' channel
> order should be writable and readable.
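
A minimal sketch of what such a manual mapping amounts to for one frame
of interleaved samples. The table is only an illustration: it assumes the
order "L R C LFE Ls Rs Rls Rrs" given in the Apple comment for CICP 12
below, assumes Ls/Rs are the side pair and Rls/Rrs the back pair, and
targets the native 7.1 order FL FR FC LFE BL BR SL SR. reorder_frame and
cicp12_to_native are made-up names, not existing FFmpeg functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reorder one frame of samples: dst[i] = src[map[i]].
 * dst and src must not be the same buffer. */
static void reorder_frame(int16_t *dst, const int16_t *src,
                          const uint8_t *map, size_t nb_channels)
{
    assert(dst && src && map && dst != src);
    for (size_t i = 0; i < nb_channels; i++)
        dst[i] = src[map[i]];
}

/* Hypothetical table: index in the file order for each native position,
 * under the assumptions stated in the text above. */
static const uint8_t cicp12_to_native[8] = { 0, 1, 2, 3, 6, 7, 4, 5 };
```

A demuxer could equally leave the samples alone and describe the file
order with a custom channel order; the sketch only shows the permutation
itself.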
> 
> The channel orders for various CICP layouts can be found both in the
> referenced specifications, as well as in the comments from Apple's
> headers for example
> 
> // ISO/IEC 23091-3, channels w/orderings
> kAudioChannelLayoutTag_CICP_1  = kAudioChannelLayoutTag_MPEG_1_0,   ///< C
> kAudioChannelLayoutTag_CICP_2  = kAudioChannelLayoutTag_MPEG_2_0,   ///< L R
> kAudioChannelLayoutTag_CICP_3  = kAudioChannelLayoutTag_MPEG_3_0_A, ///< L R C
> kAudioChannelLayoutTag_CICP_4  = kAudioChannelLayoutTag_MPEG_4_0_A, ///< L R C Cs
> kAudioChannelLayoutTag_CICP_5  = kAudioChannelLayoutTag_MPEG_5_0_A, ///< L R C Ls Rs
> kAudioChannelLayoutTag_CICP_6  = kAudioChannelLayoutTag_MPEG_5_1_A, ///< L R C LFE Ls Rs
> kAudioChannelLayoutTag_CICP_7  = kAudioChannelLayoutTag_MPEG_7_1_B, ///< L R C LFE Ls Rs Lc Rc
> 
> kAudioChannelLayoutTag_CICP_9  = kAudioChannelLayoutTag_ITU_2_1,    ///< L R Cs
> kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2,    ///< L R Ls Rs
> kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A, ///< L R C LFE Ls Rs Cs
> kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C, ///< L R C LFE Ls Rs Rls Rrs
> kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24, ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb
> 
> kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8,  ///< L R C LFE Ls Rs Vhl Vhr
> kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12, ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr
> 
> kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10, ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
> kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12, ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
> kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14, ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts
> 
> kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
> kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14, ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos
> 
> Best regards,
> Jan