[FFmpeg-devel] [PATCH 4/6] avformat/mov: parse ISO-14496-12 ChannelLayout

Jan Ekström jeebjp at gmail.com
Fri Feb 24 15:49:25 EET 2023


On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack at foxmail.com> wrote:
>
> From: Zhao Zhili <zhilizhao at tencent.com>
>
> Signed-off-by: Zhao Zhili <zhilizhao at tencent.com>

Hah, I recently started coding uncompressed audio support in mp4
myself, but what this commit handles is basically what killed my
version off, since the channel layout box is required.

If you're interested you can check my take over at
https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .

Will comment on some things.

> ---
>  libavformat/mov.c      |  79 +++++++++++-
>  libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
>  libavformat/mov_chan.h |  26 ++++
>  3 files changed, 369 insertions(+), 1 deletion(-)
>
> diff --git a/libavformat/mov.c b/libavformat/mov.c
> index b125343f84..1db869aa2e 100644
> --- a/libavformat/mov.c
> +++ b/libavformat/mov.c
> @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>      return 0;
>  }
>
> +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
> +{
> +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
> +    int stream_structure;
> +    int ret = 0;
> +    AVStream *st;
> +
> +    if (c->fc->nb_streams < 1)
> +        return 0;
> +    st = c->fc->streams[c->fc->nb_streams-1];
> +
> +    /* skip version and flags */
> +    avio_skip(pb, 4);

We should really not do this any more. Various FullBoxes have multiple
versions or depend on the flags. See how I have added FullBox things
recently, although I would prefer us to have a generic macro/function
setup for this where you then get the version and flags as arguments
or whatever in the future.

For this specific box, versions 0 and 1 have both been defined since
circa 2018-2019 or so (visible at least in 14496-12 2022).
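
To illustrate what I mean by having the version and flags handed to
you, here is a minimal sketch over a plain byte buffer. The struct and
function names are made up for this mail and are not existing FFmpeg
API; in mov.c the reads would go through avio_r8()/avio_rb24() instead.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper type, not part of FFmpeg. */
typedef struct FullBoxHeader {
    uint8_t  version; /* first byte of the FullBox payload */
    uint32_t flags;   /* following 24 bits, big-endian */
} FullBoxHeader;

/* Parse the 4-byte FullBox header at the start of buf.
 * Returns 0 on success, -1 if fewer than 4 bytes are available. */
static int read_fullbox_header(const uint8_t *buf, size_t size,
                               FullBoxHeader *h)
{
    if (size < 4)
        return -1;
    h->version = buf[0];
    h->flags   = ((uint32_t)buf[1] << 16) |
                 ((uint32_t)buf[2] <<  8) |
                  (uint32_t)buf[3];
    return 0;
}
```

A box reader would then branch on h.version (0 or 1 for 'chnl') and
reject anything else instead of silently skipping the four bytes.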

Since ISO/IEC has changed the rules for free specifications (against
the wishes of various spec authors) and all that jazz, this is how
it's defined in what I have on hand:

12.2.4  Channel layout

12.2.4.1  Definition

Box Types:  'chnl'
Container: Audio sample entry
Mandatory: No
Quantity: Zero or one

This box may appear in an audio sample entry to document the assignment
of channels in the audio stream. It is recommended to use this box to
convey the base channel count for the DownMixInstructions box and other
DRC-related boxes specified in ISO/IEC 23003-4.
The channel layout can be all or part of a standard layout (from an
enumerated list), or a custom layout (which also allows a track to
contribute part of an overall layout).
A stream may contain channels, objects, neither, or both. A stream that
is neither channel nor object structured can implicitly be rendered in
a variety of ways.

12.2.4.2  Syntax

aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
   if (version==0) {
      unsigned int(8) stream_structure;
      if (stream_structure & channelStructured) {
         unsigned int(8) definedLayout;
         if (definedLayout==0) {
            for (i = 1 ; i <= layout_channel_count ; i++) {
               //  layout_channel_count comes from the sample entry
               unsigned int(8) speaker_position;
               if (speaker_position == 126) {   // explicit position
                  signed int (16) azimuth;
                  signed int (8)  elevation;
               }
            }
         } else {
            unsigned int(64)   omittedChannelsMap;
                  // a ‘1’ bit indicates ‘not in this track’
         }
      }
      if (stream_structure & objectStructured) {
         unsigned int(8) object_count;
      }
   } else {
      unsigned int(4) stream_structure;
      unsigned int(4) format_ordering;
      unsigned int(8) baseChannelCount;
      if (stream_structure & channelStructured) {
         unsigned int(8) definedLayout;
         if (definedLayout==0) {
            unsigned int(8) layout_channel_count;
            for (i = 1 ; i <= layout_channel_count ; i++) {
               unsigned int(8) speaker_position;
               if (speaker_position == 126) {   // explicit position
                  signed int (16) azimuth;
                  signed int (8)  elevation;
               }
            }
         } else {
            int(4) reserved = 0;
            unsigned int(3) channel_order_definition;
            unsigned int(1) omitted_channels_present;
            if (omitted_channels_present == 1) {
               unsigned int(64)   omittedChannelsMap;
                     // a ‘1’ bit indicates ‘not in this track’
            }
         }
      }
      if (stream_structure & objectStructured) {
                     // object_count is derived from baseChannelCount
      }
   }
}

12.2.4.3  Semantics

version is an integer that specifies the version of this box (0 or 1).
        When authoring, version 1 should be preferred over version 0.
        Version 1 conveys the channel ordering, which is not always the
        case for version 0. Version 1 should be used to convey the base
        channel count for DRC.

stream_structure is a field of flags that define whether the stream has
        channel or object structure (or both, or neither); the following
        flags are defined, all other values are reserved:
   1  the stream carries channels
   2  the stream carries objects

format_ordering indicates the order of formats in the stream starting
        from the lowest channel index (see Table). Each format shall
        only use contiguous channel indices.

   format_ordering  Order
   0                unknown
   1                Channels, possibly followed by Objects
   2                Objects, possibly followed by Channels
   Remaining values are reserved

definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.

speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If an
        explicit position is used, then the azimuth and elevation are as
        defined as for speakers in ISO/IEC 23091-3. The channel order
        corresponds to the order of speaker positions.

azimuth is a signed value in degrees, as defined for LoudspeakerAzimuth
        in ISO/IEC 23091-3.

elevation is a signed value, in degrees, as defined for
        LoudspeakerElevation in ISO/IEC 23091-3.

channel_order_definition indicates where the ordering of the audio
        channels for the definedLayout are specified (see Table).

   channel_order_definition  Channel order specification
   0                         as listed for the ChannelConfigurations in
                             ISO/IEC 23091-3
   1                         Default order of audio codec specification
   2                         Channel ordering #2 of audio codec specification
   3                         Channel ordering #3 of audio codec specification
   4                         Channel ordering #4 of audio codec specification
   Remaining values are reserved

omitted_channels_present is a flag that indicates if it is set to 1 that
        the omittedChannelsMap is present.

omittedChannelsMap is a bit-map of omitted channels; the bits in the
        channel map are numbered from least-significant to
        most-significant, and correspond in that ordering with the order
        of the channels for the configuration as documented in ISO/IEC
        23091-3 ChannelConfiguration. 1-bits in the channel map mean
        that a channel is absent. A zero value of the map therefore
        always means that the given standard layout is fully present.
        The default value is 0.

layout_channel_count is the count of channels for the channel layout.
        The default value is 0 if stream_structure indicates that no
        channel structure is present. Otherwise, the value is the number
        of channels of the defined layout, if present, otherwise it is
        the value from the sample entry.

object_count is the count of channels that contain audio objects. The
        default value is 0. For version 1 and if the objectStructured
        flag is set, the value is computed as baseChannelCount minus the
        channel count of the channel structure.

baseChannelCount represents the combined channel count of the channel
        layout and the object count. The value must match the base
        channel count for DRC (see ISO/IEC 23003-4).
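
As a rough sketch of how the two versions differ when parsing, here is
the syntax above walked over a plain byte buffer that starts right
after the FullBox header. Everything here is hypothetical naming for
illustration, not FFmpeg API. It assumes bit fields are packed most
significant bits first, and it does not carry a table of channel counts
per definedLayout, so for version 1 with a defined layout plus objects
the object_count is left at -1.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CHNL_MAX_CH 64

/* Hypothetical container for the parsed fields; not an FFmpeg structure. */
typedef struct ChnlInfo {
    int      stream_structure;          /* 1: channels, 2: objects */
    int      format_ordering;           /* version 1 only */
    int      base_channel_count;        /* version 1 only */
    int      defined_layout;            /* 0: explicit speaker positions */
    int      channel_order_definition;  /* version 1 only */
    int      layout_channel_count;      /* only set for explicit positions */
    uint8_t  speaker_position[CHNL_MAX_CH];
    int16_t  azimuth[CHNL_MAX_CH];
    int8_t   elevation[CHNL_MAX_CH];
    uint64_t omitted_channels_map;
    int      object_count;              /* -1 when it cannot be derived here */
} ChnlInfo;

typedef struct Reader { const uint8_t *p, *end; int err; } Reader;

/* Read one byte; on overrun set the error flag and return 0. */
static unsigned rd8(Reader *r)
{
    if (r->p >= r->end) { r->err = 1; return 0; }
    return *r->p++;
}

/* Read an n-byte big-endian unsigned integer. */
static uint64_t rdbe(Reader *r, int n)
{
    uint64_t v = 0;
    while (n-- > 0)
        v = (v << 8) | rd8(r);
    return v;
}

static int read_positions(Reader *r, ChnlInfo *c, int count)
{
    if (count < 0 || count > CHNL_MAX_CH)
        return -1;
    c->layout_channel_count = count;
    for (int i = 0; i < count && !r->err; i++) {
        c->speaker_position[i] = (uint8_t)rd8(r);
        if (c->speaker_position[i] == 126) {        /* explicit position */
            c->azimuth[i]   = (int16_t)(uint16_t)rdbe(r, 2);
            c->elevation[i] = (int8_t)(uint8_t)rd8(r);
        }
    }
    return 0;
}

/* entry_channels is the channel count from the audio sample entry,
 * which version 0 needs for the explicit-position loop. */
static int parse_chnl(const uint8_t *buf, size_t size, int version,
                      int entry_channels, ChnlInfo *c)
{
    Reader r = { buf, buf + size, 0 };

    memset(c, 0, sizeof(*c));
    if (version == 0) {
        c->stream_structure = (int)rd8(&r);
    } else if (version == 1) {
        unsigned b = rd8(&r);
        c->stream_structure   = (int)(b >> 4);
        c->format_ordering    = (int)(b & 0xF);
        c->base_channel_count = (int)rd8(&r);
    } else {
        return -1;
    }
    if (c->stream_structure & 1) {
        c->defined_layout = (int)rd8(&r);
        if (r.err)
            return -1;
        if (c->defined_layout == 0) {
            int count = version == 0 ? entry_channels : (int)rd8(&r);
            if (read_positions(&r, c, count) < 0)
                return -1;
        } else if (version == 0) {
            c->omitted_channels_map = rdbe(&r, 8);
        } else {
            unsigned b = rd8(&r);   /* 4 reserved, 3 order, 1 flag bit */
            c->channel_order_definition = (int)((b >> 1) & 7);
            if (b & 1)
                c->omitted_channels_map = rdbe(&r, 8);
        }
    }
    if (c->stream_structure & 2) {
        if (version == 0)
            c->object_count = (int)rd8(&r);
        else if (!(c->stream_structure & 1) || c->defined_layout == 0)
            c->object_count = c->base_channel_count - c->layout_channel_count;
        else
            c->object_count = -1;   /* needs the channel count of definedLayout */
    }
    return r.err ? -1 : 0;
}
```

The main point is that version 1 moves layout_channel_count into the
box and makes the omitted channels map optional, so a reader that
assumes the version 0 layout will misparse version 1 boxes.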


> +
> +    stream_structure = avio_r8(pb);
> +
> +    // stream carries channels
> +    if (stream_structure & 1) {
> +        int layout = avio_r8(pb);
> +
> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
> +        if (!layout) {
> +            uint8_t positions[64] = {};
> +            int enable = 1;
> +
> +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
> +                int speaker_pos = avio_r8(pb);
> +
> +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
> +                if (speaker_pos == 126) { // explicit position
> +                    int16_t azimuth = avio_rb16(pb);
> +                    int8_t elevation = avio_r8(pb);
> +
> +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
> +                           azimuth, elevation);
> +                    // Don't support explicit position
> +                    enable = 0;
> +                } else if (i < FF_ARRAY_ELEMS(positions)) {
> +                    positions[i] = speaker_pos;
> +                } else {
> +                    // number of channel out of our supported range
> +                    enable = 0;
> +                }
> +            }
> +
> +            if (enable) {
> +                ret = ff_mov_get_layout_from_channel_positions(positions,
> +                        st->codecpar->ch_layout.nb_channels,
> +                        &st->codecpar->ch_layout);
> +                if (ret) {
> +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
> +                    ret = 0;
> +                }
> +            }
> +        } else {
> +            uint64_t omitted_channel_map = avio_rb64(pb);
> +
> +            if (omitted_channel_map) {
> +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
> +                                      omitted_channel_map);
> +                return AVERROR_PATCHWELCOME;
> +            }
> +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
> +        }
> +    }
> +
> +    // stream carries objects
> +    if (stream_structure & 2) {
> +        int obj_count = avio_r8(pb);
> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
> +    }
> +
> +    avio_seek(pb, end, SEEK_SET);
> +    return ret;
> +}
> +
>  static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>  {
>      AVStream *st;
> @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
>  { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
>  { MKTAG('w','f','e','x'), mov_read_wfex },
>  { MKTAG('c','m','o','v'), mov_read_cmov },
> -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
> +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
> +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
>  { MKTAG('d','v','c','1'), mov_read_dvc1 },
>  { MKTAG('s','g','p','d'), mov_read_sgpd },
>  { MKTAG('s','b','g','p'), mov_read_sbgp },
> diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
> index f66bf0df7f..10ebcdc08f 100644
> --- a/libavformat/mov_chan.c
> +++ b/libavformat/mov_chan.c
> @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
>
>      return 0;
>  }
> +
> +/* ISO/IEC 23001-8, 8.2 */
> +static const AVChannelLayout iso_channel_configuration[] = {
> +    // 0: any setup
> +    {},
> +

I think the better naming for this would be CICP channel configuration,
since the specification is called "coding-independent code points"
(for video this is shared with ITU-T H.273, which is free).

Also do note that a whole bunch of these are not in the channel order
that FFmpeg wants after stereo :<

Thankfully with manual mapping FFmpeg native channel layouts' channel
order should be writable and readable.

The channel orders for various CICP layouts can be found both in the
referenced specifications, as well as in the comments from Apple's
headers for example

// ISO/IEC 23091-3, channels w/orderings
kAudioChannelLayoutTag_CICP_1  = kAudioChannelLayoutTag_MPEG_1_0,   ///< C
kAudioChannelLayoutTag_CICP_2  = kAudioChannelLayoutTag_MPEG_2_0,   ///< L R
kAudioChannelLayoutTag_CICP_3  = kAudioChannelLayoutTag_MPEG_3_0_A, ///< L R C
kAudioChannelLayoutTag_CICP_4  = kAudioChannelLayoutTag_MPEG_4_0_A, ///< L R C Cs
kAudioChannelLayoutTag_CICP_5  = kAudioChannelLayoutTag_MPEG_5_0_A, ///< L R C Ls Rs
kAudioChannelLayoutTag_CICP_6  = kAudioChannelLayoutTag_MPEG_5_1_A, ///< L R C LFE Ls Rs
kAudioChannelLayoutTag_CICP_7  = kAudioChannelLayoutTag_MPEG_7_1_B, ///< L R C LFE Ls Rs Lc Rc

kAudioChannelLayoutTag_CICP_9  = kAudioChannelLayoutTag_ITU_2_1,    ///< L R Cs
kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2,    ///< L R Ls Rs
kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A, ///< L R C LFE Ls Rs Cs
kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C, ///< L R C LFE Ls Rs Rls Rrs
kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24,
    ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb

kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8,
    ///< L R C LFE Ls Rs Vhl Vhr
kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12,
    ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr

kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10,
    ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12,
    ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14,
    ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts

kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12,
    ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14,
    ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos
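
For the manual mapping idea, a minimal sketch of deriving a reorder
table from two label lists could look like the following. The helper
names are hypothetical and this is not how mov_chan.c does it. The
source labels are taken from the CICP_12 comment above; the target
order in the usage note is made up purely for illustration and is not
a claim about FFmpeg's native order.

```c
#include <string.h>

/* Fill map[] so that output channel i is taken from input channel map[i].
 * src and dst each hold n channel labels.
 * Returns 0 on success, -1 if a label in dst does not occur in src. */
static int build_reorder_map(const char *const *src, const char *const *dst,
                             int n, int *map)
{
    for (int i = 0; i < n; i++) {
        map[i] = -1;
        for (int j = 0; j < n; j++) {
            if (!strcmp(dst[i], src[j])) {
                map[i] = j;
                break;
            }
        }
        if (map[i] < 0)
            return -1;
    }
    return 0;
}

/* Reorder one frame of n samples (one sample per channel) using map[]. */
static void reorder_frame(const float *in, float *out, const int *map, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[map[i]];
}
```

With src = { L, R, C, LFE, Ls, Rs, Rls, Rrs } and an illustrative
target of { L, R, C, LFE, Rls, Rrs, Ls, Rs }, the resulting map is
{ 0, 1, 2, 3, 6, 7, 4, 5 }.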

Best regards,
Jan

