[FFmpeg-devel] [PATCH 4/6] avformat/mov: parse ISO-14496-12 ChannelLayout

Zhao Zhili quinkblack at foxmail.com
Tue Oct 31 05:15:36 EET 2023



> On Feb 24, 2023, at 21:49, Jan Ekström <jeebjp at gmail.com> wrote:
> 
> On Fri, Feb 24, 2023 at 6:25 AM Zhao Zhili <quinkblack at foxmail.com> wrote:
>> 
>> From: Zhao Zhili <zhilizhao at tencent.com>
>> 
>> Signed-off-by: Zhao Zhili <zhilizhao at tencent.com>
> 
> Hah, I actually happened to recently start coding uncompressed audio
> support in mp4 myself, but what this commit is handling is what
> basically killed my version off since the channel layout box is
> required.
> 
> If you're interested you can check my take over at
> https://github.com/jeeb/ffmpeg/commits/pcmc_parsing_improvements .
> 
> Will comment on some things.

I only have an old copy of the spec, and I may have missed some comments
and made some mistakes. Please let me know on the mailing list or by
personal email (this address) if I did something wrong.

I have network issues with IRC and can only read the archives when I get
the time. I don’t work on open source as part of my day job.

> 
>> ---
>> libavformat/mov.c      |  79 +++++++++++-
>> libavformat/mov_chan.c | 265 +++++++++++++++++++++++++++++++++++++++++
>> libavformat/mov_chan.h |  26 ++++
>> 3 files changed, 369 insertions(+), 1 deletion(-)
>> 
>> diff --git a/libavformat/mov.c b/libavformat/mov.c
>> index b125343f84..1db869aa2e 100644
>> --- a/libavformat/mov.c
>> +++ b/libavformat/mov.c
>> @@ -940,6 +940,82 @@ static int mov_read_chan(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>>     return 0;
>> }
>> 
>> +static int mov_read_chnl(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> +{
>> +    int64_t end = av_sat_add64(avio_tell(pb), atom.size);
>> +    int stream_structure;
>> +    int ret = 0;
>> +    AVStream *st;
>> +
>> +    if (c->fc->nb_streams < 1)
>> +        return 0;
>> +    st = c->fc->streams[c->fc->nb_streams-1];
>> +
>> +    /* skip version and flags */
>> +    avio_skip(pb, 4);
> 
> We should really not do this any more. Various FullBoxes have multiple
> versions or depend on the flags. See how I have added FullBox things
> recently, although I would prefer us to have a generic macro/function
> setup for this where you then get the version and flags as arguments
> or whatever in the future.
> 
> For this specific box, there are now versions 0 and 1 defined since
> circa 2018-2019 or so (visible at least in 14496-12 2022).
> 
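To make the version handling concrete, here is a minimal sketch of reading the FullBox header instead of skipping it. It is standalone C over a plain byte buffer rather than AVIOContext, and FullBoxHeader, read_fullbox_header and chnl_header_supported are hypothetical names used only for illustration, not existing FFmpeg API. Rejecting non-zero flags is one possible policy, based on the flags=0 in the syntax quoted below.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type; not an FFmpeg API. */
typedef struct FullBoxHeader {
    uint8_t  version; /* first byte of the FullBox payload */
    uint32_t flags;   /* following 24 bits, big-endian */
} FullBoxHeader;

/* Reads version and flags from the first 4 payload bytes.
 * Returns 0 on success, -1 if fewer than 4 bytes are available. */
static int read_fullbox_header(const uint8_t *buf, size_t size,
                               FullBoxHeader *hdr)
{
    if (size < 4)
        return -1;
    hdr->version = buf[0];
    hdr->flags   = ((uint32_t)buf[1] << 16) | ((uint32_t)buf[2] << 8) | buf[3];
    return 0;
}

/* 'chnl' defines versions 0 and 1 with flags = 0, so a caller might
 * reject anything else instead of misparsing it. */
static int chnl_header_supported(const FullBoxHeader *hdr)
{
    return hdr->version <= 1 && hdr->flags == 0;
}
```

In mov.c the same idea would presumably use avio_r8() for the version and avio_rb24() for the flags, then branch on the version.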
> Since ISO/IEC has changed the rules for free specifications (against
> the wishes of various spec authors) and all that jazz, this is how
> it's defined in what I have on hand:
> 
> 12.2.4  Channel layout
> 
> 12.2.4.1  Definition
> 
> Box Types:  'chnl'
> Container: Audio sample entry
> Mandatory: No
> Quantity: Zero or one
> 
> This box may appear in an audio sample entry to document the assignment
> of channels in the audio stream. It is recommended to use this box to
> convey the base channel count for the DownMixInstructions box and other
> DRC-related boxes specified in ISO/IEC 23003-4.
> The channel layout can be all or part of a standard layout (from an
> enumerated list), or a custom layout (which also allows a track to
> contribute part of an overall layout).
> A stream may contain channels, objects, neither, or both. A stream that
> is neither channel nor object structured can implicitly be rendered in
> a variety of ways.
> 
> 12.2.4.2  Syntax
> 
> aligned(8) class ChannelLayout extends FullBox('chnl', version, flags=0) {
>   if (version==0) {
>      unsigned int(8) stream_structure;
>      if (stream_structure & channelStructured) {
>         unsigned int(8) definedLayout;
>         if (definedLayout==0) {
>            for (i = 1 ; i <= layout_channel_count ; i++) {
>               //  layout_channel_count comes from the sample entry
>               unsigned int(8) speaker_position;
>               if (speaker_position == 126) {   // explicit position
>                  signed int (16) azimuth;
>                  signed int (8)  elevation;
>               }
>            }
>         } else {
>            unsigned int(64)   omittedChannelsMap;
>                  // a ‘1’ bit indicates ‘not in this track’
>         }
>      }
>      if (stream_structure & objectStructured) {
>         unsigned int(8) object_count;
>      }
>   } else {
>      unsigned int(4) stream_structure;
>      unsigned int(4) format_ordering;
>      unsigned int(8) baseChannelCount;
>      if (stream_structure & channelStructured) {
>         unsigned int(8) definedLayout;
>         if (definedLayout==0) {
>            unsigned int(8) layout_channel_count;
>            for (i = 1 ; i <= layout_channel_count ; i++) {
>               unsigned int(8) speaker_position;
>               if (speaker_position == 126) {   // explicit position
>                  signed int (16) azimuth;
>                  signed int (8)  elevation;
>               }
>            }
>         } else {
>            int(4) reserved = 0;
>            unsigned int(3) channel_order_definition;
>            unsigned int(1) omitted_channels_present;
>            if (omitted_channels_present == 1) {
>               unsigned int(64)   omittedChannelsMap;
>                     // a ‘1’ bit indicates ‘not in this track’
>            }
>         }
>      }
>      if (stream_structure & objectStructured) {
>                     // object_count is derived from baseChannelCount
>      }
>   }
> }
> 
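For my own understanding, the quoted syntax can be sketched as a standalone parser that handles both versions. This is only a sketch under assumptions: Reader, rd, ChnlInfo and parse_chnl are hypothetical names, the input is a plain byte buffer starting at the version byte, explicit azimuth/elevation values are skipped rather than stored, and I assume the usual most-significant-bit-first packing for the 4-bit and 3-bit fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte reader; FFmpeg itself would use AVIOContext. */
typedef struct Reader {
    const uint8_t *p;
    size_t left;
    int err; /* set once a read runs past the end of the box */
} Reader;

/* Reads n (<= 8) bytes as a big-endian unsigned value. */
static uint64_t rd(Reader *r, size_t n)
{
    uint64_t v = 0;
    if (r->left < n) {
        r->err  = 1;
        r->left = 0;
        return 0;
    }
    r->left -= n;
    while (n--)
        v = (v << 8) | *r->p++;
    return v;
}

typedef struct ChnlInfo {
    int version, stream_structure, format_ordering, base_channel_count;
    int defined_layout, layout_channel_count, channel_order_definition;
    int object_count;              /* only read from the box in version 0 */
    uint64_t omitted_channels_map; /* default 0: layout fully present */
    uint8_t speaker_position[255];
} ChnlInfo;

/* buf/size cover the box payload starting at the version byte.
 * entry_channels is the channel count from the sample entry.
 * Returns 0 on success, -1 on truncated or unsupported input. */
static int parse_chnl(const uint8_t *buf, size_t size, int entry_channels,
                      ChnlInfo *out)
{
    Reader r = { buf, size, 0 };

    memset(out, 0, sizeof(*out));
    out->version = (int)rd(&r, 1);
    rd(&r, 3); /* flags, defined as 0 for this box */
    if (r.err || out->version > 1)
        return -1;

    if (out->version == 0) {
        out->stream_structure = (int)rd(&r, 1);
    } else {
        int b = (int)rd(&r, 1);
        out->stream_structure   = b >> 4;   /* upper 4 bits */
        out->format_ordering    = b & 0x0f; /* lower 4 bits */
        out->base_channel_count = (int)rd(&r, 1);
    }

    if (out->stream_structure & 1) { /* channelStructured */
        out->defined_layout = (int)rd(&r, 1);
        if (out->defined_layout == 0) {
            int n = out->version ? (int)rd(&r, 1) : entry_channels;
            if (n < 0 || n > 255)
                return -1;
            out->layout_channel_count = n;
            for (int i = 0; i < n; i++) {
                out->speaker_position[i] = (uint8_t)rd(&r, 1);
                if (out->speaker_position[i] == 126)
                    rd(&r, 3); /* azimuth (16 bits) + elevation (8 bits), skipped */
            }
        } else if (out->version == 0) {
            out->omitted_channels_map = rd(&r, 8);
        } else {
            int b = (int)rd(&r, 1); /* 4 reserved, 3 order definition, 1 flag */
            out->channel_order_definition = (b >> 1) & 7;
            if (b & 1)
                out->omitted_channels_map = rd(&r, 8);
        }
    }

    if ((out->stream_structure & 2) && out->version == 0) /* objectStructured */
        out->object_count = (int)rd(&r, 1);

    return r.err ? -1 : 0;
}
```

For a non-zero definedLayout the sketch leaves layout_channel_count at 0; filling it in would need a table of channel counts per ChannelConfiguration.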
> 12.2.4.3  Semantics
> 
> version is an integer that specifies the version of this box (0 or 1).
>     When authoring, version 1 should be preferred over version 0.
>     Version 1 conveys the channel ordering, which is not always the case
>     for version 0. Version 1 should be used to convey the base channel
>     count for DRC.
> 
> stream_structure is a field of flags that define whether the stream has
>     channel or object structure (or both, or neither); the following flags
>     are defined, all other values are reserved:
>   1  the stream carries channels
>   2  the stream carries objects
> 
> format_ordering indicates the order of formats in the stream starting
>     from the lowest channel index (see Table). Each format shall only use
>     contiguous channel indices.
>   format_ordering Order
>   0               unknown
>   1               Channels, possibly followed by Objects
>   2               Objects, possibly followed by Channels
>   Remaining values are reserved
> 
> definedLayout is a ChannelConfiguration from ISO/IEC 23091-3.
> 
> speaker_position is an OutputChannelPosition from ISO/IEC 23091-3. If an
>     explicit position is used, then the azimuth and elevation are as
>     defined as for speakers in ISO/IEC 23091-3. The channel order
>     corresponds to the order of speaker positions.
> 
> azimuth is a signed value in degrees, as defined for
> LoudspeakerAzimuth in ISO/IEC 23091-3.
> 
> elevation is a signed value, in degrees, as defined for
> LoudspeakerElevation in ISO/IEC 23091-3.
> 
> channel_order_definition indicates where the ordering of the audio
>     channels for the definedLayout are specified (see Table).
> 
>   channel_order_definition Channel order specification
>   0                        as listed for the ChannelConfigurations in
>                            ISO/IEC 23091-3
>   1                        Default order of audio codec specification
>   2                        Channel ordering #2 of audio codec specification
>   3                        Channel ordering #3 of audio codec specification
>   4                        Channel ordering #4 of audio codec specification
>   Remaining values are reserved
> 
> omitted_channels_present is a flag that indicates if it is set to 1
> that the omittedChannelsMap is present.
> 
> omittedChannelsMap is a bit-map of omitted channels; the bits in the
>     channel map are numbered from least-significant to most-significant,
>     and correspond in that ordering with the order of the channels for
>     the configuration as documented in ISO/IEC 23091-3
>     ChannelConfiguration. 1-bits in the channel map mean that a channel
>     is absent. A zero value of the map therefore always means that the
>     given standard layout is fully present. The default value is 0.
> 
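The omittedChannelsMap semantics are simple enough to sketch, in case partial layouts get supported later instead of returning AVERROR_PATCHWELCOME. Both function names are hypothetical, and the channel index is assumed to be the position within the ChannelConfiguration order.

```c
#include <stdint.h>

/* Returns 1 if channel `index` of the defined layout is present in this
 * track. Bit 0 is the least-significant bit; a set bit means "omitted". */
static int chnl_channel_present(uint64_t omitted_channels_map, unsigned index)
{
    if (index >= 64)
        return 0; /* the map cannot describe more than 64 channels */
    return !((omitted_channels_map >> index) & 1);
}

/* Counts the channels of a layout with `layout_channels` entries that
 * are actually carried by the track. */
static unsigned chnl_count_present(uint64_t omitted_channels_map,
                                   unsigned layout_channels)
{
    unsigned n = 0;
    for (unsigned i = 0; i < layout_channels && i < 64; i++)
        n += (unsigned)chnl_channel_present(omitted_channels_map, i);
    return n;
}
```

With a map of 0 every channel of the layout is present, which matches the default described above.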
> layout_channel_count is the count of channels for the channel layout.
>     The default value is 0 if stream_structure indicates that no channel
>     structure is present. Otherwise, the value is the number of channels
>     of the defined layout, if present, otherwise it is the value from the
>     sample entry.
> 
> object_count is the count of channels that contain audio objects. The
>     default value is 0. For version 1 and if the objectStructured flag is
>     set, the value is computed as baseChannelCount minus the channel
>     count of the channel structure.
> 
> baseChannelCount represents the combined channel count of the channel
>     layout and the object count. The value must match the base channel
>     count for DRC (see ISO/IEC 23003-4).
> 
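The version 1 derivation of object_count is plain arithmetic. A small sketch follows, with a hypothetical function name and with the consistency check being my own addition rather than something the quoted text requires.

```c
/* Returns the derived object count for a version 1 'chnl' box, or -1 if
 * the counts are inconsistent. All names here are illustrative. */
static int chnl_v1_object_count(int base_channel_count,
                                int channel_structure_count,
                                int object_structured)
{
    if (!object_structured)
        return 0; /* default value */
    if (channel_structure_count < 0 ||
        channel_structure_count > base_channel_count)
        return -1;
    return base_channel_count - channel_structure_count;
}
```

For example, a baseChannelCount of 10 with a 6-channel layout would leave 4 object channels.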
> 
>> +
>> +    stream_structure = avio_r8(pb);
>> +
>> +    // stream carries channels
>> +    if (stream_structure & 1) {
>> +        int layout = avio_r8(pb);
>> +
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' layout %d\n", layout);
>> +        if (!layout) {
>> +            uint8_t positions[64] = {};
>> +            int enable = 1;
>> +
>> +            for (int i = 0; i < st->codecpar->ch_layout.nb_channels; i++) {
>> +                int speaker_pos = avio_r8(pb);
>> +
>> +                av_log(c->fc, AV_LOG_TRACE, "speaker_position %d\n", speaker_pos);
>> +                if (speaker_pos == 126) { // explicit position
>> +                    int16_t azimuth = avio_rb16(pb);
>> +                    int8_t elevation = avio_r8(pb);
>> +
>> +                    av_log(c->fc, AV_LOG_TRACE, "azimuth %d, elevation %d\n",
>> +                           azimuth, elevation);
>> +                    // Don't support explicit position
>> +                    enable = 0;
>> +                } else if (i < FF_ARRAY_ELEMS(positions)) {
>> +                    positions[i] = speaker_pos;
>> +                } else {
>> +                    // number of channel out of our supported range
>> +                    enable = 0;
>> +                }
>> +            }
>> +
>> +            if (enable) {
>> +                ret = ff_mov_get_layout_from_channel_positions(positions,
>> +                        st->codecpar->ch_layout.nb_channels,
>> +                        &st->codecpar->ch_layout);
>> +                if (ret) {
>> +                    av_log(c->fc, AV_LOG_WARNING, "unsupported speaker positions\n");
>> +                    ret = 0;
>> +                }
>> +            }
>> +        } else {
>> +            uint64_t omitted_channel_map = avio_rb64(pb);
>> +
>> +            if (omitted_channel_map) {
>> +                avpriv_request_sample(c->fc, "omitted_channel_map 0x%" PRIx64 " != 0",
>> +                                      omitted_channel_map);
>> +                return AVERROR_PATCHWELCOME;
>> +            }
>> +            ff_mov_get_channel_layout_from_config(layout, &st->codecpar->ch_layout);
>> +        }
>> +    }
>> +
>> +    // stream carries objects
>> +    if (stream_structure & 2) {
>> +        int obj_count = avio_r8(pb);
>> +        av_log(c->fc, AV_LOG_TRACE, "'chnl' with object_count %d\n", obj_count);
>> +    }
>> +
>> +    avio_seek(pb, end, SEEK_SET);
>> +    return ret;
>> +}
>> +
>> static int mov_read_wfex(MOVContext *c, AVIOContext *pb, MOVAtom atom)
>> {
>>     AVStream *st;
>> @@ -7784,7 +7860,8 @@ static const MOVParseTableEntry mov_default_parse_table[] = {
>> { MKTAG('w','i','d','e'), mov_read_wide }, /* place holder */
>> { MKTAG('w','f','e','x'), mov_read_wfex },
>> { MKTAG('c','m','o','v'), mov_read_cmov },
>> -{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout */
>> +{ MKTAG('c','h','a','n'), mov_read_chan }, /* channel layout from quicktime */
>> +{ MKTAG('c','h','n','l'), mov_read_chnl }, /* channel layout from ISO-14496-12 */
>> { MKTAG('d','v','c','1'), mov_read_dvc1 },
>> { MKTAG('s','g','p','d'), mov_read_sgpd },
>> { MKTAG('s','b','g','p'), mov_read_sbgp },
>> diff --git a/libavformat/mov_chan.c b/libavformat/mov_chan.c
>> index f66bf0df7f..10ebcdc08f 100644
>> --- a/libavformat/mov_chan.c
>> +++ b/libavformat/mov_chan.c
>> @@ -551,3 +551,268 @@ int ff_mov_read_chan(AVFormatContext *s, AVIOContext *pb, AVStream *st,
>> 
>>     return 0;
>> }
>> +
>> +/* ISO/IEC 23001-8, 8.2 */
>> +static const AVChannelLayout iso_channel_configuration[] = {
>> +    // 0: any setup
>> +    {},
>> +
> 
> I think the better naming for this would be CICP channel configuration
> since the specification is called "common independent coding points"
> (for video this is shared with ITU-T H.273 which is free).
> 
> Also do note that a whole bunch of these are not in the channel order
> that FFmpeg wants after stereo :<
> 
> Thankfully with manual mapping FFmpeg native channel layouts' channel
> order should be writable and readable.
> 
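A rough sketch of what such a manual mapping could look like, using the CICP 11 order from the Apple comments quoted in this mail (L R C LFE Ls Rs Cs). The POS_* ids and both functions are hypothetical. Their numeric order stands in for the native order as I understand it (back center before side left/right), and treating Ls/Rs/Cs as side left, side right and back center is an assumption of this sketch that should be checked against libavutil/channel_layout.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical position ids; their numeric order stands in for the
 * native (target) channel order. These are not FFmpeg identifiers. */
enum {
    POS_FL, POS_FR, POS_FC, POS_LFE, POS_BL, POS_BR,
    POS_FLC, POS_FRC, POS_BC, POS_SL, POS_SR
};

/* For each target slot k, map[k] receives the index of that channel in
 * the stream order. Returns 0 on success, -1 on duplicate positions or
 * more than 255 channels. */
static int build_reorder_map(const uint8_t *stream_pos, size_t n, uint8_t *map)
{
    if (n > 255)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t rank = 0; /* number of channels that sort before channel i */
        for (size_t j = 0; j < n; j++) {
            if (j != i && stream_pos[j] == stream_pos[i])
                return -1;
            if (stream_pos[j] < stream_pos[i])
                rank++;
        }
        map[rank] = (uint8_t)i;
    }
    return 0;
}

/* Reorders one frame of interleaved samples from stream to target order. */
static void reorder_frame(const int16_t *in, int16_t *out,
                          const uint8_t *map, size_t n)
{
    for (size_t k = 0; k < n; k++)
        out[k] = in[map[k]];
}
```

Under those assumptions the CICP 11 stream order yields the map {0, 1, 2, 3, 6, 4, 5}, i.e. only the last three channels move.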
> The channel orders for various CICP layouts can be found both in the
> referenced specifications, as well as in the comments from Apple's
> headers for example
> 
> // ISO/IEC 23091-3, channels w/orderings
> kAudioChannelLayoutTag_CICP_1  = kAudioChannelLayoutTag_MPEG_1_0,   ///< C
> kAudioChannelLayoutTag_CICP_2  = kAudioChannelLayoutTag_MPEG_2_0,   ///< L R
> kAudioChannelLayoutTag_CICP_3  = kAudioChannelLayoutTag_MPEG_3_0_A, ///< L R C
> kAudioChannelLayoutTag_CICP_4  = kAudioChannelLayoutTag_MPEG_4_0_A, ///< L R C Cs
> kAudioChannelLayoutTag_CICP_5  = kAudioChannelLayoutTag_MPEG_5_0_A, ///< L R C Ls Rs
> kAudioChannelLayoutTag_CICP_6  = kAudioChannelLayoutTag_MPEG_5_1_A, ///< L R C LFE Ls Rs
> kAudioChannelLayoutTag_CICP_7  = kAudioChannelLayoutTag_MPEG_7_1_B, ///< L R C LFE Ls Rs Lc Rc
> 
> kAudioChannelLayoutTag_CICP_9  = kAudioChannelLayoutTag_ITU_2_1,    ///< L R Cs
> kAudioChannelLayoutTag_CICP_10 = kAudioChannelLayoutTag_ITU_2_2,    ///< L R Ls Rs
> kAudioChannelLayoutTag_CICP_11 = kAudioChannelLayoutTag_MPEG_6_1_A, ///< L R C LFE Ls Rs Cs
> kAudioChannelLayoutTag_CICP_12 = kAudioChannelLayoutTag_MPEG_7_1_C, ///< L R C LFE Ls Rs Rls Rrs
> kAudioChannelLayoutTag_CICP_13 = (204U<<16) | 24,
>     ///< Lc Rc C LFE2 Rls Rrs L R Cs LFE3 Lss Rss Vhl Vhr Vhc Ts Ltr Rtr Ltm Rtm Ctr Cb Lb Rb
> 
> kAudioChannelLayoutTag_CICP_14 = (205U<<16) | 8,
>     ///< L R C LFE Ls Rs Vhl Vhr
> kAudioChannelLayoutTag_CICP_15 = (206U<<16) | 12,
>     ///< L R C LFE2 Rls Rrs LFE3 Lss Rss Vhl Vhr Ctr
> 
> kAudioChannelLayoutTag_CICP_16 = (207U<<16) | 10,
>     ///< L R C LFE Ls Rs Vhl Vhr Lts Rts
> kAudioChannelLayoutTag_CICP_17 = (208U<<16) | 12,
>     ///< L R C LFE Ls Rs Vhl Vhr Vhc Lts Rts Ts
> kAudioChannelLayoutTag_CICP_18 = (209U<<16) | 14,
>     ///< L R C LFE Ls Rs Lbs Rbs Vhl Vhr Vhc Lts Rts Ts
> 
> kAudioChannelLayoutTag_CICP_19 = (210U<<16) | 12,
>     ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr
> kAudioChannelLayoutTag_CICP_20 = (211U<<16) | 14,
>     ///< L R C LFE Rls Rrs Lss Rss Vhl Vhr Ltr Rtr Leos Reos
> Best regards,
> Jan


