[Ffmpeg-devel] channel ordering and downmixing

Sat Apr 7 02:17:29 CEST 2007

Michael Niedermayer wrote:
> Hi
> 
> On Thu, Apr 05, 2007 at 04:47:56AM -0400, Justin Ruggles wrote:
> 
>>Hi,
>>
>>Justin Ruggles wrote:
>>
>>>I need some advice on a channel ordering and downmixing framework.  I've
>>>been trying to figure out the best solution for a while and keep running
>>>into road blocks.
>>
>>After spending too much time just pondering different ideas, I decided
>>to give it a go and try one out.
>>
>>The attached patch isn't really meant to be a working patch, but more of
>>a conceptual sketch of a basic design.  I want to get ideas and comments
>>before taking the time to implement it in full.
>>
>>The general concept is:
>>
>> - add an AVChannelLayout struct to AVCodecContext
>> - have the muxer set its preferred channel layout in read_header()
> 
> 
> muxers dont read headers demuxers do, muxer write headers ...

indeed.

> 
> 
>> - have the decoder override the channel layout if it wants to
>> - user-level API: av_channel_mix_init(), av_channel_mix(),
>>                   av_channel_mix_close()
>> - the encoder can set the channel layout in encode_init or just set the
>>   number of channels and set the mask to CHANNEL_MASK_NONE to let the
>>   muxer decide
>> - if avctx->channel_layout.mask is CHANNEL_MASK_NONE, the muxer should
>>   set the channel layout
> 
> 
> i think the AVCodec encoder should have a list of supported layouts and
> the user app should choose one

Ok.  Then you have muxer support to consider as well.  e.g. pcm codecs
can support any layout, but certain containers only support particular
layouts.  One solution to this might be for the muxer to also have a
list of supported layouts.  Then we could either have the muxing fail if
the codec's layout is non-compatible with the muxer's list or else just
let it be on the user's head if they decide to mix incompatible layouts.
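
Roughly what I have in mind for the check, as an untested sketch.  The
function names and the zero-terminated list of channel masks are made up
for illustration; nothing like this is in the patch yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: 'supported' is a zero-terminated list of channel masks.
 * A NULL list means "any layout is fine" (e.g. the pcm codecs). */
static int layout_is_supported(const uint32_t *supported, uint32_t mask)
{
    assert(mask != 0);          /* 0 is reserved as the list terminator */
    if (!supported)
        return 1;
    for (; *supported; supported++)
        if (*supported == mask)
            return 1;
    return 0;
}

/* Returns 0 if both the encoder and the muxer accept the layout,
 * -1 otherwise, so the caller can either fail or just warn. */
static int check_layout(const uint32_t *codec_list,
                        const uint32_t *muxer_list, uint32_t mask)
{
    if (!layout_is_supported(codec_list, mask) ||
        !layout_is_supported(muxer_list, mask))
        return -1;
    return 0;
}
```

Whether a -1 here aborts muxing or only prints a warning is exactly the
policy question above; the check itself is the same either way.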

> 
>>Any suggestions/critiques would be great. :)
> 
> 
> id say first get rid of the floats

Downmixing doesn't need very high accuracy, so how does 8-bit
fixed-point sound?  The AC-3 spec gives a suggestion of 6-bit
coeffs...how odd.  Maybe that's in order to fit values >1.0 into an
8-bit integer?
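
To make the 8-bit idea concrete, here is a rough sketch.  I'm assuming
an unsigned 8-bit coeff with 6 fractional bits, which leaves 2 integer
bits for values up to just under 4.0.  That layout is my guess, not
something taken from the spec, and the function names are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define COEFF_FRAC_BITS 6                       /* assumed Q2.6 format */
#define COEFF_ONE       (1 << COEFF_FRAC_BITS)  /* 1.0 == 64 */

/* Convert a coefficient in [0.0, 4.0) to unsigned 8-bit Q2.6.
 * Only needed at init time, so the float here is harmless. */
static uint8_t coeff_to_q6(double c)
{
    assert(c >= 0.0 && c * COEFF_ONE + 0.5 < 256.0);
    return (uint8_t)(c * COEFF_ONE + 0.5);
}

static int16_t clip_int16(int32_t v)
{
    if (v >  32767) return  32767;
    if (v < -32768) return -32768;
    return (int16_t)v;
}

/* Mix interleaved 16-bit stereo down to mono using integer math only. */
static void downmix_stereo_to_mono(int16_t *dst, const int16_t *src,
                                   int nb_samples, uint8_t cl, uint8_t cr)
{
    const int32_t round = 1 << (COEFF_FRAC_BITS - 1);
    int i;

    assert(nb_samples >= 0);
    for (i = 0; i < nb_samples; i++) {
        int32_t acc = src[2 * i] * cl + src[2 * i + 1] * cr + round;
        /* assumes >> on a negative value is an arithmetic shift */
        dst[i] = clip_int16(acc >> COEFF_FRAC_BITS);
    }
}
```

With this format 0.707 becomes 45 (about 0.703), which is probably close
enough for downmixing.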

> then there are no ff_/av_ prefixes on non static things

True.  I considered that, but I was mimicking the naming scheme for
codecs, parsers, and bitstream filters.  Is it different in this case
because the channel layouts are const or because of history?

> now to the actual design
> i think downmix coeffs are not a part of the channel layout
> the channel layout is rather location and type of speakers, that could be
> simply right, left, front, ... or x,y / x,y,z coordinates or direction in
> radians or something
> 
> from that you can then somehow ;) find the default mixing coeffs to convert
> from layout X to Y, hardcoding them all is not a good idea, as there are too
> many as soon as you consider more than mono / stereo as target

One issue here is that the decoder should be able to specify downmixing
coeffs to the user based on codec-specific defaults and/or values in the
bitstream.  The only thread-safe way I can think of to do this is to put
them in the AVCodecContext.

Another tricky thing is that the values of the coeffs depend on the
channel layout being downmixed to.  I don't see how the decoder can know
this without putting both src_channel_layout and dst_channel_layout into
AVCodecContext or having the decoder provide all sets of coeffs for only
certain target channel layouts.  Either way could get very messy.  That
is why I had limited the downmix destination layout to only mono or
stereo...it's much simpler.  We could do a compromise solution of only
letting the decoder set the coeffs for mono or stereo downmixing and
using either preset coeffs or ones calculated from speaker locations for
other conversions.

> i also would strongly suggest to make AVChannelLayout a pointer in
> AVCodecContext, otherwise we can NOT add a field to AVChannelLayout
> in the future
> this also allows us to simply make this point to a static const
> AVChannelLayout

Using static const AVChannelLayout would definitely be preferred if we
do the decoder-defined or user-defined downmixing coeffs separately.

> one choice would be
> 
> typedef struct AVChannelLayout{
>     /**
>      * direction (0 is front, 1<<16 is right, 2<<16 is back, 3<<16 is left).
>      * the number of channels can be found in AVCodecContext.channels
>      */
>     int *azimuth;
>     /**
>      * direction (0 is front, 1<<16 is up, -1<<16 is down).
>      * the number of channels can be found in AVCodecContext.channels
>      */
>     int *elevation;
> }AVChannelLayout;
> 
> another would be
> 
> typedef struct AVChannelLayout{
>     /**
>      * x(-left->+right), y(-back->+front), z(-down->+up) position.
>      * the number of channels can be found in AVCodecContext.channels
>      */
>     int (*position)[3];
> }AVChannelLayout;

This might not be so bad if each channel is defined by both a label
(using an enum value) and coordinates.  That way downmixing can be done
using predefined coeffs for predefined channels or by using the speaker
locations.  Using labels would make reordering easier, too.  Also,
including a channel mask would make it simpler to determine which
channels are present.  However, if we get rid of the channel mask, it
would be possible to add other channel labels in the future...or we
could do like CAFF and keep two separate lists of channels, one for the
label and one for the mask...bleh.

typedef struct AVChannelDescription {
    /**
     * predefined channel label, from enum ChannelLabel
     */
    int label;

    /**
     * speaker position, [x][y][z] in ?? units
     */
    int position[3];
} AVChannelDescription;

typedef struct AVChannelLayout {
    int mask;
    AVChannelDescription *description;
} AVChannelLayout;
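
As a sketch of why labels help with reordering: if every channel carries
a label, the mask can be derived from the labels and a reorder map is a
simple lookup.  The enum values and function names below are invented
for the example and are not the ones from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels; the real enum ChannelLabel may differ. */
enum ChannelLabel {
    CH_FRONT_LEFT, CH_FRONT_RIGHT, CH_FRONT_CENTER,
    CH_LFE, CH_BACK_LEFT, CH_BACK_RIGHT, CH_NB
};

/* Derive the channel mask from the labels: one bit per label. */
static uint32_t labels_to_mask(const int *labels, int channels)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < channels; i++) {
        assert(labels[i] >= 0 && labels[i] < CH_NB);
        mask |= UINT32_C(1) << labels[i];
    }
    return mask;
}

/* Fill map[] so that output channel i is read from input channel map[i].
 * Returns 0 on success, -1 if a wanted label is missing from the source. */
static int build_reorder_map(int *map, const int *src, int src_channels,
                             const int *dst, int dst_channels)
{
    int i, j;

    assert(map && src && dst);
    for (i = 0; i < dst_channels; i++) {
        map[i] = -1;
        for (j = 0; j < src_channels; j++) {
            if (src[j] == dst[i]) {
                map[i] = j;
                break;
            }
        }
        if (map[i] < 0)
            return -1;
    }
    return 0;
}
```

Two layouts with the same channels in a different order give the same
mask, so the mask says which channels are present and the label list
says where they are.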

I have no idea how to do downmixing based on speaker coordinates.  From
what little I've read it involves using physics formulas, different
kinds of filters, and other math which I don't have the desire to delve
into right now.  But maybe there are simpler solutions I don't know of
that would be good enough for our purposes...
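
One candidate for "simple and maybe good enough" when the target is
stereo: ignore y and z and use a square-root pan law on the x position,
which keeps left^2 + right^2 constant.  This is only a sketch.  I'm
assuming x is Q16 fixed-point with -1.0 (full left) to +1.0 (full right),
since the units above are still undecided, and the names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Integer square root, rounded down (bit-by-bit method). */
static uint32_t isqrt32(uint32_t v)
{
    uint32_t r = 0, bit = UINT32_C(1) << 30;

    while (bit > v)
        bit >>= 2;
    while (bit) {
        if (v >= r + bit) {
            v -= r + bit;
            r  = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return r;
}

/* Stereo gains in Q8 (256 == 1.0) for a speaker at horizontal position x.
 * x is assumed to be Q16: -65536 is full left, +65536 is full right.
 * The power sent to each side is (1 -/+ x) / 2; the gain is its root,
 * and the square root of a Q16 value is a Q8 value. */
static void pan_gains_q8(int32_t x, uint32_t *left, uint32_t *right)
{
    assert(left && right);
    if (x < -65536) x = -65536;
    if (x >  65536) x =  65536;
    *left  = isqrt32((uint32_t)(65536 - x) / 2);
    *right = isqrt32((uint32_t)(65536 + x) / 2);
}
```

A centered speaker comes out at 181/256 per side, about 0.707, and a
speaker at full right gets 0 and 256.  It says nothing about front vs.
back or height, so it's nowhere near a real spatial model.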

> [...]
> 
> 
>>+const AVChannelLayout mono_channel_layout = {
>>+    .channels = 1,
>>+    .mask     = CHANNEL_MASK_MONO,
>>+    .layout   = { CHANNEL_CENTER, },
>>+    .mono_downmix   =   { 1.000, },
>>+    .stereo_downmix = { { 1.000, },
>>+                        { 1.000, } },
> 
> 
> this looks wrong IMHO, 2 speakers sound louder than 1 so it shouldnt be
> all 1.000

You're right.  Keeping with the same scheme I was using, both should be
0.707.

-Justin