[FFmpeg-devel] AAC decoder round 6

Mon Aug 11 00:11:21 CEST 2008

2008/8/10 Michael Niedermayer <michaelni at gmx.at>:
> On Sun, Aug 10, 2008 at 08:27:47PM +0100, Robert Swain wrote:
>> 2008/8/10 Michael Niedermayer <michaelni at gmx.at>:
>> > On Sun, Aug 10, 2008 at 12:31:31PM +0100, Robert Swain wrote:
>> >> 2008/8/9 Michael Niedermayer <michaelni at gmx.at>:
>> >> > On Sat, Aug 09, 2008 at 12:25:09PM +0100, Robert Swain wrote:
>> >> >> $subj
>> >> > [...]
>> >> >
>> >> >> +const uint8_t ff_aac_num_swb_1024[] = {
>> >> >> +    41, 41, 47, 49, 49, 51, 47, 47, 43, 43, 43, 40
>> >> >> +};
>> >> >> +
>> >> >> +const uint8_t ff_aac_num_swb_128[] = {
>> >> >> +    12, 12, 12, 14, 14, 14, 15, 15, 15, 15, 15, 15
>> >> >> +};
>> >> >> +
>> >> >
>> >> > dont these 2 belong to the swb offset tables? if so i think they should
>> >> > be in the same file unless iam missing something either way these 2 are
>> >> > ok [with static if they are only used by 1 file]
>> >>
>> >> Kostya wanted these two tables but not the rest of the swb tables so I
>> >> made these shared and the others not. I can move them if you wish.
>> >
>> > please add a note or move the tables
>>
>> A note saying that they're going to be used by other files? Isn't that
>> obvious from them being in the file containing shared declarations?
>
> no, globals that are used by just one file are a reason to reject a patch
> if i dont i have missed the fact.
>
> with no note i have to remember which tables kostya wants to use, so i know
> what is ok because a future patch will need it and what is not ok ...

Done. Hopefully one note before the tables in this file is
satisfactory to you. I promise I will move out any tables that are not
used by other files but Kostya specifically told me he needed these,
hence putting them in the file in the first place.

>> None of the other tables in aactab.h have any notes saying that
>> they're shared.
>>
>> > [...]
>> >>
>> >> > [...]
>> >> >> +
>> >> >> +/**
>> >> >> + * Decode GA "General Audio" specific configuration; reference: table 4.1.
>> >> >> + *
>> >> >> + * @return  Returns error status. 0 - OK, !0 - error
>> >> >> + */
>> >> >> +static int decode_ga_specific_config(AACContext * ac, GetBitContext * gb, int channel_config) {
>> >> >> +    enum ChannelPosition new_che_pos[4][MAX_ELEM_ID];
>> >> >> +    int extension_flag, ret;
>> >> >> +
>> >> >> +    if(get_bits1(gb)) {  // frameLengthFlag
>> >> >> +        av_log(ac->avccontext, AV_LOG_ERROR, "960/120 MDCT window is not supported.\n");
>> >> >
>> >> > is this "update to latest svn; sample welcome or patch welcome" ?
>> >> > i think we have a function to print such messages, i dont remember what its
>> >> > name was though
>> >>
>> >> Is that a joke? I don't really understand what you're trying to point
>> >> out. It's more "These files don't really occur in the wild and we
>> >> haven't written any support for them."
>> >
>> > [FFmpeg-devel] [PATCH 1/4] add a generic function to lavc to log messages about missing features.
>> >
>> > I approved the patch but it apparently was not applied
>>
>> Hmm, OK. I didn't know about this. It can be changed as soon as it is
>> committed. There are other unsupported features that can also use
>> this.
>
> just commit it and use it

Committed and used in all but one place where I need to pass
additional arguments to be replaced in the string. The
av_log_missing_feature() function could do with being extended a
little to accept the usual printf style strings + arguments if it's
not too difficult. Else I can just use something dirty like:

av_log(ac->avccontext, AV_LOG_WARNING, "Audio object type %s%d",
    ac->m4ac.sbr == 1? "SBR+" : "", ac->m4ac.object_type);
av_log_missing_feature(ac->avccontext, " is", 1);

:)

>> > [...]
>> >> >> +                ics->group_len[ics->num_window_groups-1]++;
>> >> >> +            } else {
>> >> >> +                ics->num_window_groups++;
>> >> >> +                ics->group_len[ics->num_window_groups-1] = 1;
>> >> >> +            }
>> >> >> +        }
>> >> >
>> >> >
>> >> >> +        ics->swb_offset =     swb_offset_128[ac->m4ac.sampling_index];
>> >> >> +        ics->num_swb    = ff_aac_num_swb_128[ac->m4ac.sampling_index];
>> >> >> +    } else {
>> >> >> +        ics->max_sfb    =                              get_bits(gb, 6);
>> >> >> +        ics->swb_offset =     swb_offset_1024[ac->m4ac.sampling_index];
>> >> >> +        ics->num_swb    = ff_aac_num_swb_1024[ac->m4ac.sampling_index];
>> >> >> +    }
>> >> >
>> >> > is there something that prevents sampling_index from being >=12 ?
>> >> > a quick look at ff_mpeg4audio_get_config() indicates that it can at least
>> >> > be 15.
>> >>
>> >> It's checked in decode_pce() but that is only called if the channel
>> >> config changes. I've added another check after the call to
>> >> ff_mpeg4audio_get_config() so now it is checked in both places where a
>> >> sampling index is parsed.
>> >>
>> >> >> +
>> >> >> +    if(ics->max_sfb > ics->num_swb) {
>> >> >> +        av_log(ac->avccontext, AV_LOG_ERROR,
>> >> >> +            "Number of scalefactor bands in group (%d) exceeds limit (%d).\n",
>> >> >> +            ics->max_sfb, ics->num_swb);
>> >> >> +        ics->max_sfb = 0;
>> >> >> +        ics->num_swb = 0;
>> >> >> +        return -1;
>> >> >> +    }
>> >> >> +
>> >> >
>> >> >> +    if (ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE) {
>> >> >> +        ics->num_windows   = 8;
>> >> >> +        ics->tns_max_bands = tns_max_bands_128[ac->m4ac.sampling_index];
>> >> >> +    } else {
>> >> >> +        ics->num_windows   = 1;
>> >> >> +        ics->tns_max_bands = tns_max_bands_1024[ac->m4ac.sampling_index];
>> >> >> +        if (get_bits1(gb)) {
>> >> >> +            av_log(ac->avccontext, AV_LOG_ERROR,
>> >> >> +                   "Predictor bit set but LTP is not supported.\n");
>> >> >> +            return -1;
>> >> >> +        }
>> >> >> +    }
>> >> >
>> >> > why is this split from the first part?
>> >>
>> >> It was split either side of the refactored max_sfb/num_swb validation.
>> >> I can merge them but then if the max_sfb/num_swb check fails, should
>> >> these values also be zeroed?
>> >
>> > The way its done currently can leave the num_windows / tns_max_bands in
>> > any state that was valid in the past combined with the other fields
>> > in any independant state that was not valid now.
>> > I did not check if this can cause problems ...
>> > But if its left the code needs to be reviewed for possible security issues
>> > related to invalid combinations of the fields.
>>
>> Hmm. I'm tempted to memset ics to 0 in the error cases.
>
> iam not against that ...

Committed.

>> >> > [...]
>> >> >
>> >> >> @@ -330,10 +790,286 @@
>> >> >>  }
>> >> >>
>> >> >>  /**
>> >> >> - * Parse Spectral Band Replication extension data; reference: table 4.55.
>> >> >> + * Dequantize and scale spectral data; reference: 4.6.3.3.
>> >> >>   *
>> >> >> + * @param   icoef       array of quantized spectral data
>> >> >> + * @param   band_type   array of the used band type
>> >> >> + * @param   sf          array of scalefactors or intensity stereo positions
>> >> >> + * @param   coef        array of dequantized, scaled spectral data
>> >> >> + */
>> >> >> +static void dequant(AACContext * ac, float coef[1024], const int icoef[1024], float sf[120],
>> >> >> +        const IndividualChannelStream * ics, enum BandType band_type[120]) {
>> >> >> +    const uint16_t * offsets = ics->swb_offset;
>> >> >> +    const int c = 1024/ics->num_window_groups;
>> >> >> +    int g, i, group, k, idx = 0;
>> >> >> +
>> >> >> +    for (g = 0; g < ics->num_window_groups; g++) {
>> >> >> +        memset(coef + g * 128 + offsets[ics->max_sfb], 0, sizeof(float)*(c - offsets[ics->max_sfb]));
>> >> >> +        for (i = 0; i < ics->max_sfb; i++, idx++) {
>> >> >> +            if (band_type[idx] == NOISE_BT) {
>> >> >> +                const float scale = sf[idx] / ((offsets[i+1] - offsets[i]) * PNS_MEAN_ENERGY);
>> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++)
>> >> >> +                        coef[group*128+k] = lcg_random(&ac->random_state) * scale;
>> >> >> +                }
>> >> >> +            } else if (band_type[idx] != INTENSITY_BT && band_type[idx] != INTENSITY_BT2) {
>> >> >> +                for (group = 0; group < ics->group_len[g]; group++) {
>> >> >> +                    for (k = offsets[i]; k < offsets[i+1]; k++) {
>> >> >> +                        coef[group*128+k] = ivquant(icoef[group*128+k]) * sf[idx];
>> >> >> +                    }
>> >> >> +                }
>> >> >> +            }
>> >> >> +        }
>> >> >> +        coef  += ics->group_len[g]*128;
>> >> >> +        icoef += ics->group_len[g]*128;
>> >> >> +    }
>> >> >> +}
>> >> >
>> >> > it might be better to add 128 just outside the innermost loop to avoid
>> >> > the "group*128 +" inside
>> >>
>> >> They would need to be reset the end of each iteration over i. I moved
>> >> the group loops inside the conditions recently as I recall. What do
>> >> you suggest? Add 128 to coef and icoef through each iteration over
>> >> group and then reset?
>> >
>> > I suggest you benchmark it, if it makes no speed difference they can stay
>> > as they are.
>>
>> OK. On a Core Duo 1.83GHz, 5 repetitions interleaved...
>>
>> Unpatched:
>> real  0m8.697s
>> user  0m8.052s
>> sys   0m0.494s
>
> and START/STOP_TIMER over the function?

It's done in a number of functions. But here are the results:

Unpatched:
2200 dezicycles in decode_spectrum, 4 runs, 8188 skips
2200 dezicycles in decode_spectrum, 4 runs, 16380 skips
2200 dezicycles in decode_spectrum, 4 runs, 32764 skips
2200 dezicycles in decode_spectrum, 4 runs, 65532 skips

116572 dezicycles in dequant, 8186 runs, 6 skips
116390 dezicycles in dequant, 16371 runs, 13 skips
114657 dezicycles in dequant, 32741 runs, 27 skips
107620 dezicycles in dequant, 65499 runs, 37 skips

102380 dezicycles in apply_mid_side_stereo, 4094 runs, 2 skips
102210 dezicycles in apply_mid_side_stereo, 8188 runs, 4 skips
101466 dezicycles in apply_mid_side_stereo, 16365 runs, 19 skips
96039 dezicycles in apply_mid_side_stereo, 32745 runs, 23 skips

Patched:
2282 dezicycles in decode_spectrum, 4 runs, 8188 skips
2282 dezicycles in decode_spectrum, 4 runs, 16380 skips
2282 dezicycles in decode_spectrum, 4 runs, 32764 skips
2282 dezicycles in decode_spectrum, 4 runs, 65532 skips

132744 dezicycles in dequant, 8191 runs, 1 skips
133292 dezicycles in dequant, 16380 runs, 4 skips
130422 dezicycles in dequant, 32758 runs, 10 skips
124137 dezicycles in dequant, 65515 runs, 21 skips

106972 dezicycles in apply_mid_side_stereo, 4095 runs, 1 skips
107052 dezicycles in apply_mid_side_stereo, 8188 runs, 4 skips
105295 dezicycles in apply_mid_side_stereo, 16376 runs, 8 skips
100982 dezicycles in apply_mid_side_stereo, 32749 runs, 19 skips

I guess it didn't like decode_spectrum() but regardless it looks like
it's faster without the change. Did I implement it poorly?

> [...]
>> > [...]
>> >> >> +    }
>> >> >> +    domain = get_bits1(gb);
>> >> >> +
>> >> >> +    if (is_indep_coup) {
>> >> >> +        coup->coupling_point = AFTER_IMDCT;
>> >> >> +    } else if(domain) {
>> >> >> +        coup->coupling_point = BETWEEN_TNS_AND_IMDCT;
>> >> >> +    } else
>> >> >> +        coup->coupling_point = BEFORE_TNS;
>> >> >
>> >> > :/ i thought the bits were at least stored together, what morons designed
>> >> > this!?
>> >> >
>> >> > coup->coupling_point= get_bits1(gb);
>> >> > ...
>> >> > coup->coupling_point+= 2*get_bits1(gb);
>> >> >
>> >> > still is a possibility though that is cleaner
>> >> > btw, what is the 4th possibility for?
>> >>
>> >> is_indep_coup = 1 overrides whatever domain is so the 4th case is
>> >> redundant. Though theoretically, domain must be 1 if is_indep_coup is
>> >> 1. is_indep_coup = 1 and domain = 0 is theoretically invalid.
>> >>
>> >> coup->coupling_point = 2*get_bits1(gb);
>> >> ...
>> >> coup->coupling_point += get_bits1(gb) && !coup->coupling_point;
>> >> ?
>> >
>> > hmm, this seems unneccesarily complex
>> > and the invalid value should be checked for with av_log&return -1
>>
>> It's not invalid in that it should be checked and cause an error. It's
>> just that the value of domain is ignored when is_indep_coup is set. I
>
> Where does the spec say this?
> If the spec says that one bit switches between doing it before and after A
> and says the second switches before and after B then the order of A and B
> implicate which combination is invalid.

ind_sw_cce_flag
one bit indicating whether the coupled target syntax element is an
independently switched (1) or a dependently switched (0).

cc_domain
one bit indicating whether the coupling is performed before (0) or after (1)
the TNS decoding of the coupled target channels

In 4.6.8.3.3 Decoding process (4.6.8.3 is Coupling channel) it states that:

"the independently switched CCE must be decoded all the way to the
time domain (i.e. including the synthesis filterbank) before it is
scaled and added onto the various SCE and CPE channels that it is
coupled to in the case that window state does not match."

"A dependently switched CCE, on the other hand, must have a window
state that matches all of the target SCE and CPE channels that it is
coupled onto as determined by the list of cc_l and cc_r elements. In
this case, the CCE only needs to be decoded as far as the frequency
domain and then scaled as directed by the gain list before it is added
to the target SCE or CPE channels."

So, the spec seems to suggest that one can couple an independently
switched CCE in the frequency domain in the case that the window state
matches. In these cases, I suppose the cc_domain variable would be of
consequence but otherwise, when coupling in the time domain for
independently switched CCEs, cc_domain should really always be 1 as
coupling would always be conducted after TNS decoding. Can all
encoders be trusted to write the correct value? :)

What do you want me to do?

>> said it was theoretically invalid to have is_indep_coup = 1 and domain
>> = 0 as is_indep_coup => coupling occurs after imdct => domain = 1. But
>> in practice, domain is ignored when is_indep_coup = 1.
>>
>> So, it's just whether you prefer the if() {} else if() {} else {} or
>> what I suggested above. Or something else.
>>
>> > [...]
>> >> >> +
>> >> >> +static int aac_decode_frame(AVCodecContext * avccontext, void * data, int * data_size, const uint8_t * buf, int buf_size) {
>> >> >> +    AACContext * ac = avccontext->priv_data;
>> >> >> +    GetBitContext gb;
>> >> >> +    enum RawDataBlockType elem_type;
>> >> >> +    int err, elem_id;
>> >> >> +
>> >> >> +    init_get_bits(&gb, buf, buf_size*8);
>> >> >> +
>> >> >> +    // parse
>> >> >> +    while ((elem_type = get_bits(&gb, 3)) != TYPE_END) {
>> >> >> +        elem_id = get_bits(&gb, 4);
>> >> >> +        err = -1;
>> >> >> +        switch (elem_type) {
>> >> >> +
>> >> >> +        case TYPE_SCE:
>> >> >> +            if(!ac->che[TYPE_SCE][elem_id]) {
>> >> >> +                if(elem_id == 1 && ac->che[TYPE_LFE][0]) {
>> >> >> +                    /* Some streams incorrectly code 5.1 audio as SCE[0] CPE[0] CPE[1] SCE[1]
>> >> >> +                       instead of SCE[0] CPE[0] CPE[0] LFE[0].
>> >> >> +                       If we seem to have encountered such a stream,
>> >> >> +                       transfer the LFE[0] element to SCE[1] */
>> >> >> +                    ac->che[TYPE_SCE][elem_id] = ac->che[TYPE_LFE][0];
>> >> >> +                    ac->che[TYPE_LFE][0] = NULL;
>> >> >> +                } else
>> >> >> +                    break;
>> >> >> +            }
>> >> >> +            err = decode_ics(ac, &ac->che[TYPE_SCE][elem_id]->ch[0], &gb, 0, 0);
>> >> >> +            break;
>> >> >> +
>> >> >> +        case TYPE_CPE:
>> >> >> +            if (ac->che[TYPE_CPE][elem_id])
>> >> >> +                err = decode_cpe(ac, &gb, elem_id);
>> >> >> +            break;
>> >> >> +
>> >> >> +        case TYPE_FIL:
>> >> >> +            if (elem_id == 15)
>> >> >> +                elem_id += get_bits(&gb, 8) - 1;
>> >> >> +            while (elem_id > 0)
>> >> >> +                elem_id -= decode_extension_payload(ac, &gb, elem_id);
>> >> >> +            err = 0; /* FIXME */
>> >> >> +            break;
>> >> >> +
>> >> >> +        case TYPE_PCE:
>> >> >> +        {
>> >> >> +            enum ChannelPosition new_che_pos[4][MAX_ELEM_ID];
>> >> >> +            memset(new_che_pos, 0, 4 * MAX_ELEM_ID * sizeof(new_che_pos[0][0]));
>> >> >> +            if((err = decode_pce(ac, new_che_pos, &gb)))
>> >> >> +                break;
>> >> >> +            err = output_configure(ac, ac->che_pos, new_che_pos);
>> >> >> +            break;
>> >> >> +        }
>> >> >> +
>> >> >> +        case TYPE_DSE:
>> >> >> +            skip_data_stream_element(&gb);
>> >> >> +            err = 0;
>> >> >> +            break;
>> >> >> +
>> >> >> +        case TYPE_CCE:
>> >> >> +            if (ac->che[TYPE_CCE][elem_id])
>> >> >> +                err = decode_cce(ac, &gb, elem_id);
>> >> >> +            break;
>> >> >> +
>> >> >> +        case TYPE_LFE:
>> >> >> +            if (ac->che[TYPE_LFE][elem_id])
>> >> >> +                err = decode_ics(ac, &ac->che[TYPE_LFE][elem_id]->ch[0], &gb, 0, 0);
>> >> >> +            break;
>> >> >
>> >> > these checks could be factorized out
>> >> > if(elem_type < C && !ac->che[elem_type][elem_id])
>> >> >    return -1;
>> >>
>> >> The TYPE_SCE case has something slightly different so it will have to be:
>> >>
>> >> if(elem_type && elem_type < TYPE_DSE && !ac->che[elem_type][elem_id])
>> >>     return -1;
>> >
>> > TYPE_SCE can be dealt with before the error check.
>>
>> Do you just mean:
>>
>> if(elem_type == TYPE_SCE && !ac->che[TYPE_SCE][elem_id] &&
>>         elem_id = 1 && ac->che[TYPE_LFE][0]) {
>>     ac->che[TYPE_SCE][elem_id] = ac->che[TYPE_LFE][0];
>>     ac->che[TYPE_LFE][0] = NULL;
>> }
>> if(elem_type < TYPE_DSE && !ac->che[elem_type][elem_id])
>>     return -1;
>
> yes

OK, committed.

Rob