[FFmpeg-devel] [PATCH] AAC decoder round 8
Robert Swain
robert.swain
Mon Aug 18 15:03:53 CEST 2008
2008/8/18 Michael Niedermayer <michaelni at gmx.at>:
> On Mon, Aug 18, 2008 at 01:47:42PM +0100, Robert Swain wrote:
>> 2008/8/18 Robert Swain <robert.swain at gmail.com>:
>> > 2008/8/18 Robert Swain <robert.swain at gmail.com>:
>> >> 2008/8/18 Robert Swain <robert.swain at gmail.com>:
>> >>> 2008/8/15 Robert Swain <robert.swain at gmail.com>:
>> >>>> 2008/8/15 Michael Niedermayer <michaelni at gmx.at>:
>> >>>>> On Fri, Aug 15, 2008 at 09:04:24AM +0100, Robert Swain wrote:
>> >>>>>> 2008/8/15 Michael Niedermayer <michaelni at gmx.at>:
>> >>>>>> > On Fri, Aug 15, 2008 at 01:32:08AM +0100, Robert Swain wrote:
>> >>>>>> >> $subj
>> >>>>>> >>
>> >>>>>> >> There's not much left to commit now! :D
>> >>>>>> >
>> >>>>>> > ok
>> >>>>>>
>> >>>>>> All committed. Just to make it easier for me and/or you to keep track
>> >>>>>> of, here's another patch attached with the remaining hunks.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Rob
>> >>>>>
>> >>>>>> Index: libavcodec/aac.c
>> >>>>>> ===================================================================
>> >>>>>> --- libavcodec/aac.c (revision 14774)
>> >>>>>> +++ libavcodec/aac.c (working copy)
>> >>>>> [...]
>> >>>>>> @@ -605,6 +616,44 @@
>> >>>>>> }
>> >>>>>>
>> >>>>>> /**
>> >>>>>> + * Decode Temporal Noise Shaping data; reference: table 4.48.
>> >>>>>> + *
>> >>>>>> + * @return Returns error status. 0 - OK, !0 - error
>> >>>>>> + */
>> >>>>>> +static int decode_tns(AACContext * ac, TemporalNoiseShaping * tns,
>> >>>>>> + GetBitContext * gb, const IndividualChannelStream * ics) {
>> >>>>>
>> >>>>>> + int w, filt, i, coef_len, coef_res = 0, coef_compress;
>> >>>>>
>> >>>>> useless init ?
>> >>>>
>> >>>> Subtle, but yes.
>> >>>>
>> >>>>>> + const int is8 = ics->window_sequence[0] == EIGHT_SHORT_SEQUENCE;
>> >>>>>> + const int tns_max_order = is8 ? 7 : ac->m4ac.object_type == AOT_AAC_MAIN ? 20 : 12;
>> >>>>>> + for (w = 0; w < ics->num_windows; w++) {
>> >>>>>> + tns->n_filt[w] = get_bits(gb, 2 - is8);
>> >>>>>> +
>> >>>>>> + if (tns->n_filt[w])
>> >>>>>> + coef_res = get_bits1(gb) + 3;
>> >>>>>> +
>> >>>>>> + for (filt = 0; filt < tns->n_filt[w]; filt++) {
>> >>>>>> + tns->length[w][filt] = get_bits(gb, 6 - 2*is8);
>> >>>>>> +
>> >>>>>
>> >>>>>> + if ((tns->order[w][filt] = get_bits(gb, 5 - 2*is8)) <= tns_max_order) {
>> >>>>>> + tns->direction[w][filt] = get_bits1(gb);
>> >>>>>> + coef_compress = get_bits1(gb);
>> >>>>>> + coef_len = coef_res - coef_compress;
>> >>>>>> + tns->tmp2_map[w][filt] = tns_tmp2_map[2*coef_compress + coef_res - 3];
>> >>>>>
>> >>>>> the 3 can be moved to "coef_len = coef_res - coef_compress + 3"
>> >>>>
>> >>>> But, unless I'm missing something, that will change the behaviour of
>> >>>> the code as more bits will be read in the loop you quoted just below.
>> >>>>
>> >>>>>> +
>> >>>>>> + for (i = 0; i < tns->order[w][filt]; i++)
>> >>>>>> + tns->coef[w][filt][i] = get_bits(gb, coef_len);
>> >>>>>
>> >>>>> tns->coef is only used to index into tmp2_map thus
>> >>>>> tns->coef could already contain the values from tmp2_map
>> >>>>> this also would make the tmp2_map field unneeded in the struct
>> >>>>
>> >>>> OK, I'll look into this.
>> >>>>
>> >>>>>> + } else {
>> >>>>>> + av_log(ac->avccontext, AV_LOG_ERROR, "TNS filter order %d is greater than maximum %d.",
>> >>>>>> + tns->order[w][filt], tns_max_order);
>> >>>>>> + tns->order[w][filt] = 0;
>> >>>>>> + return -1;
>> >>>>>> + }
>> >>>>>
>> >>>>> if(... > tns_max_order){
>> >>>>> ...
>> >>>>> return -1
>> >>>>> }
>> >>>>> ...
>> >>>>>
>> >>>>> seems cleaner to me
>> >>>>
>> >>>> Done.
>> >>>>
>> >>>>>> + }
>> >>>>>> + }
>> >>>>>> + return 0;
>> >>>>>> +}
>> >>>>>> +
>> >>>>>> +/**
>> >>>>>> * Decode Mid/Side data; reference: table 4.54.
>> >>>>>> *
>> >>>>>> * @param ms_present Indicates mid/side stereo presence. [0] mask is all 0s;
>> >>>>>
>> >>>>>> @@ -1067,6 +1116,71 @@
>> >>>>>> }
>> >>>>>>
>> >>>>>> /**
>> >>>>>> + * Decode Temporal Noise Shaping filter coefficients and apply all-pole filters; reference: 4.6.9.3.
>> >>>>>> + *
>> >>>>>> + * @param decode 1 if tool is used normally, 0 if tool is used in LTP.
>> >>>>>> + * @param coef spectral coefficients
>> >>>>>> + */
>> >>>>>> +static void apply_tns(float coef[1024], TemporalNoiseShaping * tns, IndividualChannelStream * ics, int decode) {
>> >>>>>> + const int mmm = FFMIN(ics->tns_max_bands, ics->max_sfb);
>> >>>>>> + int w, filt, m, i, ib;
>> >>>>>> + int bottom, top, order, start, end, size, inc;
>> >>>>>> + float tmp;
>> >>>>>> + float lpc[TNS_MAX_ORDER + 1], b[2 * TNS_MAX_ORDER];
>> >>>>>> +
>> >>>>>> + for (w = 0; w < ics->num_windows; w++) {
>> >>>>>> + bottom = ics->num_swb;
>> >>>>>> + for (filt = 0; filt < tns->n_filt[w]; filt++) {
>> >>>>>> + top = bottom;
>> >>>>>> + bottom = FFMAX( 0, top - tns->length[w][filt]);
>> >>>>>
>> >>>>>> + order = FFMIN(tns->order[w][filt], TNS_MAX_ORDER);
>> >>>>>
>> >>>>> useless?
>> >>>>
>> >>>> Indeed. Removed. Should the 'order' variable be removed as well or is
>> >>>> accessing it faster than accessing order[][]?
>> >>>>
>> >>>>>> + if (order == 0)
>> >>>>>> + continue;
>> >>>>>> +
>> >>>>>
>> >>>>>> + // tns_decode_coef
>> >>>>>> + lpc[0] = 1;
>> >>>>>> + for (m = 1; m <= order; m++) {
>> >>>>>> + lpc[m] = tns->tmp2_map[w][filt][tns->coef[w][filt][m - 1]];
>> >>>>>> + for (i = 1; i < m; i++)
>> >>>>>> + b[i] = lpc[i] + lpc[m] * lpc[m-i];
>> >>>>>> + for (i = 1; i < m; i++)
>> >>>>>> + lpc[i] = b[i];
>> >>>>>> + }
>> >>>>>
>> >>>>> looks a little like eval_coefs from ra144.c
>> >>>>> but later is fixedpoint, so this is more a random comment than anything
>> >>>>>
>> >>>>>
>> >>>>>> +
>> >>>>>> + start = ics->swb_offset[FFMIN(bottom, mmm)];
>> >>>>>> + end = ics->swb_offset[FFMIN( top, mmm)];
>> >>>>>> + if ((size = end - start) <= 0)
>> >>>>>> + continue;
>> >>>>>> + if (tns->direction[w][filt]) {
>> >>>>>> + inc = -1; start = end - 1;
>> >>>>>> + } else {
>> >>>>>> + inc = 1;
>> >>>>>> + }
>> >>>>>> + start += w * 128;
>> >>>>>> +
>> >>>>>> + // ar filter
>> >>>>>> + memset(b, 0, sizeof(b));
>> >>>>>> + ib = 0;
>> >>>>>
>> >>>>>> + for (m = 0; m < size; m++) {
>> >>>>>> + tmp = coef[start];
>> >>>>>> + if (decode) {
>> >>>>>> + for (i = 0; i < order; i++)
>> >>>>>> + tmp -= b[ib + i] * lpc[i + 1];
>> >>>>>> + } else { // encode
>> >>>>>> + for (i = 0; i < order; i++)
>> >>>>>> + tmp += b[i] * lpc[i + 1];
>> >>>>>> + }
>> >>>>>> + if (--ib < 0)
>> >>>>>> + ib = order - 1;
>> >>>>>> + b[ib] = b[ib + order] = tmp;
>> >>>>>> + coef[start] = tmp;
>> >>>>>> + start += inc;
>> >>>>>> + }
>> >>>>>
>> >>>>> decode is always 1
>> >>>>
>> >>>> This is left from LTP. I'll remove it when submitting next.
>> >>>>
>> >>>>> b is not truly needed, coef[] can be used i its place
>> >>>>> also this is likely relevant to overal codec speed so it
>> >>>>> should be written more with speed than compactness in mind
>> >>>>
>> >>>> Does that mean you want me to rewrite it?
>> >>>
>> >>> Without the encode case it would look like this, I think:
>> >>>
>> >>> memset(b, 0, sizeof(b));
>> >>> ib = 0;
>> >>> for (m = 0; m < size; m++) {
>> >>> tmp = coef[start];
>> >>> for (i = 0; i < order; i++)
>> >>> tmp -= b[i] * lpc[i + 1];
>> >>> if (--ib < 0)
>> >>> ib = order - 1;
>> >>> coef[start] = b[ib] = tmp;
>> >>> start += inc;
>> >>> }
>> >>>
>> >>> Then I don't think tmp is needed or b as none of the values of coeff
>> >>> but the one currently being filtered are changed and that one is only
>> >>> subtracted from, it's not involved in any calculations, so I think the
>> >>> following is correct:
>> >>>
>> >>> for (m = 0; m < size; m++) {
>> >>> for (i = 0; i < FFMIN(m, order); i++)
>> >>> coef[start] -= coef[start - 1 - i] * lpc[i + 1];
>> >>> start += inc;
>> >>> }
>> >>
>> >> Make that:
>> >>
>> >> for (m = 0; m < size; m++) {
>> >> for (i = 1; i <= FFMIN(m, order); i++)
>> >> coef[start] -= coef[start - i] * lpc[i];
>> >> start += inc;
>> >> }
>> >
>> > Or, if you like, the start incrementation can be merged into the outer
>> > for() loop's incrementation stuff. What is that bit technically
>> > called? The m++ section.
>>
>> See attached. Tested with the zodiac clip which is known to use TNS
>> and produces identical PSNR, so I'm assuming identical output with no
>> known regressions.
> [...]
>> + for (m = 0; m < size; m++, start += inc)
>> + for (i = 1; i <= FFMIN(m, order); i++)
>> + coef[start] -= coef[start - i] * lpc[i];
>> }
>
> are you sure this is correct for both values of inc ?
> i mean one will have coef[start - i] unfiltered and one filtered as far as
> i can see ...
You're right and I thought this was strange, but it's the same
behaviour as before from what I see and so in that sense it is as
correct as it was before. The spec says:
tns_ar_filter( spectrum[], size, inc, lpc[], order )
{
- Simple all-pole filter of order "order" defined by
y(n) = x(n) - lpc[1]*y(n-1) - ... - lpc[order]*y(n-order)
- The state variables of the filter are initialized to zero every time
- The output data is written over the input data ("in-place operation")
- An input vector of "size" samples is processed and the index increment
to the next data sample is given by "inc"
}
I think the third point suggests that it's the intended approach.
Rob
More information about the ffmpeg-devel
mailing list