[FFmpeg-devel] [RFC] AAC Encoder

Thu Aug 14 15:38:17 CEST 2008

On Thu, Aug 14, 2008 at 08:48:42AM +0300, Kostya wrote:
> On Wed, Aug 13, 2008 at 04:44:18PM +0200, Michael Niedermayer wrote:
> > On Wed, Aug 13, 2008 at 04:42:56PM +0300, Kostya wrote:
> > > On Wed, Aug 13, 2008 at 02:57:50PM +0200, Michael Niedermayer wrote:
> > [...]
> > > 
> > > > > 3. based on psy model suggestions, encoder performs windowing and MDCT
> > > > 
> > > > ok
> > > > 
> > > > 
> > > > > 4. encoder feeds coefficients to psy model
> > > > > 5. psy model by some magic determines scalefactors and use them to convert
> > > > > coefficients into integer form
> > > > > 6. encoder encodes obtained scalefactors and integer coefficients
> > > > > 
> > > > > There are 11 codebooks for AAC, each designed to code either pairs or quads
> > > > > of values with sign coded separately or incorporated into value,
> > > > > each has a maximum value limit.
> > > > > While it's feasible to find the best encoding (like take raw coeff, quantize
> > > > > it and round up or down, then see which vector takes less bits), I feel
> > > > > it would be too slow.
> > > > 
> > > > thats fine, you already have the fast variant implemented i do not suggest
> > > > that to be removed, what we need is a high quality variant. The encoder should
> > > > be better than other encoders ...
> > > > Also as the max value you mentioned is another example of where your code
> > > > fails fatally, a single +3 that would sound nearly as good when encoded as +2
> > > > could force a less efficient code book to be choosen. Also the +3 could be
> > > > encoded as a pulse, i dont remember if your code optimally choose between
> > > > pulse and normal codebook encodings?
> > > 
> > > not optimally, unfortunately, but it can search for pulses and encode them
> > > 
> > > in any case, here's a new encoder version
> > 
> > please commit the parts ive ok-ed and/or send a patch without them
> 
> done (there were okayed parts only in aacenc.c)

thanks you very much, this really make reviewing less tireing :)

[...]
> @@ -268,6 +317,327 @@
>  }
>  
>  /**
> + * Encode MS data.
> + * @see 4.6.8.1 "Joint Coding - M/S Stereo"
> + */
> +static void encode_ms_info(PutBitContext *pb, ChannelElement *cpe)
> +{
> +    int i, w;
> +
> +    put_bits(pb, 2, cpe->ms.present);
> +    if(cpe->ms.present == 1)
> +        for(w = 0; w < cpe->ch[0].ics.num_windows; w++){
> +            if(cpe->ch[0].ics.group_len[w]) continue;
> +            for(i = 0; i < cpe->ch[0].ics.max_sfb; i++)
> +                put_bits(pb, 1, cpe->ms.mask[w][i]);
> +        }
> +}

i think things could be stored more efficiently than this kind of
if group_len[w] != 0 then [w] unused
IIRC the decoder stores things more compactly

> +
> +/**
> + * Scan scalefactor band and determine optimal codebook for it.
> + *
> + * @param s       encoder context
> + * @param cpe     channel element
> + * @param channel channel number inside channel pair
> + * @param win     window group start number
> + * @param band    scalefactor band to analyze
> + * @param start   scalefactor band position in spectral coefficients
> + * @param size    scalefactor band size
> + */
> +static int determine_section_info(AACEncContext *s, ChannelElement *cpe, int channel, int win, int band, int start, int size)
> +{
> +    int i, j, w;
> +    int maxval, sign;
> +    int score, best, cb, bestcb, dim, idx, start2;
> +
> +    maxval = 0;
> +    sign = 0;
> +    w = win;
> +    start2 = start;
> +    do{
> +        for(i = start2; i < start2 + size; i++){
> +            maxval = FFMAX(maxval, FFABS(cpe->ch[channel].icoefs[i]));
> +            if(cpe->ch[channel].icoefs[i] < 0) sign = 1;
> +        }
> +        w++;
> +        start2 += 128;
> +    }while(w < cpe->ch[channel].ics.num_windows && cpe->ch[channel].ics.group_len[w]);
> +
> +    if(maxval > 12) return 11;
> +    if(!maxval) return 0;
> +
> +    for(cb = 0; cb < 12; cb++)
> +        if(aac_cb_info[cb].maxval >= maxval)
> +            break;
> +    best = INT_MAX;
> +    bestcb = 11;
> +    for(; cb < 12; cb++){
> +        score = 0;
> +        dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;
> +        if(!band || cpe->ch[channel].band_type[win][band - 1] != cb)
> +            score += 9; //that's for new codebook entry
> +        w = win;
> +        start2 = start;
> +        if(aac_cb_info[cb].flags & CB_UNSIGNED){
> +            do{
> +                for(i = start2; i < start2 + size; i += dim){
> +                    idx = 0;
> +                    for(j = 0; j < dim; j++)
> +                        idx = idx * aac_cb_info[cb].maxval + FFABS(cpe->ch[channel].icoefs[i+j]);
> +                    score += ff_aac_spectral_bits[aac_cb_info[cb].cb_num][idx];
> +                    for(j = 0; j < dim; j++)
> +                        if(cpe->ch[channel].icoefs[i+j])
> +                            score++;
> +                }
> +                w++;
> +                start2 += 128;
> +            }while(w < cpe->ch[channel].ics.num_windows && cpe->ch[channel].ics.group_len[w]);
> +        }else{
> +            do{
> +                for(i = start2; i < start2 + size; i += dim){
> +                    idx = 0;
> +                    for(j = 0; j < dim; j++)
> +                        idx = idx * (aac_cb_info[cb].maxval*2 + 1) + cpe->ch[channel].icoefs[i+j] + aac_cb_info[cb].maxval;
> +                    score += ff_aac_spectral_bits[aac_cb_info[cb].cb_num][idx];
> +                }
> +                w++;
> +                start2 += 128;
> +            }while(w < cpe->ch[channel].ics.num_windows && cpe->ch[channel].ics.group_len[w]);
> +        }
> +        if(score < best){
> +            best = score;
> +            bestcb = cb;
> +        }
> +    }
> +    return bestcb;
> +}

viterbi for determining band_types ...
look this isnt hard, its not even slow in this paricular case,let me explain

assuming you encode every band with several band types
(you do that above already), each gets some score (bits or RD or whatever)
from that you end up with a score table like

        band-1  band-2  band-3
CB-0       123    456    345
CB-1       222    333    349
CB-2       321    453    567

now you just must find the path of the best (lowest) score
runs of the same band up to 6 cost 7 bits and up to 13 cost 10, ...
in the first iteration you determine the best way to store a run ending at
band-1
band-1: run=1,CB=0,score=130

next you determine it for band-2
the cases to consider are all runs * all codebooks (its likely possible to
skip some with some thinking of what cannot be optimal)
for run=1 we know CB=1 is best and it would be followed by the previously
determined stuff on band-1 with score of (130+333+7)
for run-2 there are 123+456+7,222+333+7,321+453+7
its clear
band-1: run=1,CB=0,score=130
band-2: run=1,CB=1,score=470
is the next result
next there is band 3, with run=1 CB=0 would be best with score of
345+7+470=822
with run=2 130+456+345+7, 130+333+349+7, 130+453+567+7 would be the options
here we already found a better one that is 130+333+349+7=819
with run=3 123+456+345+7, 222+333+349+7, 321+453+567+7
i think none of them is better
so we end up with
band-1: run=1,CB=0,score=130
band-2: run=1,CB=1,score=470
band-3: run=2,CB=1,score=819

and so forth ...
its rather trivial, just a flat array or 2, no complicated data structs or
anything needed (i wouldnt ask you to implement this if it wasnt so trivial)
though yes it likely wont make a big difference in terms of compression but
maybe viterbi can be used at different places too where it would make a
bigger difference.

> +
> +/**
> + * Encode one scalefactor band with selected codebook.
> + */
> +static void encode_band_coeffs(AACEncContext *s, ChannelElement *cpe, int channel, int start, int size, int cb)
> +{
> +    const uint8_t  *bits  = ff_aac_spectral_bits [aac_cb_info[cb].cb_num];
> +    const uint16_t *codes = ff_aac_spectral_codes[aac_cb_info[cb].cb_num];
> +    const int dim = (aac_cb_info[cb].flags & CB_PAIRS) ? 2 : 4;
> +    int i, j, idx;
> +
> +    if(!bits) return;
> +
> +    if(aac_cb_info[cb].flags & CB_ESCAPE){
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx*17 + FFMIN(FFABS(cpe->ch[channel].icoefs[i+j]), 16);
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +            //output escape values
> +            for(j = 0; j < dim; j++)
> +                if(FFABS(cpe->ch[channel].icoefs[i+j]) > 15){

> +                    int l = av_log2(FFABS(cpe->ch[channel].icoefs[i+j]));

please use a different variable l is hard to distinguish from 1 in some fonts.

> +
> +                    put_bits(&s->pb, l - 4 + 1, (1 << (l - 4 + 1)) - 2);
> +                    put_bits(&s->pb, l, FFABS(cpe->ch[channel].icoefs[i+j]) & ((1 << l) - 1));
> +                }
> +        }

FFABS is done 3 times, this isnt efficient.

> +    }else if(aac_cb_info[cb].flags & CB_UNSIGNED){
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx * (aac_cb_info[cb].maxval + 1) + FFABS(cpe->ch[channel].icoefs[i+j]);
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +            //output signs
> +            for(j = 0; j < dim; j++)
> +                if(cpe->ch[channel].icoefs[i+j])
> +                    put_bits(&s->pb, 1, cpe->ch[channel].icoefs[i+j] < 0);
> +        }
> +    }else{
> +        for(i = start; i < start + size; i += dim){
> +            idx = 0;
> +            for(j = 0; j < dim; j++)
> +                idx = idx * (aac_cb_info[cb].maxval*2 + 1) + cpe->ch[channel].icoefs[i+j] + aac_cb_info[cb].maxval;
> +            put_bits(&s->pb, bits[idx], codes[idx]);
> +        }
> +    }

(aac_cb_info[cb].maxval*2 + 1) should be calculated outside the loop.
and the addition of maxval is a matter of adjusting teh bits/codes pointers

> +}
> +
> +/**
> + * Encode scalefactor band coding type.
> + */
> +static void encode_band_info(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> +{
> +    int i, w;
> +    int bits = cpe->ch[channel].ics.num_windows == 1 ? 5 : 3;
> +    int esc = (1 << bits) - 1;
> +    int count;
> +
> +    for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
> +        if(cpe->ch[channel].ics.group_len[w]) continue;
> +        count = 0;
> +        for(i = 0; i < cpe->ch[channel].ics.max_sfb; i++){
> +            if(!i || cpe->ch[channel].band_type[w][i] != cpe->ch[channel].band_type[w][i-1]){
> +                if(count){
> +                    while(count >= esc){
> +                        put_bits(&s->pb, bits, esc);
> +                        count -= esc;
> +                    }
> +                    put_bits(&s->pb, bits, count);
> +                }
> +                put_bits(&s->pb, 4, cpe->ch[channel].band_type[w][i]);
> +                count = 1;
> +            }else
> +                count++;
> +        }
> +        if(count){
> +            while(count >= esc){
> +                put_bits(&s->pb, bits, esc);
> +                count -= esc;
> +            }
> +            put_bits(&s->pb, bits, count);
> +        }
> +    }
> +}

as seen above, the run values would be known from viterbi so this can be done
simpler

[...]
> +/**
> + * Encode pulse data.
> + */
> +static void encode_pulses(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> +{
> +    int i;
> +
> +    put_bits(&s->pb, 1, !!cpe->ch[channel].pulse.num_pulse);
> +    if(!cpe->ch[channel].pulse.num_pulse) return;
> +
> +    put_bits(&s->pb, 2, cpe->ch[channel].pulse.num_pulse - 1);
> +    put_bits(&s->pb, 6, cpe->ch[channel].pulse.start);
> +    for(i = 0; i < cpe->ch[channel].pulse.num_pulse; i++){
> +        put_bits(&s->pb, 5, cpe->ch[channel].pulse.offset[i]);
> +        put_bits(&s->pb, 4, cpe->ch[channel].pulse.amp[i]);
> +    }
> +}

what data struct was cpe->ch[channel].pulse ?
it seems more logic to pass it instead of cpe

[...]
> +
> +/**
> + * Encode spectral coefficients processed by psychoacoustic model.
> + */
> +static void encode_spectral_coeffs(AVCodecContext *avctx, AACEncContext *s, ChannelElement *cpe, int channel)
> +{
> +    int start, i, w, w2;
> +
> +    for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){

> +        if(cpe->ch[channel].ics.group_len[w]) continue;

:/
yes these being why iam not approving several functions, i think its
avoidable or am i missing somehing?

[...]
> +/**
> + * Encode one channel of audio data.
> + */
> +static int encode_individual_channel(AVCodecContext *avctx, ChannelElement *cpe, int channel)
> +{
> +    AACEncContext *s = avctx->priv_data;
> +    int i, g, w;
> +
> +    for(w = 0; w < cpe->ch[channel].ics.num_windows; w++){
> +        i = w << 7;
> +        if(cpe->ch[channel].ics.group_len[w]) continue;
> +        for(g = 0; g < cpe->ch[channel].ics.max_sfb; g++){
> +            if(!cpe->ch[channel].zeroes[w][g]){
> +                cpe->ch[channel].band_type[w][g] = determine_section_info(s, cpe, channel, w, g, i, cpe->ch[channel].ics.swb_sizes[g]);
> +                cpe->ch[channel].zeroes[w][g] = !cpe->ch[channel].band_type[w][g];
> +            }else
> +                cpe->ch[channel].band_type[w][g] = 0;
> +            i += cpe->ch[channel].ics.swb_sizes[g];
> +        }
> +    }
> +

> +    put_bits(&s->pb, 8, cpe->ch[channel].mixing_gain); //global gain

mixing gain or global gain? IIRC mixing_gain has been removed from the
decoder ...

[...]
> +    if(!avctx->frame_number){
> +        memmove(s->samples, s->samples + 1024 * avctx->channels, 1024 * avctx->channels * sizeof(s->samples[0]));
> +        return 0;
> +    }

is this really overlapping so it needs a memmove?

> +
> +    init_put_bits(&s->pb, frame, buf_size*8);
> +    if(avctx->frame_number==1 && !(avctx->flags & CODEC_FLAG_BITEXACT)){
> +        put_bitstream_info(avctx, s, LIBAVCODEC_IDENT);
> +    }

hmm this should be written in extradata or repeated occasionally
just at the begin almost asks for it to be cut off

[...]

> +    put_bits(&s->pb, 3, ID_END);
> +    flush_put_bits(&s->pb);
> +    avctx->frame_bits = put_bits_count(&s->pb);
> +
> +    if(!data)
> +        s->last_frame = 1;

ok

> +    memmove(s->samples, s->samples + 1024 * avctx->channels, 1024 * avctx->channels * sizeof(s->samples[0]));

same comment as above

> +    return put_bits_count(&s->pb)>>3;
> +}

ok

[...]

> /*
>  * AAC encoder psychoacoustic model
>  * Copyright (C) 2008 Konstantin Shishkov
>  *
>  * This file is part of FFmpeg.
>  *
>  * FFmpeg is free software; you can redistribute it and/or
>  * modify it under the terms of the GNU Lesser General Public
>  * License as published by the Free Software Foundation; either
>  * version 2.1 of the License, or (at your option) any later version.
>  *
>  * FFmpeg is distributed in the hope that it will be useful,
>  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>  * Lesser General Public License for more details.
>  *
>  * You should have received a copy of the GNU Lesser General Public
>  * License along with FFmpeg; if not, write to the Free Software
>  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>  */
> 
> #ifndef FFMPEG_AACPSY_H
> #define FFMPEG_AACPSY_H
> 
> #include "avcodec.h"
> #include "aac.h"
> #include "lowpass.h"
> 
> enum AACPsyModelType{
>     AAC_PSY_NULL,              ///< do nothing with frequencies
>     AAC_PSY_NULL8,             ///< do nothing with frequencies but work with short windows
>     AAC_PSY_3GPP,              ///< model following recommendations from 3GPP TS 26.403
> 
>     AAC_NB_PSY_MODELS          ///< total number of psychoacoustic models, since it's not a part of the ABI new models can be added freely
> };

ok
and ill review the rest of the psy model ASAP

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Republics decline into democracies and democracies degenerate into
despotisms. -- Aristotle
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080814/b86384a4/attachment.pgp>