[FFmpeg-devel] [RFC] AAC Encoder
Kostya
kostya.shishkov
Fri Aug 15 09:11:03 CEST 2008
On Thu, Aug 14, 2008 at 11:42:44PM +0200, Michael Niedermayer wrote:
> >
> > enum AACPsyModelMode{
> > PSY_MODE_CBR, ///< follow bitrate as closely as possible
> > PSY_MODE_ABR, ///< try to achieve bitrate but actual bitrate may differ significantly
> > PSY_MODE_QUALITY, ///< try to achieve set quality instead of bitrate
> > };
> >
> > #define PSY_MODEL_MODE_MASK 0x0000000F ///< bit fields for storing mode (CBR, ABR, VBR)
>
> please use bitrate tolerance/bitrate/max/min bitrate/buffer size/...
> from AVCodecContext for selecting the mode
I will, but I will keep those for internal state.
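For illustration only, selecting the internal mode from the rate-control
settings could look roughly like the sketch below. RateParams is a
hypothetical stand-in for the relevant AVCodecContext fields,
psy_mode_from_params() is a made-up name, and the rules (zero tolerance or
min == max == target means CBR) are assumptions, not what the patch does.

```c
/* Hypothetical stand-in for the few AVCodecContext fields consulted here. */
typedef struct RateParams {
    int bit_rate;           /* target bitrate, bits/s */
    int bit_rate_tolerance; /* allowed deviation from the target, bits/s */
    int rc_max_rate;        /* maximum bitrate, 0 if unset */
    int rc_min_rate;        /* minimum bitrate, 0 if unset */
    int use_qscale;         /* nonzero when fixed quality was requested */
} RateParams;

enum AACPsyModelMode {
    PSY_MODE_CBR,     /* follow bitrate as closely as possible */
    PSY_MODE_ABR,     /* try to achieve bitrate, may differ significantly */
    PSY_MODE_QUALITY, /* try to achieve set quality instead of bitrate */
};

/* Derive the internal mode from user settings (assumed rules, see above). */
static enum AACPsyModelMode psy_mode_from_params(const RateParams *p)
{
    if (p->use_qscale)
        return PSY_MODE_QUALITY;
    /* min == max == target leaves no room to deviate */
    if (p->rc_max_rate && p->rc_max_rate == p->rc_min_rate &&
        p->rc_max_rate == p->bit_rate)
        return PSY_MODE_CBR;
    /* assumption: a zero tolerance is also a request for CBR */
    if (p->bit_rate_tolerance == 0)
        return PSY_MODE_CBR;
    return PSY_MODE_ABR;
}
```

The result would then be stored in the low bits of the model flags, so the
rest of the model only looks at its own state.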
> > #define PSY_MODEL_NO_PULSE 0x00000010 ///< disable pulse searching
> > #define PSY_MODEL_NO_SWITCH 0x00000020 ///< disable window switching
> > #define PSY_MODEL_NO_ST_ATT 0x00000040 ///< disable stereo attenuation
> > #define PSY_MODEL_NO_LOWPASS 0x00000080 ///< disable low-pass filtering
>
> How does the user pass these to the codec?
> I suspect in AVCodecContext, if so above would be redundant and unneeded
> as AVCodecContext is available to the psy model
Huh? I haven't seen flags for such things in avcodec.h.
Even if the model takes its flags from the codec context, it still needs to
know what they mean.
> also i think that the choice of how to encode a coefficient, that is as a
> pulse or not is not a psychoacoustic question but one of entropy coding.
> "which way needs fewer bits has better RD"
Yes, I think it can be merged into determining the codebook sequence with the
Viterbi algorithm (i.e. as an extra weight for a codebook coded with pulses).
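A rough sketch of such a search, under stated assumptions: each band has a
precomputed bit cost per "state" (a codebook, or a codebook plus pulses as a
separate state), and changing state between neighbouring bands costs a fixed
number of bits. The name viterbi_states(), the cost table and the flat
switch_cost are hypothetical; a real encoder would compute section costs
from the bitstream syntax.

```c
#define MAX_BANDS  64
#define MAX_STATES 16

/**
 * Pick one state per band so that the total bit count is minimal.
 * @param cost        cost[b][s] = bits to code band b in state s
 * @param switch_cost bits charged when neighbouring bands differ in state
 * @param path        output, chosen state for every band
 * @return total cost of the best path (nbands must be >= 1)
 */
static float viterbi_states(int nbands, int nstates,
                            float cost[][MAX_STATES],
                            float switch_cost, int *path)
{
    float best[MAX_BANDS][MAX_STATES]; /* cheapest way to reach (b, s) */
    int   prev[MAX_BANDS][MAX_STATES]; /* state of band b-1 on that path */
    int b, s, k, end = 0;
    float total;

    for (s = 0; s < nstates; s++) {
        best[0][s] = cost[0][s];
        prev[0][s] = s;
    }
    for (b = 1; b < nbands; b++) {
        for (s = 0; s < nstates; s++) {
            float min = best[b - 1][s]; /* staying in s costs nothing extra */
            int   arg = s;
            for (k = 0; k < nstates; k++) {
                float c = best[b - 1][k] + switch_cost;
                if (k != s && c < min) {
                    min = c;
                    arg = k;
                }
            }
            best[b][s] = min + cost[b][s];
            prev[b][s] = arg;
        }
    }
    /* cheapest final state, then walk the back-pointers */
    for (s = 1; s < nstates; s++)
        if (best[nbands - 1][s] < best[nbands - 1][end])
            end = s;
    total = best[nbands - 1][end];
    for (b = nbands - 1; b >= 0; b--) {
        path[b] = end;
        end     = prev[b][end];
    }
    return total;
}
```

With pulses modelled as additional states, the pulse decision falls out of
the same minimisation instead of being a separate psychoacoustic step.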
> >
> > #define PSY_MODEL_NO_PREPROC (PSY_MODEL_NO_ST_ATT | PSY_MODEL_NO_LOWPASS)
> >
> > #define PSY_MODEL_MODE(a) ((a) & PSY_MODEL_MODE_MASK)
> >
> > /**
> > * context used by psychoacoustic model
> > */
> > typedef struct AACPsyContext {
> > AVCodecContext *avctx; ///< encoder context
> >
> > int flags; ///< model flags
>
> > const uint8_t *bands1024; ///< scalefactor band sizes for long (1024 samples) frame
> > int num_bands1024; ///< number of scalefactor bands for long frame
> > const uint8_t *bands128; ///< scalefactor band sizes for short (128 samples) frame
> > int num_bands128; ///< number of scalefactor bands for short frame
>
> This is a little AAC specific but then it's called AACPsyContext
> so I am not sure. Is the code supposed to be a generic psychoacoustic model
> or AAC specific?
AAC-specific. I think it's possible to make it more generic, but that would
require some radical changes, especially to the window switching code and the
scalefactor handling.
> [...]
> > /**
> > * Convert coefficients to integers.
> > * @return sum of coefficients
> > * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> > */
> > static inline int convert_coeffs(float *in, int *out, int size, int scale_idx)
>
> quantize_coeffs
> and scale_idx should be replaced by a quantization factor.
>
>
> > {
> > int i, sign, sum = 0;
> > for(i = 0; i < size; i++){
> > sign = in[i] > 0.0;
> > out[i] = (int)(pow(FFABS(in[i]) * ff_aac_pow2sf_tab[200 - scale_idx + SCALE_ONE_POS - SCALE_DIV_512], 0.75) + 0.4054);
>
> fabs()
>
>
> > out[i] = av_clip(out[i], 0, 8191);
> > sum += out[i];
> > if(sign) out[i] = -out[i];
> > }
> > return sum;
> > }
>
>
>
> >
> > static inline float unquant(int q, int scale_idx){
> > return (FFABS(q) * cbrt(q*1.0)) * ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS];
> > }
>
> also please replace scale_idx by a factor, repeatedly doing these lookups is
> likely inefficient, also it is inflexible in relation to non aac
>
>
> > static inline float calc_distortion(float *c, int size, int scale_idx)
> > {
> > int i;
> > int q;
> > float coef, unquant, sum = 0.0f;
> > for(i = 0; i < size; i++){
> > coef = FFABS(c[i]);
> > q = (int)(pow(FFABS(coef) * ff_aac_pow2sf_tab[200 - scale_idx + SCALE_ONE_POS - SCALE_DIV_512], 0.75) + 0.4054);
> > q = av_clip(q, 0, 8191);
> > unquant = (q * cbrt(q)) * ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS + SCALE_DIV_512];
> > sum += (coef - unquant) * (coef - unquant);
> > }
> > return sum;
> > }
>
> I think this and previous functions have some common code that can be
> factorized out
>
>
> [...]
> > static void psy_null8_process(AACPsyContext *apc, int tag, int type, ChannelElement *cpe)
> > {
> > int start;
> > int w, ch, g, i;
> > int chans = type == ID_CPE ? 2 : 1;
> >
> > //detect M/S
> > if(chans > 1 && cpe->common_window){
> > start = 0;
> > for(w = 0; w < cpe->ch[0].ics.num_windows; w++){
> > for(g = 0; g < cpe->ch[0].ics.num_swb; g++){
> > float diff = 0.0f;
> >
> > for(i = 0; i < cpe->ch[0].ics.swb_sizes[g]; i++)
> > diff += fabs(cpe->ch[0].coeffs[start+i] - cpe->ch[1].coeffs[start+i]);
> > cpe->ms.mask[w][g] = diff == 0.0;
> > }
> > }
> > }
>
> the mid side bits should also be detected ideally by encoding both ways
> and choosing by rate distortion
>
> above really looks a little lame, one should at least calculate either
> bits or distortion and choose based on that if both are not ...
This is just a sample model to exercise encoder capabilities.
I will include my working model next time.
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> it is not once nor twice but times without number that the same ideas make
> their appearance in the world. -- Aristotle