[FFmpeg-devel] [PATCH] AAC Encoder, Round 2

Mon Aug 25 12:09:54 CEST 2008

On Sun, Aug 24, 2008 at 09:27:44PM +0200, Michael Niedermayer wrote:
> On Sun, Aug 24, 2008 at 09:05:54PM +0300, Kostya wrote:
> > On Sun, Aug 24, 2008 at 06:45:58PM +0200, Michael Niedermayer wrote:
[...]
> > > > 3. Encoder performs windowing and MDCT (and grouping?)
> > > 
> > > I don't think grouping can be done at this point, at least not optimally.
> > 
> > Well, from my POV, you can just merge groups with similar scalefactors once
> > they are known.
> 
> Well, you don't know the scalefactors yet ...
> Besides, what is "similar"?

I have sometimes seen two consecutive empty window groups in a frame that could
be merged; beyond that, I can't say how grouping should be performed.
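A minimal sketch of the merge mentioned above, assuming a hypothetical per-group
"all coefficients quantized to zero" flag that is only known after quantization.
`merge_empty_groups`, `grouping_bits` and `MAX_WINDOWS` are illustrative names,
not existing FFmpeg code. It only collapses runs of consecutive empty groups and
then builds the 7-bit scale_factor_grouping field (bit 6 is window 1; a set bit
means that window continues the previous group). It does not attempt optimal
grouping.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WINDOWS 8 /* an EIGHT_SHORT_SEQUENCE frame has 8 short windows */

/* Hypothetical helper: merges runs of consecutive empty groups in place.
 * group_len[g] is the number of windows in group g, group_empty[g] is true
 * if every band in that group quantized to zero. Returns the new group count. */
static int merge_empty_groups(int *group_len, bool *group_empty, int num_groups)
{
    int out = 0;
    assert(num_groups >= 1 && num_groups <= MAX_WINDOWS);
    for (int g = 0; g < num_groups; g++) {
        if (out > 0 && group_empty[g] && group_empty[out - 1]) {
            group_len[out - 1] += group_len[g]; /* absorb into previous empty group */
        } else {
            group_len[out]   = group_len[g];
            group_empty[out] = group_empty[g];
            out++;
        }
    }
    return out;
}

/* Builds the 7-bit scale_factor_grouping value from group lengths:
 * window w (1..7) maps to bit (7 - w) and is set if w continues a group. */
static unsigned grouping_bits(const int *group_len, int num_groups)
{
    unsigned bits = 0;
    int w = 0;
    for (int g = 0; g < num_groups; g++)
        for (int i = 0; i < group_len[g]; i++, w++)
            if (i > 0) /* not the first window of its group */
                bits |= 1u << (7 - w);
    return bits;
}
```

For example, groups of lengths 1, 1, 2, 4 where the middle two are empty become
1, 3, 4, which signals as 0x37.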

> > > > 4. Model calculates perceptual entropy and thresholds
> > > > 5. Ratecontrol module in encoder uses them to produce final thresholds
> > > > 5.1 It may call the psy model to calculate the perceptual distortion for a band
> > > > 6. Encoder quantizes input with scalefactors
> > > > 7. Encoder determines and encodes band info and coefficients
> > > > 8. Fetch next frame and goto step 1 unless it was the last frame
> > > > 
> > > > Any ideas/suggestions/patches?
> > > 
> > > I'm not sure; this is quite vague.
> > > 
> > > 
> > > A few points that are IMO important
> > > * Decisions must NOT be bundled into psy models. That is, when we implement
> > >   3 different heuristics to choose the MDCT/window size, they must be selectable
> > >   independently of the remaining, unrelated psy model. This also applies to
> > >   things like stereo attenuation coeffs, the way the low/highpass cutoff is
> > >   chosen, and so on ...
> > 
> > Then how? Select a separate module for each psy step?
> 
> Not sure I would call it a "module", but yes, in principle.
> 
> I was more thinking of
> if(avctx->something == something){
> }else{
> }
> though the struct, function pointer, ... system seems a little overkill here.

So, should I reduce the psy model to filling in Psy3gppBand data and move
rate control and quantization to the encoder?
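One way to read the proposed split, sketched under assumptions: the psy model
only fills per-band data and makes no decisions, while the window-size decision
is a separate function chosen with a plain if/else on an option, as suggested
above. `BandInfo`, `psy_fill_bands`, `WindowDecision` and the halves-energy
transient test are hypothetical placeholders; the real Psy3gppBand fields and
the 3GPP threshold computation (spreading, pre-echo control) are not reproduced.

```c
#include <assert.h>

/* Hypothetical per-band psy output, a stand-in for Psy3gppBand. */
typedef struct BandInfo {
    float energy;    /* sum of squared MDCT coefficients in the band */
    float threshold; /* distortion below this is assumed inaudible */
} BandInfo;

/* Psy model: fills band data only and makes no encoding decisions.
 * mask_ratio is a crude fixed signal-to-mask placeholder. */
static void psy_fill_bands(const float *coefs, const int *band_start,
                           int num_bands, float mask_ratio, BandInfo *bands)
{
    for (int b = 0; b < num_bands; b++) {
        float e = 0.0f;
        for (int i = band_start[b]; i < band_start[b + 1]; i++)
            e += coefs[i] * coefs[i];
        bands[b].energy    = e;
        bands[b].threshold = e * mask_ratio;
    }
}

/* Hypothetical option values selecting the window decision heuristic. */
enum WindowDecision {
    WINDOW_DECISION_ALWAYS_LONG,
    WINDOW_DECISION_ENERGY_RATIO,
};

/* Window-size decision, selectable independently of the psy model.
 * Returns 1 for an EIGHT_SHORT_SEQUENCE, 0 for a long window. */
static int use_short_windows(enum WindowDecision method,
                             const float *samples, int n)
{
    assert(n >= 2);
    if (method == WINDOW_DECISION_ALWAYS_LONG) {
        return 0;
    } else {
        /* naive transient test: second half much louder than the first */
        float e0 = 1e-9f, e1 = 1e-9f;
        for (int i = 0; i < n / 2; i++)
            e0 += samples[i] * samples[i];
        for (int i = n / 2; i < n; i++)
            e1 += samples[i] * samples[i];
        return e1 > 10.0f * e0;
    }
}
```

With this layout, adding a third window heuristic means adding one more branch
without touching psy_fill_bands, and rate control and quantization read only
the BandInfo array.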

> > 
> > > * The primary goal is the highest-quality encoding; anything that would make
> > >   achieving this goal harder will be rejected.
> > 
> > Well, I can implement it in [...] time :)
> 
> great ;)))

With the plan, of course.

> > 
> > > * Coeff quantization and scalefactors must be decided based on RD (rate-distortion).
> > >   It's perfectly fine to support faster alternatives in addition ...
> >  
> > I think that should be done in the encoder.
> 
> yes
> IMHO the psy model should just tell the encoder how important each band is
> in terms of audibility of distortion, that is, it should provide perceptual weights.
> That way the psy model does not need to mess with anything AAC-specific ...
> and the encoder can do all the RD, bit counting, quantization, ...
> Sadly, this is not exactly how the simple 3GPP model is designed ...

That's tricky to formulate.
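A minimal sketch of the division described above: the psy model supplies only a
perceptual weight per band (assumed here to be 1 / threshold), and the encoder
picks the quantizer step by minimizing weight * distortion + lambda * bits. The
uniform rounding quantizer, the Exp-Golomb-like bit-count proxy, the candidate
step list and all function names are hypothetical simplifications; real AAC uses
the non-uniform |x|^(3/4) quantizer, scalefactor-derived step sizes and Huffman
codebooks for bit counting.

```c
#include <assert.h>

/* Crude bit-cost proxy (NOT AAC Huffman): 1 bit for zero,
 * plus 2 bits per binary digit of |q|. */
static int bits_for(int q)
{
    unsigned v = q < 0 ? 0u - (unsigned)q : (unsigned)q;
    int len = 1;
    while (v) {
        len += 2;
        v >>= 1;
    }
    return len;
}

/* Uniform quantizer with round-to-nearest. */
static int quantize(float x, float step)
{
    float y = x / step;
    return (int)(y >= 0.0f ? y + 0.5f : y - 0.5f);
}

/* Returns the index of the candidate step minimizing
 * weight * squared_error + lambda * estimated_bits for one band.
 * weight is the perceptual weight handed over by the psy model. */
static int choose_step_rd(const float *coefs, int n, float weight, float lambda,
                          const float *steps, int num_steps)
{
    int best = 0;
    float best_cost = 0.0f;
    assert(n > 0 && num_steps > 0);
    for (int s = 0; s < num_steps; s++) {
        float dist = 0.0f;
        int bits = 0;
        for (int i = 0; i < n; i++) {
            int q = quantize(coefs[i], steps[s]);
            float err = coefs[i] - q * steps[s];
            dist += err * err;
            bits += bits_for(q);
        }
        float cost = weight * dist + lambda * bits;
        if (s == 0 || cost < best_cost) {
            best_cost = cost;
            best = s;
        }
    }
    return best;
}
```

A band with a low masking threshold (large weight) keeps a fine step, while a
band the psy model marks as unimportant is pushed toward coarser steps by the
rate term; nothing AAC-specific has to live in the psy model.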

> > As I mentioned previously, I'd like to keep the encoder and psy model separate,
> > and I'd like to have them working ASAP.
> > 
> > Since I have a working AAC encoder, I'd like to make it fit for optimization
> > and then perfect it piece by piece. Rewriting it from scratch would require
> > clear requirements too. So let's settle on a workflow.
> 
> I didn't ask for a rewrite ...
> 
> 
> [...]
> 
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> Those who are too smart to engage in politics are punished by being
> governed by those who are dumber. -- Plato