[FFmpeg-devel] [PATCH] AAC Encoder, Round 2

Sun Aug 24 18:45:58 CEST 2008

On Sun, Aug 24, 2008 at 06:44:07PM +0300, Kostya wrote:
> On Sun, Aug 24, 2008 at 04:10:12PM +0200, Michael Niedermayer wrote:
[...]
>  
> > >  
> > > > except that, I think the previous reviews have not been dealt with yet.
> > > > That is, the various suggestions for quality improvement should be tried,
> > > > and whatever is better should be adopted.
> > > > Also everything that Gabriel Bouvigne suggested should be tried.
> > > 
> > > Err, when I find a way to download them. $20 for a three-page paper is a bit
> > > high for me.
> > 
> > Forget the papers; implement what does not depend on pay-per-view papers.
> > IIRC he said something about scalefactors and 3GPP as well.
> 
> He did, but that also influences psy model interface (see below). 

Anyway, I suggest that you read some of the RD (rate-distortion) papers about
video coding (even if you read the audio-related ones).

>  
> > >  
> > > > I do not mind if we leave some of the harder things, like Viterbi-based window
> > > > decision, until after svn ci, but the majority of the things suggested should
> > > > be tried before the code is committed.
> > > 
> > > Comment on interface then or propose your own.
> > > It will be needed to plug any psychoacoustic model.
> > > Also it would allow to finish encoder faster and then concentrate on
> > > model(s).
> > 
> > The split between psy and encoder is odd, to say the least.
> > 
> > things psy can provide IMHO
> > * find perceptual weights per band or per coefficient used for RD
> > * find the perceptual distortion between 2 time domain signals
> > * find the perceptual distortion between 2 freq domain signals, possibly
> >   just a single band or coeff
>  
> Since Gabriel recommended exactly that model, I've tried to implement it in the least
> intrusive way. As you demand the highest possible quality, let's discuss how it should
> be done.
> 
> My proposition (everybody uses slightly different terms, so I may get something wrong):

> 0. Initialize everything

of course ...

> 1. Perform some input filtering (lowpass, highpass, stereo attenuation, whatever)

It's debatable how far this should be here rather than separate and outside of
the encoder.

> 2. Model decides window type (well, in distant future it can be 'undecided' and encoder
> will try both)

> 3. Encoder performs windowing and MDCT (and grouping?)

I don't think grouping can be done at this point, at least not optimally.

> 4. Model calculates perceptual entropy and thresholds
> 5. Ratecontrol module in encoder uses them to produce final thresholds
> 5.1 maybe it will call psy model to calculate perceptual distortion for the band
> 6. Encoder quantizes input with scalefactors
> 7. Encoder determines and encodes band info and coefficients
> 8. Fetch next frame and goto step 1 unless it was the last frame
> 
> Any ideas/suggestions/patches?

I am not sure; this is quite vague.

A few points that are IMO important:
* Decisions must NOT be bundled into psy models; that is, when we implement
  3 different heuristics to choose the MDCT/window size, they must be choosable
  independently of the remaining unrelated psy model. This also applies to
  things like stereo attenuation coeffs, the way the low/highpass cutoff is
  chosen, and so on ...
* The primary goal is highest quality encoding; anything that would make
  achieving this goal harder will be rejected.
* Coeff quantization and scalefactors must be decided based on RD.
  It's perfectly fine to support faster alternatives in addition ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Observe your enemies, for they first find out your faults. -- Antisthenes