[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Wed Aug 27 13:21:51 CEST 2008

Kostya wrote:
> Here's my first attempt to define codec-agnostic psy model.
> Here's an interface for it. I'm not sure about AC3, but
> it should be possible to use it with DCA, Vorbis,
> MPEG Audio Layers I-III and NBC, maybe WMA too.
> In case somebody codes an implementation, of course.
> Personally I plan to make my encoder use it backed with
> already implemented 3GPP model.

1) The general issue of using _any_ psychoacoustic model with HD (>48 kHz)
audio: how is the whole spectrum supposed to be split into bands? E.g., with
a 192 kHz sampling rate (think DCA with proprietary extensions), do you really
want to split the whole 96 kHz spectrum into just 128 equal subbands?
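
To put a number on that: a minimal sketch, assuming a purely uniform split
from 0 Hz to Nyquist. The function name and its parameters are made up for
this mail and are not part of the proposed interface.

```c
#include <assert.h>

/* Width in Hz of one band when the spectrum from 0 Hz to Nyquist is
 * split into num_bands equal parts. Hypothetical helper, for
 * illustration only. */
double uniform_band_width_hz(int sample_rate, int num_bands)
{
    assert(sample_rate > 0 && num_bands > 0);
    return (sample_rate / 2.0) / num_bands;
}
```

With 128 bands this gives 750 Hz per band at 192 kHz, versus 187.5 Hz at
48 kHz. Auditory critical bands are commonly quoted as roughly 100 Hz wide
at low frequencies, so a 750 Hz band would lump several of them together
exactly where resolution matters most.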

2) In FFPsyContext, the distinction between only _two_ frame types (long and 
short) is hard-coded. For some codecs, this model makes no sense. E.g. for 
DCA, a subframe always contains 4 subsubframes each corresponding to 256 PCM 
samples (but the synthesis FIR is 512 taps long). One can either say "this is 
a common scale factor for all 4 subsubframes", or define a transient location 
at one subsubframe boundary and say "here is the scale factor before the 
transient, and here it is after the transient". This doesn't really map onto
the above model.
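
As a rough sketch of what a DCA encoder would want back from a psy model
for each subband (all names here are hypothetical, and the encoding of the
transient position is my own choice for illustration, not a proposal for
the actual API):

```c
#include <assert.h>

#define SUBSUBFRAMES 4  /* subsubframes per subframe, as described above */

/* Hypothetical per-subband result; not part of any existing interface. */
typedef struct SubbandScale {
    int transient_pos;  /* 0: no transient; 1..3: transient at the
                         * boundary just before that subsubframe */
    int scale[2];       /* scale[0]: before the transient (or the whole
                         * subframe), scale[1]: after the transient */
} SubbandScale;

/* Scale factor that applies to subsubframe ssf (0..3). */
int scale_for_subsubframe(const SubbandScale *s, int ssf)
{
    assert(ssf >= 0 && ssf < SUBSUBFRAMES);
    assert(s->transient_pos >= 0 && s->transient_pos < SUBSUBFRAMES);
    if (s->transient_pos == 0 || ssf < s->transient_pos)
        return s->scale[0];
    return s->scale[1];
}
```

A long/short flag cannot carry this: the frame length never changes, only
the position at which the scale factor switches.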

Moreover, who is responsible for transient detection (and, for non-DCA
codecs, for the choice between short and long blocks): the codec or the psy
model?

3) The whole "scalefactor band lengths for long frame" business assumes 
non-overlapping (or almost non-overlapping) bands. This is simply not the 
case for DCA. For DCA, each subband (i.e., the entity for which one can
specify a scale factor [ignoring transients here]), except the first and the
last, has a bell-shaped frequency response, and adjacent subbands overlap by
half, roughly as this ASCII art attempts to depict (frequency runs down the
page, magnitude to the right; '.', ',' and '_' are three adjacent subbands):

.
    .
        .
         .
,        .
    ,  .
    .  ,
.       ,
_       ,
    _  ,
    ,  _
,       _
        _
      _
    _
_
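
The same picture as a minimal sketch in code. The cubic bell used here is
only a stand-in that I picked because it needs no math library; it is NOT
the actual DCA filterbank response, and it ignores the special shape of
the first and last subbands. The function name and parameters are
hypothetical.

```c
/* Illustrative weight of band k at frequency f (Hz). Band centers sit
 * `spacing` Hz apart and each band spans two spacings, so neighbouring
 * bands overlap by half. Stand-in shape, not the real DCA response. */
double band_weight(double f, int k, double spacing)
{
    /* distance from the band center, measured in band spacings */
    double d = (f - k * spacing) / spacing;

    if (d < 0.0)
        d = -d;
    if (d >= 1.0)
        return 0.0;                        /* outside this band */
    return 1.0 - d * d * (3.0 - 2.0 * d);  /* 1 at the center, falling
                                            * to 0 at the neighbouring
                                            * centers */
}
```

With spacing = 1000.0, a component at 1500 Hz gets weight 0.5 in band 1
and 0.5 in band 2, so it cannot be assigned to a single band by a table of
band lengths.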

-- 
Alexander E. Patrakov