[FFmpeg-devel] [RFC] Generic psychoacoustic model interface
Alexander E. Patrakov
Wed Aug 27 13:21:51 CEST 2008
Kostya wrote:
> Here's my first attempt to define codec-agnostic psy model.
> Here's an interface for it. I'm not sure about AC3, but
> it should be possible to use it with DCA, Vorbis,
> MPEG Audio Layers I-III and NBC, maybe WMA too.
> In case somebody codes an implementation, of course.
> Personally I plan to make my encoder use it backed with
> already implemented 3GPP model.
1) The general issue of using _any_ psychoacoustic model with HD (>48 kHz)
audio. How is the whole spectrum supposed to be split into bands? I.e., with a
192 kHz sampling rate (think DCA with proprietary extensions), are you really
sure you want to split the whole 96 kHz spectrum into just 128 equal subbands?
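To put a number on that concern, here is a minimal sketch of the arithmetic. The helper name is made up for illustration and is not part of the proposed interface; it only assumes that the range from 0 Hz to the Nyquist frequency is divided into equal parts.

```c
#include <assert.h>

/* Hypothetical helper: width in Hz of one band when the spectrum from
 * 0 Hz up to the Nyquist frequency (sample_rate / 2) is split into
 * num_bands equal parts. */
static double uniform_band_width_hz(int sample_rate, int num_bands)
{
    assert(sample_rate > 0 && num_bands > 0);
    return (sample_rate / 2.0) / num_bands;
}
```

With 128 bands this gives 750 Hz per band at 192 kHz, versus 187.5 Hz per band at 48 kHz, so every band becomes four times wider.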
2) In FFPsyContext, the distinction between only _two_ frame types (long and
short) is hard-coded. For some codecs, this model makes no sense. E.g. for
DCA, a subframe always contains 4 subsubframes each corresponding to 256 PCM
samples (but the synthesis FIR is 512 taps long). One can either say "this is
a common scale factor for all 4 subsubframes", or define a transient location
at one subsubframe boundary and say "here is the scale factor before the
transient, and here it is after the transient". This doesn't really map onto
the above model.
Moreover, who (the codec or the psy model) is responsible for transient
detection (and, for non-DCA codecs, for the choice of short vs. long blocks)?
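To make the DCA case concrete, the two options described above could be represented roughly as follows. This is only a sketch: the struct, its field names and the encoding of the transient position are assumptions made for illustration, not the DCA bitstream layout and not a proposal for FFPsyContext.

```c
#include <assert.h>

#define SUBSUBFRAMES 4 /* subsubframes per subframe, as described above */

/* Hypothetical per-subband result for one subframe. */
typedef struct {
    int transient_pos; /* 0: no transient, 1..3: transient at the boundary
                          just before this subsubframe */
    int scale_before;  /* common scale factor, or the one before the transient */
    int scale_after;   /* scale factor after the transient (unused if none) */
} SubframeScale;

/* Return the scale factor that applies to subsubframe ssf (0..3). */
static int scale_for_subsubframe(const SubframeScale *s, int ssf)
{
    assert(ssf >= 0 && ssf < SUBSUBFRAMES);
    assert(s->transient_pos >= 0 && s->transient_pos < SUBSUBFRAMES);
    if (s->transient_pos == 0 || ssf < s->transient_pos)
        return s->scale_before;
    return s->scale_after;
}
```

A long/short flag cannot carry this information: the result is one optional boundary position plus up to two scale factors, not a choice between two window sizes.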
3) The whole "scalefactor band lengths for long frame" business assumes
non-overlapping (or almost non-overlapping) bands. This is simply not the
case for DCA. For DCA, each subband (i.e., the entity for which one can
specify a scale factor [ignoring transients here]), except the first and the
last, has a bell-shaped form, and adjacent subbands overlap by half. I.e.,
something like what this ASCII art attempts to depict:
[ASCII art garbled in the archive: bell-shaped subband curves, each
overlapping its neighbors by half]
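The consequence of half-overlapping bands can also be shown numerically. The sketch below is purely illustrative: the bell shape is an arbitrary smooth curve chosen so that it needs no math library, not the actual DCA filterbank response, and the function name and parameters are made up.

```c
#include <assert.h>

/* Illustrative weight of band number `band` at frequency `freq`.
 * Bands are centred at band * spacing and each spans two spacings,
 * so neighbouring bands overlap by half. */
static double band_weight(int band, double freq, double spacing)
{
    double t;

    assert(spacing > 0.0);
    t = (freq - band * spacing) / spacing; /* distance from centre, in spacings */
    if (t < 0.0)
        t = -t;
    if (t >= 1.0)
        return 0.0;                        /* outside this band's support */
    return 1.0 - t * t * (3.0 - 2.0 * t);  /* 1 at the centre, falls smoothly to 0 */
}
```

A tone halfway between two band centres gets weight 0.5 in each of the two bands, so its energy cannot be attributed to a single entry in a table of non-overlapping band lengths.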
--
Alexander E. Patrakov