[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Sat Aug 30 18:28:29 CEST 2008

On Sat, Aug 30, 2008 at 04:51:10PM +0200, Michael Niedermayer wrote:
> On Sat, Aug 30, 2008 at 01:21:54PM +0300, Kostya wrote:
> > On Thu, Aug 28, 2008 at 10:36:57PM +0200, Michael Niedermayer wrote:
> > > On Thu, Aug 28, 2008 at 08:10:26PM +0300, Kostya wrote:
> > [...]
> > > > /**
> > > >  * windowing related information
> > > >  */
> > > > typedef struct FFWindowInfo{
> > > 
> > > >     int window_type[2];               ///< window type (short/long/transitional, etc.) - current and previous
> > > 
> > > How is this "transitional" going to work with many different frame lengths?
> > > Is there one? N*N?
> >  
> > That's for AAC (which requires somewhat different windowing);
> > the encoder will set it to an internal value.
> 
> I think the psy model should not bother with what a specific format may or
> may not do or need.
> There are short blocks and long blocks in AAC; furthermore, AAC
> restricts short blocks to consecutive groups of 8. Other
> codecs do not have such restrictions.
> Also, if AAC needs to specially mark long blocks before and after short
> ones, that is the problem of the AAC encoder, not the psy model.
> The window shape of a block surely depends on the next and previous block,
> that is not AAC specific.

Would it be better to store it elsewhere, or just to introduce a next window
type? I think that with a next window type it would be obvious which
transition we have.
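A minimal sketch of what storing previous, current and next window types could look like, assuming the psy model only knows "long" and "short" blocks and the encoder maps the resulting slopes to its own window names. PsyWindowInfo, psy_window_slopes and the enum names are hypothetical, not part of the draft.

```c
#include <assert.h>

/* Hypothetical, codec-agnostic window description: only long and short
 * blocks are modelled; codec-specific transition windows are derived
 * by the encoder from the slopes computed below. */
enum PsyWindowType  { PSY_WINDOW_LONG, PSY_WINDOW_SHORT };
enum PsyWindowSlope { PSY_SLOPE_LONG,  PSY_SLOPE_SHORT  };

typedef struct PsyWindowInfo {
    enum PsyWindowType prev, cur, next; /* previous, current, next block */
} PsyWindowInfo;

/* The left half of the current window overlaps the previous block and the
 * right half overlaps the next one, so a slope must be short whenever
 * either block sharing that overlap uses short windows. */
static void psy_window_slopes(const PsyWindowInfo *wi,
                              enum PsyWindowSlope *left,
                              enum PsyWindowSlope *right)
{
    assert(wi && left && right);
    *left  = (wi->prev == PSY_WINDOW_SHORT || wi->cur == PSY_WINDOW_SHORT)
             ? PSY_SLOPE_SHORT : PSY_SLOPE_LONG;
    *right = (wi->next == PSY_WINDOW_SHORT || wi->cur == PSY_WINDOW_SHORT)
             ? PSY_SLOPE_SHORT : PSY_SLOPE_LONG;
}
```

A long block followed by a short one yields a long left slope and a short right slope, which an AAC encoder could map to LONG_START without the psy model knowing that name.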

> > 
> > [...] 
> > > > /**
> > > >  * Get psychoacoustic model suggestion about coding two bands as M/S
> > > >  */
> > > > enum FFPsyMSDecision ff_psy_suggest_ms(FFPsyContext *ctx, FFPsyBand *left, FFPsyBand *right);
> > > 
> > > I am a little unsure about this one, but I am not objecting...
> >  
> > Dropped for now; I may revive it later.
> > 
> > Here's another draft: a psychoacoustic model interface with a
> > partial implementation (there are some inaccuracies and debug code in it,
> > but this is an RFC, not a final patch).
> > 
> > I plan to use it this way with my encoder.
> > 
> > General flow:
> > 
> 
> > init
> > while(frame){
> >   suggest window()
> >   [encoder may ignore that]
> >   set band info() = calculate thresholds for all bands with provided window type
> 
> so far i have no objections
> 
> 
> >   psy analyze() = get distortions and weights for each band quantized with a series of
> >                   quantizers; my encoder will use them for RD-aware quantization
> 
> the distortion is only known after the RD "aware" quantization, but the weight
> is needed before RD "aware" quantization, so I am somewhat confused by what
> you suggest

The paper I've read ("Cascaded Trellis-Based Rate-Distortion Control
Algorithm for MPEG-4 Advanced Audio Coding", aka 01621212.pdf)
suggests choosing the optimum quantizer by minimizing the cost

C = quant_distortion / threshold + lambda * bits

so the model tries to calculate these costs for a subsequent Viterbi search.

> > }
> > 
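A minimal C sketch of the cost C = quant_distortion / threshold + lambda * bits, assuming the psy model reports one (distortion, bits) pair per candidate quantizer. PsyQuantCandidate, psy_rd_cost and psy_pick_quantizer are hypothetical names. It picks the cheapest quantizer per band greedily, which is simpler than the trellis/Viterbi search described in the paper.

```c
#include <assert.h>
#include <float.h>

/* One candidate quantizer for a band, as the psy model might report it. */
typedef struct PsyQuantCandidate {
    float distortion; /* quantization error energy for this quantizer */
    int   bits;       /* estimated bits to code the band              */
} PsyQuantCandidate;

/* Distortion normalized by the masking threshold plus a Lagrangian
 * bit penalty. */
static float psy_rd_cost(const PsyQuantCandidate *c, float threshold, float lambda)
{
    return c->distortion / threshold + lambda * c->bits;
}

/* Greedy per-band choice: index of the cheapest candidate. A Viterbi
 * search would instead minimize the summed cost across bands, subject to
 * codec constraints such as limits on consecutive scalefactor differences. */
static int psy_pick_quantizer(const PsyQuantCandidate *cand, int n,
                              float threshold, float lambda)
{
    int i, best = -1;
    float best_cost = FLT_MAX;
    assert(n > 0 && threshold > 0.0f);
    for (i = 0; i < n; i++) {
        float cost = psy_rd_cost(&cand[i], threshold, lambda);
        if (cost < best_cost) {
            best_cost = cost;
            best      = i;
        }
    }
    return best;
}
```

Raising lambda shifts the choice toward coarser quantizers that cost fewer bits; with lambda = 0 the lowest normalized distortion always wins.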
> 
> [...]
> > /**
> >  * single band psychoacoustic information
> >  */
> > typedef struct FFPsyBand{
> >     int   bits;
> >     float energy;
> >     float threshold;
> >     float distortion;
> >     float perceptual_weight;
> > }FFPsyBand;
> 
> It should be possible to provide perceptual_weight per coefficient instead
> of per band in the future.

That's easy to do by adding a new function.
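A minimal sketch of such a function, assuming band boundaries are given as an offset table; psy_fill_coef_weights is a hypothetical name. This default simply spreads each band's perceptual_weight over its coefficients, so a finer per-coefficient model could later replace it without changing FFPsyBand.

```c
#include <assert.h>

/* Hypothetical helper: expand per-band perceptual weights into a
 * per-coefficient array. band_start has num_bands + 1 entries; band b
 * covers coefficients [band_start[b], band_start[b + 1]). */
static void psy_fill_coef_weights(const float *band_weight, const int *band_start,
                                  int num_bands, float *coef_weight)
{
    int b, i;
    for (b = 0; b < num_bands; b++) {
        assert(band_start[b] <= band_start[b + 1]);
        for (i = band_start[b]; i < band_start[b + 1]; i++)
            coef_weight[i] = band_weight[b];
    }
}
```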

> [...]
> > #ifdef ENABLE_AAC_ENCODER
> > #include "aac.h"
> > #include "aactab.h"
> > 
> > /**
> >  * Quantize one coefficient.
> >  * @return absolute value of the quantized coefficient
> >  * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> >  */
> > static av_always_inline int quant(float coef, const float Q)
> > {
> >     return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> > }
> > 
> > static inline float psy_aac_get_approximate_quant_error(const float *c, int size,
> >                                                         const float Q, const float IQ)
> > {
> 
> I would prefer it if the psy model were not full of #if AAC or if(aac)

For now that's the only implementation.
Can you suggest something cleaner?

> >     int i;
> >     int q;
> >     float coef, unquant, sum = 0.0f;
> >     for(i = 0; i < size; i++){
> >         coef = fabsf(c[i]);
> >         q = quant(c[i], Q);
> >         unquant = (q * cbrt(q)) * IQ;
> >         sum += (coef - unquant) * (coef - unquant);
> >     }
> >     return sum / 512.0f;
> > }
> > 
> 
> > //XXX: stub
> > static inline int psy_aac_get_approximate_bits(const float *c, int size, const float Q)
> > {
> >     int i, bits = 0;
> >     for(i = 0; i < size; i += 2){
> >         int idx = 0, j, q;
> >         for(j = 0; j < 2; j++){
> >             q = quant(c[i+j], Q);  // quant() already returns |q|
> >             if(q)
> >                 bits++;            // sign bit
> >             if(q >= 16)            // 16 is the escape index in codebook 11
> >                 bits += av_log2(q)*2 - 4 + 1;
> >             idx = idx*17 + FFMIN(q, 16);
> >         }
> >         bits += ff_aac_spectral_bits[10][idx];
> >     }
> >     return bits;
> > }
> 
> this does not belong in the psy model:
> different numbers of bits do not sound different.
> Besides, format-specific things could be callbacks if they are needed.
> 
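One way to follow the callback suggestion and drop the #ifdef ENABLE_AAC_ENCODER block is sketched below. PsyCodecCallbacks, psy_eval_band and aac_cb11_extra_bits are hypothetical names, not existing FFmpeg API. The AAC helper counts only the sign and escape bits of codebook 11 (an escape sequence of N ones, a zero, and N + 4 bits with N = floor(log2(q)) - 4), matching av_log2(q)*2 - 4 + 1 in the stub; the Huffman codeword length would still come from the encoder's tables.

```c
#include <assert.h>

/* Hypothetical split: the psy model only calls these hooks and each
 * encoder fills them in, so the psy code needs no format #ifdefs. */
typedef struct PsyCodecCallbacks {
    /* quantization error energy for size coefficients at quantizer Q */
    float (*quant_error)(void *opaque, const float *coefs, int size, float Q);
    /* estimated bits for the same coefficients (format specific) */
    int   (*count_bits)(void *opaque, const float *coefs, int size, float Q);
    void  *opaque;                       /* encoder private data */
} PsyCodecCallbacks;

/* Psy-side use: evaluate one band without knowing the codec. */
static void psy_eval_band(const PsyCodecCallbacks *cb, const float *coefs,
                          int size, float Q, float *distortion, int *bits)
{
    assert(cb && cb->quant_error && cb->count_bits);
    *distortion = cb->quant_error(cb->opaque, coefs, size, Q);
    *bits       = cb->count_bits(cb->opaque, coefs, size, Q);
}

/* AAC-side helper for count_bits: sign and escape bits for one
 * magnitude q in codebook 11. */
static int aac_cb11_extra_bits(int q)
{
    int bits = 0, n = 0;
    assert(q >= 0 && q <= 8191);
    if (q)
        bits++;                          /* sign bit */
    if (q >= 16) {
        while ((q >> (n + 5)) != 0)      /* n = floor(log2(q)) - 4 */
            n++;
        bits += 2 * n + 5;               /* n ones, a zero, n + 4 bits */
    }
    return bits;
}
```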
> [...]
> 
> > /**
> >  * Calculate Bark value for given line.
> >  */
> > static inline float calc_bark(float f)
> > {
> >     return 13.3f * atanf(0.00076f * f) + 3.5f * atanf((f / 7500.0f) * (f / 7500.0f));
> > }
> 
> this is not speed critical, rather the opposite: it should be av_cold,
> since it is used only during init
> 
> 
> > 
> > #define ATH_ADD 4
> > /**
> >  * Calculate ATH value for given frequency.
> >  * Borrowed from Lame.
> >  */
> > static inline float ath(float f, float add)
> > {
> >     f /= 1000.0f;
> >     return   3.64 * pow(f, -0.8)
> >             - 6.8  * exp(-0.6  * (f - 3.4) * (f - 3.4))
> >             + 6.0  * exp(-0.15 * (f - 8.7) * (f - 8.7))
> >             + (0.6 + 0.04 * add) * 0.001 * f * f * f * f;
> > }
> 
> same
> 
> 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> It is dangerous to be right in matters on which the established authorities
> are wrong. -- Voltaire