[FFmpeg-devel] [RFC] Generic psychoacoustic model interface
Kostya
kostya.shishkov
Sat Aug 30 18:28:29 CEST 2008
On Sat, Aug 30, 2008 at 04:51:10PM +0200, Michael Niedermayer wrote:
> On Sat, Aug 30, 2008 at 01:21:54PM +0300, Kostya wrote:
> > On Thu, Aug 28, 2008 at 10:36:57PM +0200, Michael Niedermayer wrote:
> > > On Thu, Aug 28, 2008 at 08:10:26PM +0300, Kostya wrote:
> > [...]
> > > > /**
> > > > * windowing related information
> > > > */
> > > > typedef struct FFWindowInfo{
> > >
> > > > int window_type[2]; ///< window type (short/long/transitional, etc.) - current and previous
> > >
> > > How is this "transitional" going to work with many different frame lengths?
> > > Is there one? N*N?
> >
> > That's for AAC (it requires slightly different windowing);
> > the encoder will set it to an internal value.
>
> I think the psy model should not bother with what a specific format may or
> may not do or need.
> There are short blocks and long blocks in AAC; furthermore, AAC
> requires short blocks to come in consecutive multiples of 8. Other
> codecs do not have such restrictions.
> Also, if AAC needs to specially mark long blocks before and after short
> ones, that is the problem of the AAC encoder, not the psy model.
> The window shape of a block surely depends on the next and previous block;
> that is not AAC-specific.
Would it be better to store it elsewhere, or just introduce a next window type?
I think introducing a next window type would make it obvious which
transition type we have.
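With a next window type stored alongside the current and previous ones, the transition can be derived without any format-specific values in the psy model. A minimal sketch, assuming only two generic block types; every name here (`PsyWindowInfo`, `psy_window_shape`, `psy_window_shape_from` and the enums) is hypothetical, and mapping the resulting shape onto AAC's window sequences would remain the AAC encoder's job.

```c
#include <assert.h>

/* Hypothetical codec-independent block types reported by the psy model. */
enum PsyWindowType {
    PSY_WIN_LONG,
    PSY_WIN_SHORT,
};

/* Hypothetical shape of the current block, derived from its neighbours. */
enum PsyWindowShape {
    PSY_SHAPE_LONG,        /* long block with long neighbours            */
    PSY_SHAPE_LONG_START,  /* long block followed by a short one         */
    PSY_SHAPE_LONG_STOP,   /* long block preceded by a short one         */
    PSY_SHAPE_LONG_BOTH,   /* long block with short blocks on both sides */
    PSY_SHAPE_SHORT,
};

/* Window info with the proposed "next" field next to current and previous. */
typedef struct PsyWindowInfo {
    enum PsyWindowType prev, cur, next;
} PsyWindowInfo;

static enum PsyWindowShape psy_window_shape_from(enum PsyWindowType prev,
                                                 enum PsyWindowType cur,
                                                 enum PsyWindowType next)
{
    int short_before = prev == PSY_WIN_SHORT;
    int short_after  = next == PSY_WIN_SHORT;

    if (cur == PSY_WIN_SHORT)
        return PSY_SHAPE_SHORT;
    if (short_before && short_after)
        return PSY_SHAPE_LONG_BOTH;
    if (short_after)
        return PSY_SHAPE_LONG_START;
    if (short_before)
        return PSY_SHAPE_LONG_STOP;
    return PSY_SHAPE_LONG;
}

static enum PsyWindowShape psy_window_shape(const PsyWindowInfo *wi)
{
    assert(wi);
    return psy_window_shape_from(wi->prev, wi->cur, wi->next);
}
```

The psy model only fills in prev/cur/next; whether a format has one transitional shape or several is decided by the encoder that interprets the returned shape.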
> >
> > [...]
> > > > /**
> > > > * Get psychoacoustic model suggestion about coding two bands as M/S
> > > > */
> > > > enum FFPsyMSDecision ff_psy_suggest_ms(FFPsyContext *ctx, FFPsyBand *left, FFPsyBand *right);
> > >
> > > I am a little unsure about this one, but I am not objecting...
> >
> > Dropped for now; I may revive it later.
> >
> > Here's another draft: the psychoacoustic model interface with a
> > partial implementation (there are some inaccuracies and debug leftovers in it,
> > but this is an RFC, not a final patch).
> >
> > I plan to use it this way with my encoder.
> >
> > General flow:
> >
>
> > init
> > while(frame){
> > suggest window()
> > [encoder may ignore that]
> > set band info() = calculate thresholds for all bands with provided window type
>
> So far I have no objections.
>
>
> > psy analyze() = get distortions and weights for each band quantized with a series of
> > quantizers; my encoder will use them for RD-aware quantization
>
> The distortion is only known after the RD "aware" quantization, while the weight
> is needed before the RD "aware" quantization, so I am somewhat confused by what
> you suggest.
The paper I've read ("Cascaded Trellis-Based Rate-Distortion Control
Algorithm for MPEG-4 Advanced Audio Coding", aka 01621212.pdf)
suggests choosing the optimum quantizer by minimizing the cost
C = quant_distortion / threshold + lambda * bits
so the model calculates this cost for every candidate quantizer in the series
(i.e. before the final choice), to be used in a subsequent Viterbi search.
> > }
> >
>
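The cost from the paper can be sketched as follows: for each band and each candidate quantizer the model supplies the distortion, threshold and bits, and the encoder picks the cheapest path through all bands with a Viterbi search whose states are the candidate quantizer indices. This is a minimal sketch, assuming a fixed number of candidates per band and a hypothetical limit MAX_Q_DELTA on how far the quantizer may move between neighbouring bands; a real trellis would instead add the bits needed to code each quantizer change as a transition cost. `rd_cost` and `trellis_search` are hypothetical names, not part of the draft.

```c
#include <assert.h>
#include <float.h>

#define NUM_Q       8  /* hypothetical number of candidate quantizers per band */
#define MAX_BANDS   64 /* hypothetical upper bound on bands per frame */
#define MAX_Q_DELTA 2  /* hypothetical limit on the step between neighbouring bands */

/* Cost of one band at one candidate quantizer: C = D / T + lambda * bits. */
static float rd_cost(float distortion, float threshold, int bits, float lambda)
{
    return distortion / threshold + lambda * bits;
}

/* Viterbi search: stages are bands, states are candidate quantizer indices.
 * cost[b][q] is rd_cost() for band b with candidate q. Stores the cheapest
 * allowed path in best[] and returns its total cost. */
static float trellis_search(float cost[][NUM_Q], int nb_bands, int *best)
{
    float acc[MAX_BANDS][NUM_Q];  /* cheapest accumulated cost ending in state q */
    int   from[MAX_BANDS][NUM_Q]; /* predecessor state, used for backtracking */
    float total = FLT_MAX;
    int b, q, p, last = 0;

    assert(nb_bands > 0 && nb_bands <= MAX_BANDS);
    for (q = 0; q < NUM_Q; q++)
        acc[0][q] = cost[0][q];

    for (b = 1; b < nb_bands; b++) {
        for (q = 0; q < NUM_Q; q++) {
            /* keeping the same quantizer is always allowed */
            acc[b][q]  = acc[b - 1][q];
            from[b][q] = q;
            for (p = q - MAX_Q_DELTA; p <= q + MAX_Q_DELTA; p++) {
                if (p < 0 || p >= NUM_Q)
                    continue;
                if (acc[b - 1][p] < acc[b][q]) {
                    acc[b][q]  = acc[b - 1][p];
                    from[b][q] = p;
                }
            }
            acc[b][q] += cost[b][q];
        }
    }

    /* pick the cheapest final state, then walk the predecessors back */
    for (q = 0; q < NUM_Q; q++) {
        if (acc[nb_bands - 1][q] < total) {
            total = acc[nb_bands - 1][q];
            last  = q;
        }
    }
    for (b = nb_bands - 1; b >= 0; b--) {
        best[b] = last;
        if (b > 0)
            last = from[b][last];
    }
    return total;
}
```

In this arrangement the per-candidate distortion is known before the final choice, because it is computed for every quantizer in the series rather than only for the one that ends up being used.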
> [...]
> > /**
> > * single band psychoacoustic information
> > */
> > typedef struct FFPsyBand{
> > int bits;
> > float energy;
> > float threshold;
> > float distortion;
> > float perceptual_weight;
> > }FFPsyBand;
>
> It should be possible to provide perceptual_weight per coefficient instead
> of per band in the future.
That's easy to do by adding a new function.
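Per-coefficient weights could be added next to the per-band field without changing FFPsyBand itself. A minimal sketch, assuming a hypothetical helper `psy_weighted_distortion` that takes an optional per-coefficient weight array and falls back to the band's `perceptual_weight` when that array is NULL; the struct is copied from the draft above so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* FFPsyBand as in the draft above. */
typedef struct FFPsyBand {
    int bits;
    float energy;
    float threshold;
    float distortion;
    float perceptual_weight;
} FFPsyBand;

/* Hypothetical helper: weighted squared error for one band.
 * If coef_weight is NULL, band->perceptual_weight is applied to every
 * coefficient; otherwise each coefficient uses its own weight. */
static float psy_weighted_distortion(const FFPsyBand *band, const float *orig,
                                     const float *recon, const float *coef_weight,
                                     int size)
{
    float sum = 0.0f;
    int i;

    assert(band && orig && recon && size >= 0);
    for (i = 0; i < size; i++) {
        float err = orig[i] - recon[i];
        float w   = coef_weight ? coef_weight[i] : band->perceptual_weight;
        sum += w * err * err;
    }
    return sum;
}
```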
> [...]
> > #ifdef ENABLE_AAC_ENCODER
> > #include "aac.h"
> > #include "aactab.h"
> >
> > /**
> > * Quantize one coefficient.
> > * @return absolute value of the quantized coefficient
> > * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
> > */
> > static av_always_inline int quant(float coef, const float Q)
> > {
> > return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> > }
> >
> > static inline float psy_aac_get_approximate_quant_error(const float *c, int size,
> > const float Q, const float IQ)
> > {
>
> I would prefer if the psy model is not full of #if AAC or if(aac)
For now that's the only implementation.
Can you suggest something cleaner?
> > int i;
> > int q;
> > float coef, unquant, sum = 0.0f;
> > for(i = 0; i < size; i++){
> > coef = fabs(c[i]);
> > q = quant(c[i], Q);
> > unquant = (q * cbrt(q)) * IQ;
> > sum += (coef - unquant) * (coef - unquant);
> > }
> > return sum * 1.0 / 512.0;
> > }
> >
>
> > //XXX: stub
> > static inline int psy_aac_get_approximate_bits(const float *c, int size, const float Q)
> > {
> > int i, bits = 0;
> > for(i = 0; i < size; i += 2){
> > int idx = 0, j, q;
> > for(j = 0; j < 2; j++){
> > q = quant(c[i+j], Q);
> > q = FFABS(q);
> > if(q)
> > bits++;
> > if(q > 16)
> > bits += av_log2(q)*2 - 4 + 1;
> > idx = idx*17 + FFMIN(q, 16);
> > }
> > bits += ff_aac_spectral_bits[10][idx];
> > }
> > return bits;
> > }
>
> This does not belong in the psy model.
> Different numbers of bits do not sound different;
> besides, format-specific things could be callbacks if they are needed.
>
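Following the callback suggestion, the format-specific pieces (quantization error and bit estimate) could be supplied by the encoder, so the generic model never needs `#if AAC`. A minimal sketch; `PsyFormatCallbacks`, `psy_candidate_cost` and the toy uniform quantizer are hypothetical stand-ins, not the AAC functions from the draft.

```c
#include <assert.h>

/* Hypothetical table of format-specific hooks filled in by the encoder. */
typedef struct PsyFormatCallbacks {
    /* squared quantization error of 'size' coefficients at step q */
    float (*quant_error)(const float *coefs, int size, float q);
    /* estimated bits needed to code the coefficients at step q */
    int   (*estimate_bits)(const float *coefs, int size, float q);
} PsyFormatCallbacks;

/* Generic code: cost of one candidate step, C = D / T + lambda * bits,
 * without knowing anything about the codec. */
static float psy_candidate_cost(const PsyFormatCallbacks *cb, const float *coefs,
                                int size, float q, float threshold, float lambda)
{
    float dist = cb->quant_error(coefs, size, q);
    int   bits = cb->estimate_bits(coefs, size, q);
    return dist / threshold + lambda * bits;
}

/* Toy uniform quantizer standing in for a codec implementation. */
static int toy_quant(float x, float q)
{
    float a = x < 0.0f ? -x : x;
    return (int)(a / q + 0.5f);
}

static float toy_quant_error(const float *coefs, int size, float q)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < size; i++) {
        float a   = coefs[i] < 0.0f ? -coefs[i] : coefs[i];
        float rec = toy_quant(coefs[i], q) * q;
        sum += (a - rec) * (a - rec);
    }
    return sum;
}

static int toy_estimate_bits(const float *coefs, int size, float q)
{
    int i, bits = 0;
    for (i = 0; i < size; i++)
        bits += toy_quant(coefs[i], q) != 0; /* toy: one bit per non-zero value */
    return bits;
}

static const PsyFormatCallbacks toy_callbacks = { toy_quant_error, toy_estimate_bits };
```

An AAC encoder would fill the table with its own quant()-based error and codebook-based bit counting, keeping both out of the psy model source.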
> [...]
>
> > /**
> > * Calculate Bark value for given line.
> > */
> > static inline float calc_bark(float f)
> > {
> > return 13.3f * atanf(0.00076f * f) + 3.5f * atanf((f / 7500.0f) * (f / 7500.0f));
> > }
>
> This is not speed critical, rather the opposite; it should be av_cold,
> as it is used only during init.
>
>
> >
> > #define ATH_ADD 4
> > /**
> > * Calculate ATH value for given frequency.
> > * Borrowed from Lame.
> > */
> > static inline float ath(float f, float add)
> > {
> > f /= 1000.0f;
> > return 3.64 * pow(f, -0.8)
> > - 6.8 * exp(-0.6 * (f - 3.4) * (f - 3.4))
> > + 6.0 * exp(-0.15 * (f - 8.7) * (f - 8.7))
> > + (0.6 + 0.04 * add) * 0.001 * f * f * f * f;
> > }
>
> same
>
>
> [...]
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> It is dangerous to be right in matters on which the established authorities
> are wrong. -- Voltaire