[FFmpeg-devel] Nellymoser encoder

Sun Aug 31 15:53:23 CEST 2008

On Sun, Aug 31, 2008 at 01:07:15PM +0200, Bartlomiej Wolowiec wrote:
> Saturday 30 August 2008 18:10:41 Michael Niedermayer wrote:
> > On Sat, Aug 30, 2008 at 03:42:37PM +0200, Bartlomiej Wolowiec wrote:
> > > Friday 29 August 2008 22:36:10 Michael Niedermayer wrote:
> > > > > > > > > +
> > > > > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > > > > > +{
> > > > > > > > > +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > > > > +
> > > > > > > > > +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > > > > +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > +    ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > > > > +}
> > > > > > > >
> > > > > > > > The data is copied once in encode_frame and twice here.
> > > > > > > > There is no need to copy the data 3 times.
> > > > > > > > vector_fmul can be used with a single memcpy to get the data
> > > > > > > > into any destination, and vector_fmul_reverse doesn't even need
> > > > > > > > 1 memcpy, so overall a single memcpy is enough.
> > > > > > >
> > > > > > > Hope that you meant something similar to my solution.
> > > > > >
> > > > > > no, you still do 2 memcpy() but now the code is really messy as
> > > > > > well.
> > > > > >
> > > > > > what you should do is, for each block of samples you get from the
> > > > > > user
> > > > > > 1. apply one half of the window onto it with vector_fmul_reverse
> > > > > >    and destination of some internal buffer
> > > > > > 2. memcpy into the 2nd destination and apply the other half of the
> > > > > >    window onto it with vector_fmul
> > > > > > 3. run the mdct as appropriate on the internal buffers.
> > > > >
> > > > > Hmm, I considered it, but I don't understand exactly what I should
> > > > > change... In the code I copy data two times:
> > > > > a) in encode_frame - I convert int16_t to float and copy data to
> > > > >    s->buf - I need to do it somewhere because vector_fmul requires
> > > > >    float *. Additionally, part of the data is needed for the next
> > > > >    call of encode_frame
> > > > > b) in apply_mdct - here I think that some additional part of buffer
> > > > >    is needed.
> > > > > If I understood correctly I have to get rid of a), but how do I get
> > > > > access to old data when the next call of encode_frame is performed,
> > > > > and how do I call vector_fmul on int16_t?
> > > >
> > > > have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT ?
> > > > I think ffmpeg should support this already. If it does not work then we
> > > > can keep int16 for now, which would imply more copying
> > >
> > > Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work. I made
> > > only these changes:
> > >
> > > float *samples = data;
> > > ...
> > > for (i = 0; i < avctx->frame_size; i++) {
> > >     s->buf[s->bufsel][i] = samples[i]*(1<<15);
> > > }
> > > ...
> > > .sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},
> >
> > hmm
> 
> Any idea? or should I leave it as it is?

Does PCM float work for you? If so, how does it differ from your encoder?
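
For reference, the single-memcpy windowing scheme quoted above (steps 1-3) might be sketched roughly as follows. This is only an illustration under stated assumptions: the two helpers are plain-C stand-ins written here to mirror the semantics of the DSPContext functions, and `WindowState`, `window_block` and `BUF_LEN` are hypothetical names, not part of the patch. Each incoming block supplies the second half of the frame being completed and the first half of the next one.

```c
#include <string.h>

#define BUF_LEN 128 /* stands in for NELLY_BUF_LEN (assumption) */

/* Hypothetical plain-C stand-in: dst[i] *= src[i] */
static void vector_fmul(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* Hypothetical plain-C stand-in: dst[i] = src0[i] * src1[len - 1 - i] */
static void vector_fmul_reverse(float *dst, const float *src0,
                                const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Two internal MDCT input frames, used alternately (hypothetical layout) */
typedef struct WindowState {
    float frame[2][2 * BUF_LEN];
    int   cur; /* index of the frame whose first half is already filled */
} WindowState;

/*
 * Consume one block of BUF_LEN float samples.
 * Returns the frame that is now complete and ready for the MDCT.
 */
static const float *window_block(WindowState *w, const float *in,
                                 const float *win)
{
    float *done = w->frame[w->cur];
    float *next = w->frame[w->cur ^ 1];

    /* 1. falling half of the window: written straight into the second
     *    half of the current frame, no memcpy needed */
    vector_fmul_reverse(done + BUF_LEN, in, win, BUF_LEN);

    /* 2. the only memcpy: the same samples become the first half of the
     *    next frame and get the rising half of the window in place */
    memcpy(next, in, BUF_LEN * sizeof(*next));
    vector_fmul(next, win, BUF_LEN);

    /* 3. the caller runs the MDCT on the returned frame */
    w->cur ^= 1;
    return done;
}
```

The first frame returned after start-up has an empty (zero) first half, so a real encoder would need to decide how to treat it.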

[...]
> 
> > An alternative would be, instead of keeping N paths that are closest
> > to the current power_candidate, to keep the N overall best paths so far;
> > that's what adpcm.c does and it likely has a better quality/speed ratio
> > but is harder to implement
> >
> > > Maybe, only relying on this idea, it will be better to make possible
> > > transition to e.g. 3 best states? (but personally I hardly see here a
> > > connection with viterbi...).
> >
> > optimized viterbi :)
> 
> In the enclosed patch there is an attempt at writing dynamic allocation of
> exponents. It isn't ideal, but I want to be sure that I'm not going wrong.
> It's quite slow - and I don't know if it can be accelerated more than 2-3 times.

> Because it's so slow and quality is often not really better -

:(
I feared that a little; as we are only trying to match some guessed values
better, we don't really know which pows are best.
So if it's not useful for improving quality then I guess there's not much
point in optimizing it much ...
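
For comparison, the alternative quoted above (keeping the N overall best paths so far) could be sketched along these lines. This is a minimal illustration, not adpcm.c's code and not the patch: `Path`, `beam_search`, `BEAM`, `MAX_BANDS` and `MAX_CHOICES` are hypothetical names, the tables are passed in by the caller, and plain squared error is assumed as the cost.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BANDS   32 /* upper bound on bands (assumption) */
#define MAX_CHOICES 64 /* upper bound on table entries (assumption) */
#define BEAM         4 /* N: number of paths kept after each band */

typedef struct Path {
    float cost;           /* accumulated squared error */
    int   power;          /* reconstructed value after the last band */
    int   idx[MAX_BANDS]; /* chosen table index for each band */
} Path;

static int cmp_path(const void *a, const void *b)
{
    float ca = ((const Path *)a)->cost, cb = ((const Path *)b)->cost;
    return (ca > cb) - (ca < cb);
}

/*
 * Returns the cost of the best surviving path and writes its table
 * indices to idx_out; returns -1 if the arguments exceed the bounds.
 */
static float beam_search(const float *cand, int n_bands,
                         const int *init_tab, int n_init,
                         const int *delta_tab, int n_delta, int *idx_out)
{
    Path beam[BEAM], next[BEAM * MAX_CHOICES];
    int n_beam, n_next = 0, band, i, j;

    if (n_bands < 1 || n_bands > MAX_BANDS ||
        n_init < 1 || n_init > MAX_CHOICES ||
        n_delta < 1 || n_delta > MAX_CHOICES)
        return -1.0f;

    /* band 0: every entry of the initial table starts a path */
    for (j = 0; j < n_init; j++) {
        Path p = { 0 };
        float d = cand[0] - init_tab[j];
        p.cost   = d * d;
        p.power  = init_tab[j];
        p.idx[0] = j;
        next[n_next++] = p;
    }

    for (band = 1; ; band++) {
        /* keep only the BEAM cheapest paths overall */
        qsort(next, n_next, sizeof(*next), cmp_path);
        n_beam = n_next < BEAM ? n_next : BEAM;
        memcpy(beam, next, n_beam * sizeof(*beam));
        if (band == n_bands)
            break;

        /* extend every survivor by every delta */
        n_next = 0;
        for (i = 0; i < n_beam; i++) {
            for (j = 0; j < n_delta; j++) {
                Path *p = &next[n_next++];
                float d;
                *p = beam[i];
                p->power += delta_tab[j];
                d = cand[band] - p->power;
                p->cost += d * d;
                p->idx[band] = j;
            }
        }
    }

    memcpy(idx_out, beam[0].idx, n_bands * sizeof(*idx_out));
    return beam[0].cost;
}
```

With BEAM set to 1 this degenerates to the greedy choice; larger values trade speed for a wider search.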

>  I suggest to 
> leave both versions -

ok

> how to allow user to choose ?

AVCodecContext.trellis or AVCodecContext.compression_level seem like
possible choices
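
A minimal sketch of how such a switch might look, assuming the encoder reads one of these fields at init time. `MockCodecContext` is a hypothetical stand-in holding only the two AVCodecContext fields named above, and the threshold on compression_level is an arbitrary assumption for illustration.

```c
/* Hypothetical stand-in for the two AVCodecContext fields named above */
typedef struct MockCodecContext {
    int trellis;
    int compression_level;
} MockCodecContext;

/* mirrors FF_COMPRESSION_DEFAULT, i.e. "user did not set a level" */
#define MOCK_COMPRESSION_DEFAULT (-1)

enum ExponentSearch { SEARCH_GREEDY, SEARCH_DYNAMIC };

/*
 * Pick the exponent search: the slow dynamic version only when the user
 * explicitly asks for it, the greedy one otherwise.
 */
static enum ExponentSearch choose_search(const MockCodecContext *avctx)
{
    if (avctx->trellis > 0)
        return SEARCH_DYNAMIC;
    /* alternative knob; the threshold of 1 is an assumption */
    if (avctx->compression_level != MOCK_COMPRESSION_DEFAULT &&
        avctx->compression_level >= 1)
        return SEARCH_DYNAMIC;
    return SEARCH_GREEDY;
}
```

The result could then be stored in a context flag such as the better_quality field the patch adds.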

[...]
> Index: nellymoserenc.c
> ===================================================================
> --- nellymoserenc.c	(revision 15050)
> +++ nellymoserenc.c	(working copy)
> @@ -45,9 +45,19 @@
>  #define POW_TABLE_SIZE (1<<11)
>  #define POW_TABLE_OFFSET 3
>  
> +#undef NDEBUG
> +#include <assert.h>
> +
>  typedef struct NellyMoserEncodeContext {
>      AVCodecContext  *avctx;
>      int             last_frame;
> +    int             bufsel;
> +    int             have_saved;
> +    int             better_quality;

> +    DSPContext      dsp;
> +    MDCTContext     mdct_ctx;

ok

[..]
> @@ -110,6 +136,13 @@
>          return -1;
>      }
>  
> +    if (avctx->sample_rate != 8000 && avctx->sample_rate != 11025 &&
> +        avctx->sample_rate != 22050 && avctx->sample_rate != 44100 &&
> +        avctx->strict_std_compliance >= FF_COMPLIANCE_NORMAL) {
> +        av_log(avctx, AV_LOG_ERROR, "Nellymoser works only with 8000, 11025, 22050 and 44100 sample rate\n");
> +        return -1;
> +    }
> +
>      avctx->frame_size = NELLY_SAMPLES;
>      s->avctx = avctx;
>      ff_mdct_init(&s->mdct_ctx, 8, 0);

ok

[...]

> @@ -131,6 +165,218 @@
>      return 0;
>  }
>  
> +#define find_best(val, table, LUT, LUT_add, LUT_size) \
> +    best_idx = \
> +        LUT[av_clip ((lrintf(val) >> 8) + LUT_add, 0, LUT_size - 1)]; \
> +    if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
> +        best_idx++;
> +

ok

> +static void get_exponent_greedy(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> +{
> +    int band, best_idx, power_idx = 0;
> +    float power_candidate;
> +    for (band = 0; band < NELLY_BANDS; band++) {
> +        if (band) {
> +            power_candidate = cand[band] - power_idx;
> +            find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> +            idx_table[band] = best_idx;
> +            power_idx += ff_nelly_delta_table[best_idx];
> +        } else {
> +            //base exponent
> +            find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
> +            idx_table[0] = best_idx;
> +            power_idx = ff_nelly_init_table[best_idx];
> +        }
> +    }

//base exponent
find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
idx_table[0] = best_idx;
power_idx = ff_nelly_init_table[best_idx];

for (band = 1; band < NELLY_BANDS; band++) {
    ....
}

> +}
> +
> +#define OPT_SIZE ((1<<15) + 3000)
> +

> +inline float distance(float x, float y, int band)

static inline

> +{
> +    //return pow(fabs(x-y), 2.0);
> +    float tmp = x - y;
> +    return tmp * tmp;
> +}
> +
> +static void get_exponent_dynamic(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> +{
> +    int i, j, band, best_idx;
> +    float power_candidate, best_val;
> +
> +    float opt[NELLY_BANDS][OPT_SIZE];
> +    int path[NELLY_BANDS][OPT_SIZE];
> +
> +    for (i = 0; i < NELLY_BANDS * OPT_SIZE; i++) {
> +        opt[0][i] = INFINITY;
> +    }
> +
> +    for (i = 0; i < 64; i++) {
> +        opt[0][ff_nelly_init_table[i]] = distance(cand[0], ff_nelly_init_table[i], 0);
> +        path[0][ff_nelly_init_table[i]] = i;
> +    }
> +
> +    for (band = 1; band < NELLY_BANDS; band++) {
> +        int q, c = 0;
> +        float tmp;
> +        int idx_min, idx_max, idx;
> +        power_candidate = cand[band];
> +        for (q = 1000; !c && q < OPT_SIZE; q <<= 2) {
> +            idx_min = FFMAX(0, cand[band] - q);
> +            idx_max = FFMIN(OPT_SIZE, cand[band - 1] + q);
> +            for (i = FFMAX(0, cand[band - 1] - q); i < FFMIN(OPT_SIZE, cand[band - 1] + q); i++) {
> +                for (j = 0; j < 32; j++) {
> +                    idx = i + ff_nelly_delta_table[j];
> +                    if (idx > idx_max)
> +                        break;

> +                    if (idx >= idx_min && isfinite(opt[band - 1][i])) {

The isfinite() check can be moved outside of the for (j = 0; ...) loop,
since opt[band - 1][i] does not depend on j.
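
Roughly, the hoisted form might look like the sketch below. The rest of the loop body is elided in the quoted patch, so the cost update here is an assumption (plain squared error against a target), and `relax_band` with its parameters is a hypothetical name used only for illustration.

```c
#include <math.h>

/*
 * Relax one band of the search: for every reachable previous state i,
 * try each delta and keep the cheapest way of reaching i + delta.
 */
static void relax_band(const float *prev, float *cur, int *path,
                       int n_states, const int *deltas, int n_deltas,
                       float target)
{
    for (int i = 0; i < n_states; i++) {
        /* tested once per i instead of once per (i, j) pair */
        if (!isfinite(prev[i]))
            continue;
        for (int j = 0; j < n_deltas; j++) {
            int idx = i + deltas[j];
            float d, cost;
            if (idx < 0 || idx >= n_states)
                continue;
            d    = target - idx;
            cost = prev[i] + d * d; /* assumed cost: squared error */
            if (cost < cur[idx]) {
                cur[idx]  = cost;
                path[idx] = j;
            }
        }
    }
}
```

Since most states are unreachable (infinite cost) in a sparse table, skipping them before the inner loop avoids most of the work.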

[...]

> +/**
> + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> + *  @param s               encoder context
> + *  @param output          output buffer
> + *  @param output_size     size of output buffer
> + */
> +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> +{
> +    PutBitContext pb;
> +    int i, j, band, block, best_idx, power_idx = 0;
> +    float power_val, coeff, coeff_sum;
> +    float pows[NELLY_FILL_LEN];
> +    int bits[NELLY_BUF_LEN], idx_table[NELLY_BANDS];
> +    float cand[NELLY_BANDS];
> +

> +    const float C = 1.0;
> +    const float D = 3.0;

any reason why you changed these? Do these sound better?

> +
> +    apply_mdct(s);
> +

> +    init_put_bits(&pb, output, output_size * 8);
> +
> +    i = 0;
> +    for (band = 0; band < NELLY_BANDS; band++) {
> +        coeff_sum = 0;
> +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> +        }
> +        cand[band] =
> +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);

the MAX should maybe be done after the correction for D

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin