[FFmpeg-devel] Nellymoser encoder

Sat Aug 30 15:42:37 CEST 2008

Friday 29 August 2008 22:36:10 Michael Niedermayer napisa?(a):
> > > > > > +
> > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float
> > > > > > *coefs) +{
> > > > > > +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > +
> > > > > > +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff
> > > > > > + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN); +
> > > > > > ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > +}
> > > > >
> > > > > The data is copied once in encode_frame and twice here
> > > > > There is no need to copy the data 3 times.
> > > > > vector_fmul can be used with a singl memcpy to get the data into
> > > > > any destination, and vector_fmul_reverse doesnt even need 1 memcpy,
> > > > > so overall a single memcpy is enough
> > > >
> > > > Hope that you meant something similar to my solution.
> > >
> > > no, you still do 2 memcpy() but now the code is really messy as well.
> > >
> > > what you should do is, for each block of samples you get from the user
> > > 1. apply one half of the window onto it with vector_fmul_reverse and
> > >    destination of some internal buffer
> > > 2. memcpy into the 2nd destination and apply the other half of the
> > >    window onto it with vector_fmul
> > > 3. run the mdct as appropriate on the internal buffers.
> >
> > Hmm, I considered it, but I don't understand exactly what should I
> > change... In the code I copy data two times:
> > a) in encode_frame - I convert int16_t to float and copy data to s->buf -
> > I need to do it somewhere because vector_mul requires float *.
> > Additionally, part of the data is needed to the next call of encode_frame
> > b) in apply_mdct - here I think that some additional part of buffer is
> > needed. If I understood correctly I have to get rid of a), but how to get
> > access to old data when the next call of encode_frame is performed and
> > how call vector_fmul on int16_t?
>
> have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT ?
> I think ffmpeg should support this already. If it does not work then we can
> keep int16 for now which would implicate more copying

Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work. I made only 
that changes:

float *samples = data;
...
for (i = 0; i < avctx->frame_size; i++) {
    s->buf[s->bufsel][i] = samples[i]*(1<<15);
}
...
.sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},

[...]
> > +
> > +#define find_best(val, table, LUT, LUT_add, LUT_size) \
> > +    best_idx = \
> > +        LUT[av_clip ((((int)val) >> 8) + LUT_add, 0, LUT_size - 1)]; \
> > +    if (abs(val - table[best_idx]) > abs(val - table[best_idx + 1])) \
> > +        best_idx++;
>
> (int)some_float is slow, lrintf() should be faster

yes, but here rounding down is needed. lrintf(x-0.5) ?

> also if val is a float instead of an int then fabs() may actually be better
> than abs()
ok

> > +
> > +/**
> > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 *
> > NELLY_BUF_LEN values + *  @param s               encoder context
> > + *  @param output          output buffer
> > + *  @param output_size     size of output buffer
> > + */
> > +static void encode_block(NellyMoserEncodeContext *s, unsigned char
> > *output, int output_size) +{
> > +    PutBitContext pb;
> > +    int i, band, block, best_idx, power_idx = 0;
> > +    float power_val, power_candidate, coeff, coeff_sum;
> > +    int band_start, band_end;
> > +    float pows[NELLY_FILL_LEN];
> > +    int bits[NELLY_BUF_LEN];
> > +
> > +    const float C = 1.0;
> > +    const float D = 2.0;
> > +
> > +    apply_mdct(s);
> > +
> > +    init_put_bits(&pb, output, output_size * 8);
> > +
> >
> > +    band_start = 0;
> > +    band_end = ff_nelly_band_sizes_table[0];
> > +    for (band = 0; band < NELLY_BANDS; band++) {
> > +        coeff_sum = 0;
> > +        for (i = band_start; i < band_end; i++) {
> > +            //coeff_sum += s->mdct_out[i                ] *
> > s->mdct_out[i                ] +            //           + s->mdct_out[i
> > + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN]; +            coeff_sum
> > += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i +
> > NELLY_BUF_LEN]), D); +        }
> > +        power_candidate =
> > +            //log(FFMAX(1.0, coeff_sum /
> > (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2; +            C
> > * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) *
> > 1024.0 / log(D); +
> > +        if (band) {
> > +            power_candidate -= power_idx;
> > +            find_best(power_candidate, ff_nelly_delta_table,
> > sf_delta_lut, 37, 78); +            put_bits(&pb, 5, best_idx);
> > +            power_idx += ff_nelly_delta_table[best_idx];
> > +        } else {
> > +            //base exponent
> > +            find_best(power_candidate, ff_nelly_init_table, sf_lut, -20,
> > 96); +            put_bits(&pb, 6, best_idx);
> > +            power_idx = ff_nelly_init_table[best_idx];
> > +        }
>
> the choice of power_idx/best_idx values could still be tried to be found
> with viterbi. Its somewhat similar (and simpler) than our viterbi/trellis
> ADPCM encoder

Ok. I read some information about viterbi. 
( http://en.wikipedia.org/wiki/Viterbi_algorithm ;) ).
I see that it should be assumed:
states - values - there will be about 2^16 of them?
observations -following values of power_candidate
probabilities - e.g. |searched value - assumed value|^C
transition_probabilit - anything, e.g. 1

and write it cleverer. It seems to me, that it will be a quite slow algorithm. 
Maybe, only relying on this idea, it will be better to make possible 
transition to e.g. 3 best states? (but personally I hardly see here a 
connection with viterbi...).
-- 
Bartlomiej Wolowiec