[FFmpeg-devel] Nellymoser encoder
Michael Niedermayer
michaelni
Sun Aug 31 15:53:23 CEST 2008
On Sun, Aug 31, 2008 at 01:07:15PM +0200, Bartlomiej Wolowiec wrote:
> Saturday 30 August 2008 18:10:41 Michael Niedermayer wrote:
> > On Sat, Aug 30, 2008 at 03:42:37PM +0200, Bartlomiej Wolowiec wrote:
> > > Friday 29 August 2008 22:36:10 Michael Niedermayer wrote:
> > > > > > > > > +
> > > > > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > > > > > +{
> > > > > > > > > + DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > > > > +
> > > > > > > > > + memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > > > > + s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > + s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > +
> > > > > > > > > + ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > > > > +}
> > > > > > > >
> > > > > > > > The data is copied once in encode_frame and twice here
> > > > > > > > There is no need to copy the data 3 times.
> > > > > > > > vector_fmul can be used with a single memcpy to get the data
> > > > > > > > into any destination, and vector_fmul_reverse doesn't even need
> > > > > > > > 1 memcpy, so overall a single memcpy is enough
> > > > > > >
> > > > > > > Hope that you meant something similar to my solution.
> > > > > >
> > > > > > no, you still do 2 memcpy() but now the code is really messy as
> > > > > > well.
> > > > > >
> > > > > > > what you should do is, for each block of samples you get from the
> > > > > > > user:
> > > > > > > 1. apply one half of the window onto it with vector_fmul_reverse
> > > > > > >    and destination of some internal buffer
> > > > > > > 2. memcpy into the 2nd destination and apply the other half of the
> > > > > > >    window onto it with vector_fmul
> > > > > > > 3. run the mdct as appropriate on the internal buffers.
> > > > >
> > > > > > Hmm, I considered it, but I don't understand exactly what I should
> > > > > > change... In the code I copy data two times:
> > > > > > a) in encode_frame - I convert int16_t to float and copy data to
> > > > > >    s->buf - I need to do it somewhere because vector_fmul requires
> > > > > >    float *. Additionally, part of the data is needed for the next
> > > > > >    call of encode_frame
> > > > > > b) in apply_mdct - here I think that some additional part of buffer
> > > > > >    is needed.
> > > > > > If I understood correctly I have to get rid of a), but how to get
> > > > > > access to old data when the next call of encode_frame is performed
> > > > > > and how to call vector_fmul on int16_t?
> > > >
> > > > have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT ?
> > > > I think ffmpeg should support this already. If it does not work then we
> > > > > can keep int16 for now, which would imply more copying
> > >
> > > Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work. I made
> > > > only these changes:
> > >
> > > float *samples = data;
> > > ...
> > > for (i = 0; i < avctx->frame_size; i++) {
> > > s->buf[s->bufsel][i] = samples[i]*(1<<15);
> > > }
> > > ...
> > > .sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},
> >
> > hmm
>
> Any idea? or should I leave it as it is?
Does PCM float work for you? If so, what is the difference from your encoder?
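
Going back to the copying question quoted above, the three steps (window one
half with vector_fmul_reverse into an internal buffer, memcpy plus vector_fmul
for the other half, then the mdct) could look roughly like the sketch below.
This is only an illustration, not the patch: BLOCK, window_block and the two
local vector_* functions are stand-ins I made up here, and their semantics
(in-place multiply for vector_fmul, out-of-place reversed multiply for
vector_fmul_reverse) are an assumption about the DSPContext functions.

```c
#include <assert.h>
#include <string.h>

#define BLOCK 4 /* stand-in for NELLY_BUF_LEN, kept tiny for illustration */

/* Local stand-ins for the DSPContext functions; semantics assumed here:
 * vector_fmul:         dst[i] *= src[i]
 * vector_fmul_reverse: dst[i]  = src0[i] * src1[len - 1 - i] */
static void vector_fmul(float *dst, const float *src, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] *= src[i];
}

static void vector_fmul_reverse(float *dst, const float *src0,
                                const float *src1, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* cur:  2*BLOCK MDCT input whose second half is still missing
 * next: 2*BLOCK MDCT input whose first half is filled here
 * in:   BLOCK new samples, win: rising half of the window */
static void window_block(float *cur, float *next,
                         const float *in, const float *win)
{
    assert(cur && next && in && win);
    /* falling half: written straight to its destination, no copy */
    vector_fmul_reverse(cur + BLOCK, in, win, BLOCK);
    /* the single memcpy: the same samples start the next frame */
    memcpy(next, in, BLOCK * sizeof(*in));
    vector_fmul(next, win, BLOCK);
}
```

After the call, cur is complete and can be handed to the mdct, and next takes
the role of cur for the following block.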
[...]
>
> > An alternative would be, instead of keeping the N paths that are closest
> > to the current power_candidate, to keep the N overall best paths so far;
> > that's what adpcm.c does, and it likely has a better quality/speed tradeoff
> > but is harder to implement
> >
> > > Maybe, only relying on this idea, it will be better to make possible
> > > transition to e.g. 3 best states? (but personally I hardly see here a
> > > connection with viterbi...).
> >
> > optimized viterbi :)
>
> The enclosed patch is an attempt at writing dynamic allocation of
> exponents. It isn't ideal, but I want to be sure that I am not going wrong. It's
> quite slow - and I don't know if it can be accelerated more than 2-3 times.
> Because it's so slow and quality is often not really better -
:(
I feared that a little, as we are only trying to match some guessed values
better; we don't really know which pows are best.
So if it's not useful for improving quality then I guess there's not much
point in optimizing it much ...
> I suggest to
> leave both versions -
ok
> how to allow user to choose ?
AVCodecContext.trellis or AVCodecContext.compression_level seem like
possible choices
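
A minimal sketch of what such a switch might look like, assuming the trellis
field is used and that any value above zero selects the slow search. The
struct, enum and function names are hypothetical stand-ins so the example
compiles on its own; only the two field names come from AVCodecContext.

```c
#include <assert.h>

/* Hypothetical stand-in holding only the two AVCodecContext fields
 * mentioned above. */
typedef struct SketchCodecContext {
    int trellis;
    int compression_level;
} SketchCodecContext;

typedef enum ExponentSearch {
    EXP_GREEDY,  /* fast, get_exponent_greedy-like search  */
    EXP_DYNAMIC  /* slow, get_exponent_dynamic-like search */
} ExponentSearch;

/* Assumed policy: trellis > 0 asks for the slower search. */
static ExponentSearch choose_search(const SketchCodecContext *avctx)
{
    assert(avctx);
    return avctx->trellis > 0 ? EXP_DYNAMIC : EXP_GREEDY;
}
```

The result could be computed once in the init function and stored where the
patch currently keeps better_quality.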
[...]
> Index: nellymoserenc.c
> ===================================================================
> --- nellymoserenc.c (revision 15050)
> +++ nellymoserenc.c (working copy)
> @@ -45,9 +45,19 @@
> #define POW_TABLE_SIZE (1<<11)
> #define POW_TABLE_OFFSET 3
>
> +#undef NDEBUG
> +#include <assert.h>
> +
> typedef struct NellyMoserEncodeContext {
> AVCodecContext *avctx;
> int last_frame;
> + int bufsel;
> + int have_saved;
> + int better_quality;
> + DSPContext dsp;
> + MDCTContext mdct_ctx;
ok
[..]
> @@ -110,6 +136,13 @@
> return -1;
> }
>
> + if (avctx->sample_rate != 8000 && avctx->sample_rate != 11025 &&
> + avctx->sample_rate != 22050 && avctx->sample_rate != 44100 &&
> + avctx->strict_std_compliance >= FF_COMPLIANCE_NORMAL) {
> + av_log(avctx, AV_LOG_ERROR, "Nellymoser works only with 8000, 11025, 22050 and 44100 sample rate\n");
> + return -1;
> + }
> +
> avctx->frame_size = NELLY_SAMPLES;
> s->avctx = avctx;
> ff_mdct_init(&s->mdct_ctx, 8, 0);
ok
[...]
> @@ -131,6 +165,218 @@
> return 0;
> }
>
> +#define find_best(val, table, LUT, LUT_add, LUT_size) \
> + best_idx = \
> + LUT[av_clip ((lrintf(val) >> 8) + LUT_add, 0, LUT_size - 1)]; \
> + if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
> + best_idx++;
> +
ok
> +static void get_exponent_greedy(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> +{
> + int band, best_idx, power_idx = 0;
> + float power_candidate;
> + for (band = 0; band < NELLY_BANDS; band++) {
> + if (band) {
> + power_candidate = cand[band] - power_idx;
> + find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> + idx_table[band] = best_idx;
> + power_idx += ff_nelly_delta_table[best_idx];
> + } else {
> + //base exponent
> + find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
> + idx_table[0] = best_idx;
> + power_idx = ff_nelly_init_table[best_idx];
> + }
> + }
//base exponent
find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
idx_table[0] = best_idx;
power_idx = ff_nelly_init_table[best_idx];
for (band = 1; band < NELLY_BANDS; band++) {
....
}
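
To make the shape of that restructuring concrete, here is a self-contained
sketch. It is not the patch: the tables are tiny made-up stand-ins for
ff_nelly_init_table and ff_nelly_delta_table, and a plain linear
nearest-value search replaces the LUT-based find_best macro.

```c
#include <assert.h>

#define BANDS 4 /* stand-in for NELLY_BANDS */

/* Hypothetical tables, much smaller than the real ones. */
static const int init_table[4]  = { 0, 1000, 2000, 3000 };
static const int delta_table[5] = { -500, -100, 0, 100, 500 };

/* Linear search for the table entry closest to val (first one wins ties). */
static int nearest(float val, const int *table, int n)
{
    int i, best = 0;
    float best_dist = val - table[0];
    if (best_dist < 0)
        best_dist = -best_dist;
    for (i = 1; i < n; i++) {
        float dist = val - table[i];
        if (dist < 0)
            dist = -dist;
        if (dist < best_dist) {
            best_dist = dist;
            best      = i;
        }
    }
    return best;
}

static void get_exponent_greedy_sketch(const float *cand, int *idx_table)
{
    int band, power_idx;

    assert(cand && idx_table);

    /* base exponent, handled once before the loop */
    idx_table[0] = nearest(cand[0], init_table, 4);
    power_idx    = init_table[idx_table[0]];

    /* remaining bands are coded as deltas, no branch inside the loop */
    for (band = 1; band < BANDS; band++) {
        idx_table[band] = nearest(cand[band] - power_idx, delta_table, 5);
        power_idx      += delta_table[idx_table[band]];
    }
}
```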
> +}
> +
> +#define OPT_SIZE ((1<<15) + 3000)
> +
> +inline float distance(float x, float y, int band)
static inline
> +{
> + //return pow(fabs(x-y), 2.0);
> + float tmp = x - y;
> + return tmp * tmp;
> +}
> +
> +static void get_exponent_dynamic(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> +{
> + int i, j, band, best_idx;
> + float power_candidate, best_val;
> +
> + float opt[NELLY_BANDS][OPT_SIZE];
> + int path[NELLY_BANDS][OPT_SIZE];
> +
> + for (i = 0; i < NELLY_BANDS * OPT_SIZE; i++) {
> + opt[0][i] = INFINITY;
> + }
> +
> + for (i = 0; i < 64; i++) {
> + opt[0][ff_nelly_init_table[i]] = distance(cand[0], ff_nelly_init_table[i], 0);
> + path[0][ff_nelly_init_table[i]] = i;
> + }
> +
> + for (band = 1; band < NELLY_BANDS; band++) {
> + int q, c = 0;
> + float tmp;
> + int idx_min, idx_max, idx;
> + power_candidate = cand[band];
> + for (q = 1000; !c && q < OPT_SIZE; q <<= 2) {
> + idx_min = FFMAX(0, cand[band] - q);
> + idx_max = FFMIN(OPT_SIZE, cand[band - 1] + q);
> + for (i = FFMAX(0, cand[band - 1] - q); i < FFMIN(OPT_SIZE, cand[band - 1] + q); i++) {
> + for (j = 0; j < 32; j++) {
> + idx = i + ff_nelly_delta_table[j];
> + if (idx > idx_max)
> + break;
> + if (idx >= idx_min && isfinite(opt[band - 1][i])) {
The isfinite check can be moved outside of the for (j = 0 ...) loop, since
opt[band - 1][i] does not depend on j.
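
Roughly like this, as a simplified sketch of one band of the search rather
than the patch itself. STATES, deltas, relax_band and the squared-error cost
are stand-ins chosen for the example, and from[] stores the predecessor
state, which is not necessarily what path[] stores in the patch.

```c
#include <assert.h>
#include <math.h>

#define STATES 8 /* stand-in for OPT_SIZE */

/* Hypothetical delta table. */
static const int deltas[3] = { -1, 0, 2 };

/* One band of the search: prev[] holds the best cost per state for the
 * previous band, cur[] and from[] are filled for the current band. */
static void relax_band(const float *prev, float *cur, int *from, float target)
{
    int i, j;

    assert(prev && cur && from);

    for (i = 0; i < STATES; i++) {
        cur[i]  = INFINITY;
        from[i] = -1;
    }

    for (i = 0; i < STATES; i++) {
        /* tested once per state instead of once per delta */
        if (!isfinite(prev[i]))
            continue;
        for (j = 0; j < 3; j++) {
            int idx = i + deltas[j];
            float diff, cost;
            if (idx < 0 || idx >= STATES)
                continue;
            diff = target - idx;
            cost = prev[i] + diff * diff;
            if (cost < cur[idx]) {
                cur[idx]  = cost;
                from[idx] = i;
            }
        }
    }
}
```

Unreachable states then cost one comparison each instead of one per delta.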
[...]
> +/**
> + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> + * @param s encoder context
> + * @param output output buffer
> + * @param output_size size of output buffer
> + */
> +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> +{
> + PutBitContext pb;
> + int i, j, band, block, best_idx, power_idx = 0;
> + float power_val, coeff, coeff_sum;
> + float pows[NELLY_FILL_LEN];
> + int bits[NELLY_BUF_LEN], idx_table[NELLY_BANDS];
> + float cand[NELLY_BANDS];
> +
> + const float C = 1.0;
> + const float D = 3.0;
any reason why you changed these? Do these sound better?
> +
> + apply_mdct(s);
> +
> + init_put_bits(&pb, output, output_size * 8);
> +
> + i = 0;
> + for (band = 0; band < NELLY_BANDS; band++) {
> + coeff_sum = 0;
> + for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> + //coeff_sum += s->mdct_out[i ] * s->mdct_out[i ]
> + // + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> + coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> + }
> + cand[band] =
> + //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> + C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
The FFMAX should maybe be done after the correction for D
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
While the State exists there can be no freedom; when there is freedom there
will be no State. -- Vladimir Lenin