[FFmpeg-devel] [PATCH] Common ACELP code & G.729 [3/7] - vectors operations

Sat May 3 15:35:32 CEST 2008

On Sat, May 03, 2008 at 02:53:40PM +0700, Vladimir Voroshilov wrote:
> Michael Niedermayer wrote: 
> > On Fri, May 02, 2008 at 06:43:18PM +0700, Vladimir Voroshilov wrote:
> > > This patch contains routines for operating with adaptive and algebraic vectors.
> > > 
> > > FIR filter was moved here.
> > > It now does not use any "frac=-frac" or "if(frac<9)" and can be called
> > > as foo(delay/6, delay%6).
> > > This possibility was not obvious for me due to loop asymmetry,
> > > introduced in reference code (and in specification too)
> > > to avoid "out of bounds" error.
> > 
> > [...]
> > > +static const uint8_t mul_5x[8] = {0, 5, 10, 15, 20, 25, 30, 35};
> > 
> > *5 should do :)
> 
> This was optimization attempt.
> I'll remove all such tables.
> If anyone want to optimize code,
> benchmark it, fill free to do this after commit.
> 
> [...]
> 
> > > +static uint8_t gray_decode[32] =
> > > +{
> > > +    0,  1,  3,  2,  7,  6,  4,  5,
> > > +   15, 14, 12, 13,  8,  9, 11, 10,
> > > +   31, 30, 28, 29, 24, 25, 27, 26,
> > > +   16, 17, 19, 18, 23, 22, 20, 21
> > > +};
> > 
> > this is just x^(x>>1)
> 
> Wrong. 

sorry, i mixed encoding and decoding up

> 
> 4^(4>>1) = 4^2 = 6 != 7 = gray_decode[4]
> 5^(5>>1) = 5^2 = 7 != 6 = gray_decode[5]
> and so on...
> 
> Perhaps it was intended to be so till G.729 guys touched it :)
> After seeing fc_2pulses_9bits_track2 table i'm not surpised.
> 
> 
> > of course only replace the table with above if its smaller/faster, same
> > applies to the other suggested replacements for the tables
> > 
> > 
> > [...]
> > > +void ff_acelp_fc_4pulses_13bits(
> > > +        int16_t* fc_v,
> > > +        int fc_index,
> > > +        int pulses_signs)
> > > +{
> > > +    int i;
> > > +    int index;
> > > +
> > > +    for(i=0; i<3; i++)
> > > +    {
> > > +        index = i + mul_5x[fc_index & 7];
> > > +        fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192; // +/-1 in (2.13)
> > > +
> > > +        fc_index >>= 3;
> > > +        pulses_signs >>= 1;
> > > +    }
> > > +
> > > +    index = 3 + (fc_index & 1) + mul_5x[(fc_index>>1) & 7];
> > > +    fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192;
> > > +}
> > > +
> > 
> 
> [...]
> 
> > > +void ff_acelp_fc_4pulses_21bits(
> > > +        int16_t* fc_v,
> > > +        int fc_index,
> > > +        int pulses_signs)
> > > +{
> > > +    int i;
> > > +    int index;
> > > +
> > > +    for(i=0; i<3; i++)
> > > +    {
> > > +        index = i + 5 *(fc_index & 15);
> > > +        fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192; // +/-1 in (2.13)
> > > +
> > > +        fc_index >>= 4;
> > > +        pulses_signs >>= 1;
> > > +    }
> > > +
> > > +    index = 3 + (fc_index & 1);
> > > +    fc_index >>= 1;
> > > +    index += 5 * (fc_index & 15);
> > > +    fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192;
> > > +}
> > > +
> > 
> > duplicate of ff_acelp_fc_4pulses_13bits()
> 
> Added static routine.
> I want to keep those routines because i'm using pointers to them in
> formats array.
> 

> > also the 
> > fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192;
> > stuff is duplicated all over the place
> 
> This line adds pulses to fixed vector.
> Lines are the same just because sign bits are packed together each other (one per
> index) and separately from index bits.
> There are also different packing methods (sipr @16k, amr @10.2k)
> I don't think implementing this 1 line of code via macro or inline
> routine is a good idea.

no, i meant

for(i=0; i<2 or 4; i++)
    fc_v[ index[i] ] += ((pulses_signs>>i) & 1) ? 8191 : -8192;

But i do not know yet if this is a good idea, it might turn out that its
worse after some other simplifications are done ...

[...]

> +void ff_acelp_interpolate_pitch_vector(
> +        int16_t* ac_v,
> +        int pitch_delay_int,
> +        int pitch_delay_frac,
> +        int subframe_size)
> +{
> +    int n, i;
> +    int v;
> +
> +    /* pitch_delay_frac [0; 5]
> +       pitch_delay_int  [PITCH_LAG_MIN; PITCH_LAG_MAX] */
> +    for(n=0; n<subframe_size; n++)
> +    {
> +        /* 3.7.1 of G.729, Equation 40 */
> +        v = ac_v[n - pitch_delay_int] * ff_acelp_interp_filter[0][FFABS(pitch_delay_frac-2)];
> +        for(i=1; i<11; i++)
> +        {
> +
> +            /* Reference G.729 and AMR fixed point code performs clipping after
> +               each of the two following accumulations.
> +               Since clipping affects only synthetic OVERFLOW test and still not
> +               cause int type oveflow, it was moved outside loop. */
> +
> +            /*  R(x):=ac_v[-k+x] */
> +            /* v += R(n-i)*ff_acelp_interp_filter(t+6i) */
> +            v += ac_v[n - pitch_delay_int - i] * ff_acelp_interp_filter[i][2-pitch_delay_frac];
> +            /* v += R(n+i+1)*ff_acelp_interp_filter(6-t+6i) */
> +            v += ac_v[n - pitch_delay_int + i] * ff_acelp_interp_filter[i][pitch_delay_frac-2];

Like with the other patches id like to have the comments seperated somehow
the code is quite hard to read with them intermingled like this.

> +        }
> +        ac_v[n] = av_clip_int16((v + 0x4000) >> 15);
> +    }
> +}
> +

> +/**
> + * \brief common routines for 4pulses_13bits and 4pulses_21bits
> + *
> + * Tables differs only by width, so can be handled via common routine.
> + */
> +static void ff_acelp_4pulses_13_21_bits(
> +        int16_t* fc_v,
> +        int fc_index,
> +        int pulses_signs,
> +        int bits)
> +{
> +    int mask = (1 << bits) - 1;
> +    int i, index;
> +
> +    for(i=0; i<3; i++)
> +    {
> +        index = i + 5 * (fc_index & mask);
> +        fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192; // +/-1 in (2.13)
> +
> +        fc_index >>= bits;
> +        pulses_signs >>= 1;
> +    }
> +
> +    index = 3 + (fc_index & 1) + 5 * ((fc_index>>1) & mask);
> +    fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192;
> +}
> +
[...]
> +
> +void ff_acelp_fc_2pulses_9bits_g729d(
> +        int16_t* fc_v,
> +        int fc_index,
> +        int pulses_signs)
> +{
> +    int index;
> +
> +    index = ((5 * gray_decode[fc_index & 15]) >> 1) + 1;
> +    fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192; // +/-1 in (2.13)
> +
> +    pulses_signs >>= 1;
> +    fc_index >>= 4;
> +
> +    index = fc_2pulses_9bits_track2[gray_decode[fc_index & 31]];
> +    fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192; // +/-1 in (2.13)
> +}

The only thing that seperates these is the different tables

    for(i=0; i<pulse_count; i++){
        index = i + tab[fc_index & mask];
        fc_v[ index ] += (pulses_signs & 1) ? 8191 : -8192;

        fc_index >>= bits;
        pulses_signs >>= 1;
    }

    index = tab2[fc_index];
    fc_v[ index ] += pulses_signs ? 8191 : -8192;

This also means you no longer need the function pointers.

> +
> +void ff_acelp_fc_enchance_harmonics(
> +        int16_t* fc_v,
> +        int pitch_delay,
> +        int16_t gain_pitch,
> +        int length)
> +{
> +    int i;
> +
> +    for(i=pitch_delay; i<length;i++)
> +        fc_v[i] += (fc_v[i - pitch_delay] * gain_pitch) >> 14;
> +}
> +
> +void ff_acelp_build_excitation(
> +        int16_t* exc,
> +        const int16_t *ac_v,
> +        const int16_t *fc_v,
> +        int16_t gp,
> +        int16_t gc,
> +        int subframe_size)
> +{
> +    int i;
> +
> +    //clipping required here. breaks OVERFLOW test
> +    for(i=0; i<subframe_size; i++)
> +        exc[i] = av_clip_int16((ac_v[i] * gp + fc_v[i] * gc + 0x2000) >> 14);
> +}

Does it really makes sense to put 1 line loops in functions?

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Dictatorship naturally arises out of democracy, and the most aggravated
form of tyranny and slavery out of the most extreme liberty. -- Plato
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080503/44a6faa3/attachment.pgp>