[FFmpeg-devel] [PATCH] G.729A (now fixed-point) decoder

Mon Apr 14 01:56:34 CEST 2008

On Mon, Apr 14, 2008 at 5:35 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Mar 26, 2008 at 11:23:04PM +0600, Vladimir Voroshilov wrote:

[...]

>  > I'm not sure about one-line routines like l_add, l_mac, l_msu, etc
>  > Should i convert them to macro ?
>
>  no, but we need a proof for each individual use of them that the cliping
>  is needed. If you do not have a file which needs it and cant provide valid
>  input values that need the cliping then it has to be removed or replaced
>  by assert()
>  Also i would like to see some comments for each int64 and clip which
>  explains under what circumstances they are needed and which of the test
>  vectors needs it.

Something like that for l_add:
"Using 64bit arithmetic to avoid overflow in 32bit domain before
clipping. This also can be done in 32bit arithmetic but with a bit
more complex code."

Or you meant something else?

>  >
>  > There were also made many changes in misc parts of code to get
>  > bit-exact result. Thus, i'm afraid, complete review from zero point is
>  > required.
>
>  Review below is just partial, i thought its better than letting you wait
>  until GSOC qualifications end. Ill look at your code again after the
>  clip/int64 stuff has been cleaned up.

Well... While waiting for review, i've started implementing G.729D
(G.729 @6.4kHz)
and now about 80% is finished (only huge long-term postfilter remaining).
After a week or two work should be completed.
As an additional advantage there will be full G.729 decoder (instead
of simplified G.729A).
and (as separate patch) standard quality VOC demuxer/decoder (media
format with "VCP162_VOC_File" tag in header).

Due to complexity, post filtering code will require several review roundup.
Several routines were also changes to avoid code duplication.
If you wish, i can post current code here to start review process as
soon as possible and allow fixing issues in common code in parallel
with finishing post filter.

> > >  > >  > +/**
>  > >  > >  > + * \brief compensates the tilt in the short-term postfilter (4.2.3)
>  > >  > >
>  > >  > > > + * \param ctx private data structure
>  > >  > >  > + * \param lp_gn (Q12) coefficients of A(z/GAMMA_N) filter
>  > >  > >  > + * \param lp_gd (Q12) coefficients of A(z/GAMMA_D) filter
>  > >  > >  > + * \param res_pst [in/out] (Q0) residual signal (partially filtered)
>  > >  > >  > +*/
>  > >  > >
>  > >  > > > +static void g729a_tilt_compensation(G729A_Context *ctx, const int16_t *lp_gn, const int16_t *lp_gd, int16_t* res_pst)
>  > >  > >  > +{
>  > >  > >  > +    int tmp;
>  > >  > >  > +    int gt;      // Q12
>  > >  > >  > +    int rh1,rh0; // Q12
>  > >  > >  > +    int16_t hf_buf[11+22]; // Q12 A(Z/GAMMA_N)/A(z/GAMMA_D) filter impulse response
>  > >  > >  > +    int sum;
>  > >  > >  > +    int i, n;
>  > >  > >  > +
>  > >  > >  > +    memset(hf_buf, 0, 33 * sizeof(int16_t));
>  > >  > >  > +
>  > >  > >
>  > >  > >  > +    hf_buf[10] = 4096; //1.0 in Q12
>  > >  > >
>  > >  > > > +    for(i=0; i<10; i++)
>  > >  > >  > +        hf_buf[i+11] = lp_gn[i];
>  > >  > >  > +
>  > >  > >  > +    /* Apply 1/A(z/GAMMA_D) filter to hf */
>  > >  > >  > +    for(n=0; n<22; n++)
>  > >  > >  > +    {
>  > >  > >  > +        sum=hf_buf[n+10];
>  > >  > >
>  > >  > > > +        for(i=0; i<10; i++)
>  > >  > >  > +            sum -= (lp_gd[i] * hf_buf[n+10-i-1]) >> 12;
>  > >  > >  > +        hf_buf[n+10] = sum;
>  > >  > >  > +    }
>  > >  > >
>  > >  > >  for(n=0; n<22; n++){
>  > >  > >     sum= lp_gn[n];
>  > >  > >
>  > >  > >     for(i=0; i<10; i++)
>  > >  > >         sum -= (lp_gd[i] * hf_buf[n+10-i-1]) >> 12;
>  > >  > >     hf_buf[n+10] = sum;
>  > >  > >  }
>  > >  >
>  > >  > Wrong. Out of bounds in lp_gn
>  > >  > Or you mean extend lp_gn to 22 items ?
>  > >  > I don't see here any advantage.
>  > >
>  > >  less memsetting and copying IIRC
>  >
>  > with increased complexity of code and much worse readability, IMHO
>  > Keeping as is, until i implement clean solution (if so)
>
>  Hmm, have you read the development policy and patch checklist for ffmpeg?
>  Code submitted must be optimal, i would not hesitate to reject the whole
>  patch because it does a unneeded copy or memset somewhere.

I just meant that i'll keep it during patch review roundup for a
while, since don't have clean/working solution for now.
This does not mean that i want to see patch commited ignoring policy
or developers' opinions.

>  Thats also why iam a little annoyed by the clip() and int64_t. They are
>  slow unles the compiler is smart (= not gcc) and they smell avoidable.

I'll recheck all of them once again, but afraid that all are required
for bit-exact result.
If anybody suggest 32bit alternatives i'll use them thankfully :)

>  > +#define SHARP_MAX 13017 //0.8 in Q14
>
>  typo, 0.8 is 13107 not 13017

Surprise! This is not typo (at least not mine). This constant (as long
as comment) was got from reference code. Replacing 13017 with 13107
breaks bit-exact result.

>  > +/**
>  > + * \brief shift to left with check for overflow
>  > + * \param value shifted value
>  > + * \param shift size of shifting (can be negative)
>  > + *
>  > + * \return value shifted to left by given shift value and clipped to [INT_MIN; INT_MAX]
>  > + *
>  > + * \remark if shift is negative, routine will do right
>  > + *         shift by absolute value of given parameter
>  > + */
>  > +static int l_shl(int value, int shift)
>  > +{
>  > +    if(shift < 0)
>  > +        return value >> -shift;
>  > +
>  > +    if(value > (INT_MAX >> shift))
>  > +        return INT_MAX;
>  > +
>  > +    if(value < (INT_MIN >> shift))
>  > +        return INT_MIN;
>  > +
>  > +    return value << shift;
>  > +}
>
>  many uses of this always have value>=0 many can only have shift being >=0
>  thus this is not optimal

So i should replace this routine with appropriate  in-place checks
(for shift sign and clipping), rigth? Note, that main goal of this
routine is clipping. Having negative shift check just makes code a bit
shorter.
What about having splitting code into l_shl without shift sign check
and l_shift (calling l_shl internally) with it ? And use l_shift only
when necessary.

>  > +/**
>  > + * \brief divides one 16-bit integer by another
>  > + * \param num numenator
>  > + * \param denom denominator
>  > + *
>  > + * \return result of num/denom in Q15
>  > + *
>  > + * \note num and denom has to be positive and
>  > + *       denom must be greater or equal to num
>  > + *
>  > + * \note if num == denom reoutine returns SHRT_MAX
>  > + */
>  > +static int16_t div_int16(int16_t num, int16_t denom)
>  > +{
>  > +    assert(denom>=num && num>=0);
>  > +
>  > +    if (!num)
>  > +        return 0;
>  > +
>  > +    if (num == denom)
>  > +        return SHRT_MAX;
>  > +
>  > +    return (num << 15)/denom;
>  > +}
>

>  SHRT_MAX totally is wrong here
>  also the if(!num) test is redundant

Hm. bit-exact test passed. i'll remove both.

> > +/**
>  > + * \brief Calculates sum of array elements multiplications
>  > + * \param speech array with input data
>  > + * \param cycles number elements to proceed
>  > + * \param offset offset for calculation sum of s[i]*s[i+offset]
>  > + * \param shift right shift by this value will be done before multiplication
>  > + *
>  > + * \return sum of multiplications
>  > + *
>  > + * \note array must be at least length+offset long!
>  > + */
>  > +static int sum_of_squares(const int16_t* speech, int cycles, int offset, int shift)
>  > +{
>  > +    int n;
>  > +    int sum = 0;
>  > +
>  > +    for(n=0; n<cycles; n++)
>  > +       sum = l_mac(sum, speech[n] >> shift, speech[n + offset] >> shift);
>  > +
>  > +    return av_clip(sum, INT_MIN >> 1, INT_MAX >> 1);
>  > +}
>
>  all uses of INT_MAX/MIN are likely wrong in your code there is no reason why
>  INT_MAX couldnt be 1<<60 which is not what your code expects

What about having separated #define Q15_MIN/MAX, Q12_MIN/MAX instead ?

-- 
Regards,
Vladimir Voroshilov mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719