[FFmpeg-devel] [PATCH] G.729A (now fixed-point) decoder
Vladimir Voroshilov
voroshil
Mon Apr 14 01:56:34 CEST 2008
On Mon, Apr 14, 2008 at 5:35 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Mar 26, 2008 at 11:23:04PM +0600, Vladimir Voroshilov wrote:
[...]
> > I'm not sure about one-line routines like l_add, l_mac, l_msu, etc
> > Should i convert them to macro ?
>
> no, but we need a proof for each individual use of them that the cliping
> is needed. If you do not have a file which needs it and cant provide valid
> input values that need the cliping then it has to be removed or replaced
> by assert()
> Also i would like to see some comments for each int64 and clip which
> explains under what circumstances they are needed and which of the test
> vectors needs it.
Something like that for l_add:
"Using 64bit arithmetic to avoid overflow in 32bit domain before
clipping. This also can be done in 32bit arithmetic but with a bit
more complex code."
Or you meant something else?
> >
> > There were also made many changes in misc parts of code to get
> > bit-exact result. Thus, i'm afraid, complete review from zero point is
> > required.
>
> Review below is just partial, i thought its better than letting you wait
> until GSOC qualifications end. Ill look at your code again after the
> clip/int64 stuff has been cleaned up.
Well... While waiting for review, i've started implementing G.729D
(G.729 @6.4kHz)
and now about 80% is finished (only huge long-term postfilter remaining).
After a week or two work should be completed.
As an additional advantage there will be full G.729 decoder (instead
of simplified G.729A).
and (as separate patch) standard quality VOC demuxer/decoder (media
format with "VCP162_VOC_File" tag in header).
Due to complexity, post filtering code will require several review roundup.
Several routines were also changes to avoid code duplication.
If you wish, i can post current code here to start review process as
soon as possible and allow fixing issues in common code in parallel
with finishing post filter.
> > > > > > +/**
> > > > > > + * \brief compensates the tilt in the short-term postfilter (4.2.3)
> > > > >
> > > > > > + * \param ctx private data structure
> > > > > > + * \param lp_gn (Q12) coefficients of A(z/GAMMA_N) filter
> > > > > > + * \param lp_gd (Q12) coefficients of A(z/GAMMA_D) filter
> > > > > > + * \param res_pst [in/out] (Q0) residual signal (partially filtered)
> > > > > > +*/
> > > > >
> > > > > > +static void g729a_tilt_compensation(G729A_Context *ctx, const int16_t *lp_gn, const int16_t *lp_gd, int16_t* res_pst)
> > > > > > +{
> > > > > > + int tmp;
> > > > > > + int gt; // Q12
> > > > > > + int rh1,rh0; // Q12
> > > > > > + int16_t hf_buf[11+22]; // Q12 A(Z/GAMMA_N)/A(z/GAMMA_D) filter impulse response
> > > > > > + int sum;
> > > > > > + int i, n;
> > > > > > +
> > > > > > + memset(hf_buf, 0, 33 * sizeof(int16_t));
> > > > > > +
> > > > >
> > > > > > + hf_buf[10] = 4096; //1.0 in Q12
> > > > >
> > > > > > + for(i=0; i<10; i++)
> > > > > > + hf_buf[i+11] = lp_gn[i];
> > > > > > +
> > > > > > + /* Apply 1/A(z/GAMMA_D) filter to hf */
> > > > > > + for(n=0; n<22; n++)
> > > > > > + {
> > > > > > + sum=hf_buf[n+10];
> > > > >
> > > > > > + for(i=0; i<10; i++)
> > > > > > + sum -= (lp_gd[i] * hf_buf[n+10-i-1]) >> 12;
> > > > > > + hf_buf[n+10] = sum;
> > > > > > + }
> > > > >
> > > > > for(n=0; n<22; n++){
> > > > > sum= lp_gn[n];
> > > > >
> > > > > for(i=0; i<10; i++)
> > > > > sum -= (lp_gd[i] * hf_buf[n+10-i-1]) >> 12;
> > > > > hf_buf[n+10] = sum;
> > > > > }
> > > >
> > > > Wrong. Out of bounds in lp_gn
> > > > Or you mean extend lp_gn to 22 items ?
> > > > I don't see here any advantage.
> > >
> > > less memsetting and copying IIRC
> >
> > with increased complexity of code and much worse readability, IMHO
> > Keeping as is, until i implement clean solution (if so)
>
> Hmm, have you read the development policy and patch checklist for ffmpeg?
> Code submitted must be optimal, i would not hesitate to reject the whole
> patch because it does a unneeded copy or memset somewhere.
I just meant that i'll keep it during patch review roundup for a
while, since don't have clean/working solution for now.
This does not mean that i want to see patch commited ignoring policy
or developers' opinions.
> Thats also why iam a little annoyed by the clip() and int64_t. They are
> slow unles the compiler is smart (= not gcc) and they smell avoidable.
I'll recheck all of them once again, but afraid that all are required
for bit-exact result.
If anybody suggest 32bit alternatives i'll use them thankfully :)
> > +#define SHARP_MAX 13017 //0.8 in Q14
>
> typo, 0.8 is 13107 not 13017
Surprise! This is not typo (at least not mine). This constant (as long
as comment) was got from reference code. Replacing 13017 with 13107
breaks bit-exact result.
> > +/**
> > + * \brief shift to left with check for overflow
> > + * \param value shifted value
> > + * \param shift size of shifting (can be negative)
> > + *
> > + * \return value shifted to left by given shift value and clipped to [INT_MIN; INT_MAX]
> > + *
> > + * \remark if shift is negative, routine will do right
> > + * shift by absolute value of given parameter
> > + */
> > +static int l_shl(int value, int shift)
> > +{
> > + if(shift < 0)
> > + return value >> -shift;
> > +
> > + if(value > (INT_MAX >> shift))
> > + return INT_MAX;
> > +
> > + if(value < (INT_MIN >> shift))
> > + return INT_MIN;
> > +
> > + return value << shift;
> > +}
>
> many uses of this always have value>=0 many can only have shift being >=0
> thus this is not optimal
So i should replace this routine with appropriate in-place checks
(for shift sign and clipping), rigth? Note, that main goal of this
routine is clipping. Having negative shift check just makes code a bit
shorter.
What about having splitting code into l_shl without shift sign check
and l_shift (calling l_shl internally) with it ? And use l_shift only
when necessary.
> > +/**
> > + * \brief divides one 16-bit integer by another
> > + * \param num numenator
> > + * \param denom denominator
> > + *
> > + * \return result of num/denom in Q15
> > + *
> > + * \note num and denom has to be positive and
> > + * denom must be greater or equal to num
> > + *
> > + * \note if num == denom reoutine returns SHRT_MAX
> > + */
> > +static int16_t div_int16(int16_t num, int16_t denom)
> > +{
> > + assert(denom>=num && num>=0);
> > +
> > + if (!num)
> > + return 0;
> > +
> > + if (num == denom)
> > + return SHRT_MAX;
> > +
> > + return (num << 15)/denom;
> > +}
>
> SHRT_MAX totally is wrong here
> also the if(!num) test is redundant
Hm. bit-exact test passed. i'll remove both.
> > +/**
> > + * \brief Calculates sum of array elements multiplications
> > + * \param speech array with input data
> > + * \param cycles number elements to proceed
> > + * \param offset offset for calculation sum of s[i]*s[i+offset]
> > + * \param shift right shift by this value will be done before multiplication
> > + *
> > + * \return sum of multiplications
> > + *
> > + * \note array must be at least length+offset long!
> > + */
> > +static int sum_of_squares(const int16_t* speech, int cycles, int offset, int shift)
> > +{
> > + int n;
> > + int sum = 0;
> > +
> > + for(n=0; n<cycles; n++)
> > + sum = l_mac(sum, speech[n] >> shift, speech[n + offset] >> shift);
> > +
> > + return av_clip(sum, INT_MIN >> 1, INT_MAX >> 1);
> > +}
>
> all uses of INT_MAX/MIN are likely wrong in your code there is no reason why
> INT_MAX couldnt be 1<<60 which is not what your code expects
What about having separated #define Q15_MIN/MAX, Q12_MIN/MAX instead ?
--
Regards,
Vladimir Voroshilov mailto:voroshil at gmail.com
JID: voroshil at gmail.com, voroshil at jabber.ru
ICQ: 95587719
More information about the ffmpeg-devel
mailing list