[FFmpeg-devel] [PATCH] Add a G.722 encoder

Wed Sep 15 23:53:28 CEST 2010

On Wed, Sep 15, 2010 at 05:42:39PM +0300, Martin Storsj? wrote:
> On Wed, 15 Sep 2010, Michael Niedermayer wrote:
> 
> > On Wed, Sep 15, 2010 at 12:41:34PM +0300, Martin Storsj? wrote:
> > > On Tue, 14 Sep 2010, Michael Niedermayer wrote:
> > > 
> > > > > +    int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > > > > +
> > > > > +    update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > > > > +                          high_inv_quant[index] >> 10, index);
> > > > > +    return index;
> > > > > +}
> > > > > +
> > > > > +static inline int encode_low(const struct G722Band* state, int xlow)
> > > > > +{
> > > > > +    int diff  = av_clip_int16(xlow - state->s_predictor);
> > > > > +    int limit = diff >= 0 ? diff : -(diff + 1);
> > > > 
> > > > > +    int i = 0;
> > > > > +    while (i < 29 && limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > > > +        i++;
> > > > 
> > > > that doesnt look efficient
> > > > limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > > can be changed to
> > > > C > low_quant[i]
> > > 
> > > Hmm, do you mean something like this?
> > > 
> > >     limit = (limit << 10) / state->scale_factor;
> > >     while (i < 29 && limit >= low_quant[i])
> > >         i++;
> > 
> > something like this yes, but with correct rounding
> > you could also try to just get rid of the >>
> > (limit+1)<<10 > low_quant[i] * state->scale_factor
> > 
> > to get the rounding correct use your brain and try to exactly move each
> > operation from one side of the >= to the other
> > 
> > 
> > > 
> > > This makes the results differ slightly from the reference test vectors 
> > > (which perhaps in itself is acceptable, but a bitexact mode to test 
> > > against the reference may still be useful). This actually made it slower,
> > > 
> > > 1015 dezicycles in encode_low, 1048518 runs, 58 skips
> > > 
> > > vs
> > > 
> > > 963 dezicycles in encode_low, 1048494 runs, 82 skips
> > 
> > as the result differs we dont know if the loop runs the same number of
> > iterations but its possible of course the the division is too slow for
> > this to work out
> > 
> > 
> > > 
> > > initially.
> > > 
> > > > also a LUT could be tried if this matters speed wise
> > > 
> > > Hmm, what would this LUT contain? The output depends both on the current 
> > > 16-bit diff and the current scale factor, so one single 64k LUT isn't 
> > > enough.
> > 
> > as said this is equivalent to  C > low_quant[i] you just have to calculate C
> > exactly and not assume these are real values from math lectures, they are
> > integers and >> and / can round down
> > 
> > also the i<29 can likely be removed by making sure the table contains an
> > appropriate value at the end
> 
> I managed to do this, so I've got the following two versions:
> 
>     limit = limit + 1 << 10;
>     while (limit > low_quant[i] * state->scale_factor)
> 
> 885 dezicycles in encode_low, 1047991 runs, 585 skips
> 
> and
> 
>     limit = limit + 1 << 10;
>     limit = (limit + state->scale_factor - 1) / state->scale_factor;
>     while (limit > low_quant[i])
> 
> 988 dezicycles in encode_low, 1047886 runs, 690 skips
> 
> So the former version is a bit faster, even if it does more 
> multiplications.
> 
> Removing the i < 29 check with a sentinel works fine for normal samples, 
> but the reference test vectors fail on this if using the former version, 
> since the numeric range of 32 bit numbers isn't enough.
> 
> I.e., low_quant[i] * scale_factor mustn't overflow, so the the sentinel 
> must be INT_MAX/max_scale_factor, which is (2^31-1)/4096, and limit << 10 
> can be larger than this.

multiple sentinels should then work


> 
> So this version is the fastest that passes the regression tests for me:
> 
>     limit = limit + 1 << 10;
>     while (i < 29 && limit > low_quant[i] * state->scale_factor)
> 913 dezicycles in encode_low, 1048522 runs, 54 skips
> 
> > > +static inline int encode_high(G722Context *c, int xhigh)
> > > +{
> > > +    int diff = av_clip_int16(xhigh - c->band[1].s_predictor);
> > > +    int pred = 141 * c->band[1].scale_factor >> 8;
> > > +    int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > > +
> > > +    update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > > +                          high_inv_quant[index] >> 10, index);
> > > +    return index;
> > > +}
> > > +
> > > +static inline int encode_low(const struct G722Band* state, int xlow)
> > > +{
> > > +    int diff  = av_clip_int16(xlow - state->s_predictor);
> > 
> > > +    int limit = diff >= 0 ? diff : -(diff + 1);
> > 
> > thats
> > diff ^ (diff>>31)
> > i think
> 
> Yes, thanks, fixed.
> 
> > and maybe that can also be used in encode_high() to get rid of the branch
> 
> I don't see how the branches in encode_high could be avoided thanks to 
> this - the only branches there map index into 0, 1, 2 or 3 depending on if 
> diff >= 0 and diff < pred.

something like:
((diff ^ (diff>>31)) < pred) + 2*(diff>=0)
might work, no idea about speed

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100915/162d99c3/attachment.pgp>