[FFmpeg-devel] [PATCH] Add a G.722 encoder

Martin Storsjö martin
Wed Sep 15 16:42:39 CEST 2010


On Wed, 15 Sep 2010, Michael Niedermayer wrote:

> On Wed, Sep 15, 2010 at 12:41:34PM +0300, Martin Storsj? wrote:
> > On Tue, 14 Sep 2010, Michael Niedermayer wrote:
> > 
> > > > +    int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > > > +
> > > > +    update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > > > +                          high_inv_quant[index] >> 10, index);
> > > > +    return index;
> > > > +}
> > > > +
> > > > +static inline int encode_low(const struct G722Band* state, int xlow)
> > > > +{
> > > > +    int diff  = av_clip_int16(xlow - state->s_predictor);
> > > > +    int limit = diff >= 0 ? diff : -(diff + 1);
> > > 
> > > > +    int i = 0;
> > > > +    while (i < 29 && limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > > +        i++;
> > > 
> > > that doesnt look efficient
> > > limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > can be changed to
> > > C > low_quant[i]
> > 
> > Hmm, do you mean something like this?
> > 
> >     limit = (limit << 10) / state->scale_factor;
> >     while (i < 29 && limit >= low_quant[i])
> >         i++;
> 
> something like this yes, but with correct rounding
> you could also try to just get rid of the >>
> (limit+1)<<10 > low_quant[i] * state->scale_factor
> 
> to get the rounding correct use your brain and try to exactly move each
> operation from one side of the >= to the other
> 
> 
> > 
> > This makes the results differ slightly from the reference test vectors 
> > (which perhaps in itself is acceptable, but a bitexact mode to test 
> > against the reference may still be useful). This actually made it slower,
> > 
> > 1015 dezicycles in encode_low, 1048518 runs, 58 skips
> > 
> > vs
> > 
> > 963 dezicycles in encode_low, 1048494 runs, 82 skips
> 
> as the result differs we dont know if the loop runs the same number of
> iterations but its possible of course the the division is too slow for
> this to work out
> 
> 
> > 
> > initially.
> > 
> > > also a LUT could be tried if this matters speed wise
> > 
> > Hmm, what would this LUT contain? The output depends both on the current 
> > 16-bit diff and the current scale factor, so one single 64k LUT isn't 
> > enough.
> 
> as said this is equivalent to  C > low_quant[i] you just have to calculate C
> exactly and not assume these are real values from math lectures, they are
> integers and >> and / can round down
> 
> also the i<29 can likely be removed by making sure the table contains an
> appropriate value at the end

I managed to do this, so I've got the following two versions:

    limit = limit + 1 << 10;
    while (limit > low_quant[i] * state->scale_factor)

885 dezicycles in encode_low, 1047991 runs, 585 skips

and

    limit = limit + 1 << 10;
    limit = (limit + state->scale_factor - 1) / state->scale_factor;
    while (limit > low_quant[i])

988 dezicycles in encode_low, 1047886 runs, 690 skips

So the former version is a bit faster, even if it does more 
multiplications.

Removing the i < 29 check with a sentinel works fine for normal samples, 
but the reference test vectors fail on this if using the former version, 
since the numeric range of 32 bit numbers isn't enough.

I.e., low_quant[i] * scale_factor mustn't overflow, so the the sentinel 
must be INT_MAX/max_scale_factor, which is (2^31-1)/4096, and limit << 10 
can be larger than this.

So this version is the fastest that passes the regression tests for me:

    limit = limit + 1 << 10;
    while (i < 29 && limit > low_quant[i] * state->scale_factor)
913 dezicycles in encode_low, 1048522 runs, 54 skips

> > +static inline int encode_high(G722Context *c, int xhigh)
> > +{
> > +    int diff = av_clip_int16(xhigh - c->band[1].s_predictor);
> > +    int pred = 141 * c->band[1].scale_factor >> 8;
> > +    int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > +
> > +    update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > +                          high_inv_quant[index] >> 10, index);
> > +    return index;
> > +}
> > +
> > +static inline int encode_low(const struct G722Band* state, int xlow)
> > +{
> > +    int diff  = av_clip_int16(xlow - state->s_predictor);
> 
> > +    int limit = diff >= 0 ? diff : -(diff + 1);
> 
> thats
> diff ^ (diff>>31)
> i think

Yes, thanks, fixed.

> and maybe that can also be used in encode_high() to get rid of the branch

I don't see how the branches in encode_high could be avoided thanks to 
this - the only branches there map index into 0, 1, 2 or 3 depending on if 
diff >= 0 and diff < pred.

> > +    for (i = 0; i < buf_size/2; i++) {
> 
> >>1
> 
> /2 with signed vars is not as fast

Fixed.

// Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Add-a-G.722-encoder.patch
Type: text/x-diff
Size: 6218 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100915/7516d3b4/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-Add-initial-trellis-support-in-the-G.722-encoder.patch
Type: text/x-diff
Size: 8257 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100915/7516d3b4/attachment-0001.patch>



More information about the ffmpeg-devel mailing list