[FFmpeg-devel] [PATCH] Add a G.722 encoder
Michael Niedermayer
michaelni
Wed Sep 15 23:53:28 CEST 2010
On Wed, Sep 15, 2010 at 05:42:39PM +0300, Martin Storsj? wrote:
> On Wed, 15 Sep 2010, Michael Niedermayer wrote:
>
> > On Wed, Sep 15, 2010 at 12:41:34PM +0300, Martin Storsj? wrote:
> > > On Tue, 14 Sep 2010, Michael Niedermayer wrote:
> > >
> > > > > + int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > > > > +
> > > > > + update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > > > > + high_inv_quant[index] >> 10, index);
> > > > > + return index;
> > > > > +}
> > > > > +
> > > > > +static inline int encode_low(const struct G722Band* state, int xlow)
> > > > > +{
> > > > > + int diff = av_clip_int16(xlow - state->s_predictor);
> > > > > + int limit = diff >= 0 ? diff : -(diff + 1);
> > > >
> > > > > + int i = 0;
> > > > > + while (i < 29 && limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > > > + i++;
> > > >
> > > > that doesnt look efficient
> > > > limit >= (low_quant[i] * state->scale_factor) >> 10)
> > > > can be changed to
> > > > C > low_quant[i]
> > >
> > > Hmm, do you mean something like this?
> > >
> > > limit = (limit << 10) / state->scale_factor;
> > > while (i < 29 && limit >= low_quant[i])
> > > i++;
> >
> > something like this yes, but with correct rounding
> > you could also try to just get rid of the >>
> > (limit+1)<<10 > low_quant[i] * state->scale_factor
> >
> > to get the rounding correct use your brain and try to exactly move each
> > operation from one side of the >= to the other
> >
> >
> > >
> > > This makes the results differ slightly from the reference test vectors
> > > (which perhaps in itself is acceptable, but a bitexact mode to test
> > > against the reference may still be useful). This actually made it slower,
> > >
> > > 1015 dezicycles in encode_low, 1048518 runs, 58 skips
> > >
> > > vs
> > >
> > > 963 dezicycles in encode_low, 1048494 runs, 82 skips
> >
> > as the result differs we dont know if the loop runs the same number of
> > iterations but its possible of course the the division is too slow for
> > this to work out
> >
> >
> > >
> > > initially.
> > >
> > > > also a LUT could be tried if this matters speed wise
> > >
> > > Hmm, what would this LUT contain? The output depends both on the current
> > > 16-bit diff and the current scale factor, so one single 64k LUT isn't
> > > enough.
> >
> > as said this is equivalent to C > low_quant[i] you just have to calculate C
> > exactly and not assume these are real values from math lectures, they are
> > integers and >> and / can round down
> >
> > also the i<29 can likely be removed by making sure the table contains an
> > appropriate value at the end
>
> I managed to do this, so I've got the following two versions:
>
> limit = limit + 1 << 10;
> while (limit > low_quant[i] * state->scale_factor)
>
> 885 dezicycles in encode_low, 1047991 runs, 585 skips
>
> and
>
> limit = limit + 1 << 10;
> limit = (limit + state->scale_factor - 1) / state->scale_factor;
> while (limit > low_quant[i])
>
> 988 dezicycles in encode_low, 1047886 runs, 690 skips
>
> So the former version is a bit faster, even if it does more
> multiplications.
>
> Removing the i < 29 check with a sentinel works fine for normal samples,
> but the reference test vectors fail on this if using the former version,
> since the numeric range of 32 bit numbers isn't enough.
>
> I.e., low_quant[i] * scale_factor mustn't overflow, so the the sentinel
> must be INT_MAX/max_scale_factor, which is (2^31-1)/4096, and limit << 10
> can be larger than this.
multiple sentinels should then work
>
> So this version is the fastest that passes the regression tests for me:
>
> limit = limit + 1 << 10;
> while (i < 29 && limit > low_quant[i] * state->scale_factor)
> 913 dezicycles in encode_low, 1048522 runs, 54 skips
>
> > > +static inline int encode_high(G722Context *c, int xhigh)
> > > +{
> > > + int diff = av_clip_int16(xhigh - c->band[1].s_predictor);
> > > + int pred = 141 * c->band[1].scale_factor >> 8;
> > > + int index = diff >= 0 ? (diff < pred) + 2 : diff >= -pred;
> > > +
> > > + update_high_predictor(&c->band[1], c->band[1].scale_factor *
> > > + high_inv_quant[index] >> 10, index);
> > > + return index;
> > > +}
> > > +
> > > +static inline int encode_low(const struct G722Band* state, int xlow)
> > > +{
> > > + int diff = av_clip_int16(xlow - state->s_predictor);
> >
> > > + int limit = diff >= 0 ? diff : -(diff + 1);
> >
> > thats
> > diff ^ (diff>>31)
> > i think
>
> Yes, thanks, fixed.
>
> > and maybe that can also be used in encode_high() to get rid of the branch
>
> I don't see how the branches in encode_high could be avoided thanks to
> this - the only branches there map index into 0, 1, 2 or 3 depending on if
> diff >= 0 and diff < pred.
something like:
((diff ^ (diff>>31)) < pred) + 2*(diff>=0)
might work, no idea about speed
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100915/162d99c3/attachment.pgp>
More information about the ffmpeg-devel
mailing list