[Ffmpeg-devel] Fixed point arithmetic RealAudio G2 (cook) decoder

Thu Mar 8 22:28:18 CET 2007

Hi

On Thu, Mar 08, 2007 at 02:16:28PM +0100, Ian Braithwaite wrote:
> Hi,
> 
> 
> OK - here's my fixed point patch for the cook audio decoder.
> 
> 
> >> I know that Rich is always interested in integer implementation for
> >> his slow K6-III+. You're gonna make at least one user happy ;-)
> >
> > not only rich, 25x speedup for ARM is interresting, though we of course
> > cannot unconditionally replace the float decoder so the patch must be
> > clean to be accepted in svn, clean hear means that common code
> > is not duplicated and that both float and integer can be selected by the
> > users (at compile time is ok)
> 
> Just to be clear - the fixed point decoder is _slower_ than the floating
> point version, by about a factor of two, on my Athlon.
> So, no, replacing the float decoder isn't an option.
> 
> The 25x speedup on ARM is perhaps not so suprising, since to run the float
> version, it has to use a software FPU emulator.
> 
> 
> To avoid duplication of common code, here's what I did.
> 
> The current decoder consists of two files, "cook.c" and "cookdata.h".
> I factored out the floating point math from "cook.c", and put it into
> inline functions in a new file, "cook_float.h".
> Then I implemented fixed point versions, and put them in "cook_fixpoint.h".
> Then I repeated the same trick for "cookdata.h".
> Finally, I conditionally include the appropriate two files in "cook.c".
> There's a define in "cook.c" to choose between float and fixed -
> presumably this could be integrated into the configuration system.
> 
> 
> Some of the patch isn't directly related to the fixed point implementation,
> so I've split it up into 4 (in my opinion!) improvements, and then the
> patch itself. The improvements came about by me figuring out how
> the whole thing works.
> 
> Here's a run down of the 5 patches.

first id like to say that Benjamin Larsson is cook maintainer and his oppinion
matters not mine, none the less my comments are below

> 
> 
> Patch 1.
> Don't output the first two frames, since they don't contain valid audio.
> This also eases comparison of decoded output with Real's binary decoder.
> 
>  cook.c |    3 +++
>  1 file changed, 3 insertions(+)

looks ok

[...]

> Patch 2.
> Simplify gain block handling.
> No effect on output.
> 
>  cook.c |  152
> ++++++++++++++++++++++-------------------------------------------
>  1 file changed, 52 insertions(+), 100 deletions(-)

looks ok

[...]

> 
> 
> Patch 3.
> Replace custom modified discrete cosine transform with ffmpeg's own.
> This does alter the decoded output, although not to my ears.
> The new output is closer to that produced by Real's "official" decoder,
> and the decoder runs slightly faster.
> 
>  cook.c |   98
> +++++++++++++++++++----------------------------------------------
>  1 file changed, 30 insertions(+), 68 deletions(-)

[...]
> +static void cook_imlt(COOKContext *q, float* inbuffer, float* outbuffer)
> +{
> +    float gain = sqrt(2.0 / q->samples_per_channel);
>      int i;
>  
[...]
> +    ff_imdct_calc(&q->mdct_ctx, outbuffer, inbuffer, q->mdct_tmp);
> +
> +    for(i = 0; i < q->samples_per_channel; i++){
> +        float tmp = outbuffer[i];
> +        
> +        outbuffer[i] =
> +          q->mlt_window[i] * gain * outbuffer[q->samples_per_channel + i];
> +        outbuffer[q->samples_per_channel + i] =
> +          q->mlt_window[q->samples_per_channel - 1 - i] * gain * -tmp;
>      }

the multiplication with gain can be merged into the window (or possibly also
into some other thing ...)

[...]

> 
> 
> Patch 4.
> Simplify subband coefficient handling.
> No effect on output.
> 
>  cook.c |   95
> +++++++++++++++++++++++++++++------------------------------------
>  1 file changed, 43 insertions(+), 52 deletions(-)
[...]

>  static int unpack_SQVH(COOKContext *q, int category, int* subband_coef_index,
> -                       int* subband_coef_noise) {
> +                       int* subband_coef_sign) {
>      int i,j;
>      int vlc, vd ,tmp, result;
>      int ub;
> @@ -600,13 +593,13 @@ static int unpack_SQVH(COOKContext *q, i
>          for(j=0 ; j<vd ; j++){
>              if (subband_coef_index[i*vd + j]) {
>                  if(get_bits_count(&q->gb) < q->bits_per_subpacket){
> -                    subband_coef_noise[i*vd+j] = get_bits1(&q->gb);
> +                    subband_coef_sign[i*vd+j] = get_bits1(&q->gb);
>                  } else {
>                      result=1;
> -                    subband_coef_noise[i*vd+j]=0;
> +                    subband_coef_sign[i*vd+j]=0;
>                  }
>              } else {
> -                subband_coef_noise[i*vd+j]=0;
> +                subband_coef_sign[i*vd+j]=0;
>              }

cosmetic change

[...]

> 
> 
> Patch 5.
> Fixed point decoder.
> A define in "cook.c" chooses between fixed and floating point
> implementations.
> With floating point, no efect on output.
> With fixed point, output difference is about 0.25rms, max +-1, at 16bit.
> 
>  cook.c              |  279 +++++---------------------
>  cook_fixp_mdct.h    |  545
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  cook_fixpoint.h     |  243 +++++++++++++++++++++++
>  cook_float.h        |  226 +++++++++++++++++++++
>  cookdata.h          |   66 ------
>  cookdata_fixpoint.h |  433 +++++++++++++++++++++++++++++++++++++++++
>  cookdata_float.h    |  113 ++++++++++
>  7 files changed, 1614 insertions(+), 291 deletions(-)

[...]
> +static const FIXPU sincos_lookup[2050] = {
> +    /* x_i = 2^16 sin(i 2pi/8192), 2^16 cos(i 2pi/8192); i=0..1024 */

i think such large tables should be generated instead of hardcoded

[...]
> diff -upN -x 'ff*' coef/cookdata_float.h fixpoint/cookdata_float.h
> --- coef/cookdata_float.h	1970-01-01 01:00:00.000000000 +0100
> +++ fixpoint/cookdata_float.h	2007-03-08 13:28:17.000000000 +0100

this must be diffed against cookdata.h or whereever the tables came from

[not reviewed code]

[...]
> diff -upN -x 'ff*' coef/cook_fixp_mdct.h fixpoint/cook_fixp_mdct.h
> --- coef/cook_fixp_mdct.h	1970-01-01 01:00:00.000000000 +0100
> +++ fixpoint/cook_fixp_mdct.h	2007-03-08 11:10:13.000000000 +0100

adding a generic fixpoint mdct to lavc MUST be a seperate patch&mail&commit
(this part also doesnt fall under benjamins maintainership but mine)

[...]

> +/**
> + * Fixed point multiply by power of two.
> + *
> + * @param x                     fix point value
> + * @param i                     integer power-of-two, -31..+31
> + */
> +static inline FIXP fixp_pow2(FIXP x, int i)
> +{
> +  if (i < 0)
> +    return (x >> -i) + ((x >> (-i-1)) & 1);
> +  else
> +    return x << i;              /* no check for overflow */
> +}

for left shift id simply use << not fixp_pow2()

for right shift with rounding id use (x + (1<<(i-1)))>>i
the advantage here is that if i is known this simplifies to (x + C)>>C2
or is there any case where the direction of the shift is not known?

> +
> +/**
> + * Fixed point multiply by fraction.
> + *
> + * @param a                     fix point value
> + * @param b                     fix point fraction, 0 <= b < 1
> + */
> +static inline FIXP fixp_mult_su(FIXP a, FIXPU b)
> +{
> +    int32_t hb = (a >> 16) * b;
> +    uint32_t lb = (a & 0xffff) * b;
> +
> +    return hb + (lb >> 16) + ((lb & 0x8000) >> 15);

return ((int64_t)a * b + 0x8000) >> 16;

(its the compiers job to optimize this and if it fails such code
should be in ASM under appropriate #ifdef, the above may be optimal
for ARM but certainly not for every CPU)

also its probably worth trying to do this without correct rounding ...

> +}
> +
> +
> +/**
> + * The real requantization of the mltcoefs
> + *
> + * @param q                     pointer to the COOKContext
> + * @param index                 index
> + * @param quant_index           quantisation index for this band
> + * @param subband_coef_index    array of indexes to quant_centroid_tab
> + * @param subband_coef_sign     use random noise instead of predetermined value
> + * @param mlt_ptr               pointer to the mlt coefficients
> + */
> +static void scalar_dequant_math(COOKContext *q, int index,
> +                                int quant_index, int* subband_coef_index,
> +                                int* subband_coef_sign, REAL_T *mlt_p)
> +{
> +    /* Num. half bits to right shift */
> +    const int s = 33 - quant_index + av_log2(q->samples_per_channel);
> +    const FIXP *table = quant_tables[s & 1][index];
> +    FIXP f;
> +    int i;
> +
> +    for(i=0 ; i<SUBBAND_SIZE ; i++) {
> +        f = table[subband_coef_index[i]];
> +        /* noise coding if subband_coef_index[i] == 0 */
> +        if (((subband_coef_index[i] == 0) && cook_random(q)) ||
> +            ((subband_coef_index[i] != 0) && subband_coef_sign[i]))
> +            f = -f;
> +        mlt_p[i] = (s >= 64) ? 0 : fixp_pow2(f, -(s/2));
> +    }

if(s>=64){
    memset(mlt_p, 0, ...)
    return;
}
assuming this case is possible at all

[...]
> +/**
> + * the actual requantization of the timedomain samples
> + *
> + * @param q                 pointer to the COOKContext
> + * @param buffer            pointer to the timedomain buffer
> + * @param gain_index        index for the block multiplier
> + * @param gain_index_next   index for the next block multiplier
> + */
> +static inline void
> +interpolate_math(COOKContext *q, FIXP* buffer,
> +                 int gain_index, int gain_index_next)
> +{
> +    int i;
> +    int gain_size_factor = q->samples_per_channel / 8;
> +
> +    if(gain_index == gain_index_next){              //static gain
> +        for(i = 0; i < gain_size_factor; i++) {
> +            buffer[i] = fixp_pow2(buffer[i], gain_index);
> +        }
> +    } else {                                        //smooth gain
> +        int step = (gain_index_next - gain_index)
> +                   << (7 - av_log2(gain_size_factor));
> +        int x = 0;
> +
> +        for(i = 0; i < gain_size_factor; i++) {
> +            buffer[i] = fixp_mult_su(buffer[i], pow128_tab[x]);
> +            buffer[i] = fixp_pow2(buffer[i], gain_index+1);
> +
> +            x += step;
> +            gain_index += (x + 128) / 128 - 1;
> +            x = (x + 128) % 128;
> +        }

gain_index += x>>7;
x&= 127

also fixp_mult_su() already does a shift so the fixp_pow2 can maybe
be merged into that

[...]
> +/**
> + * Final converion from floating point values to
> + * signed, 16 bit sound samples. Round and clip.
> + *
> + * @param q                 pointer to the COOKContext
> + * @param out               pointer to the output buffer
> + * @param chan              0: left or single channel, 1: right channel
> + */
> +static inline void output_math(COOKContext *q, int16_t *out, int chan)
> +{
> +    int j;
> +
> +    for (j = 0; j < q->samples_per_channel; j++) {
> +        out[chan + q->nb_channels * j] =
> +          av_clip(fixp_pow2(q->mono_mdct_output[j], -11), -32768, 32767);

for(j = 0; j < q->samples_per_channel; j++) {
    int v= q->mono_mdct_output[j] >> 11;
    if(v+32768U > 65535U) v= (v>>31) ^ 0x7FFF;
    *out= v;
    out += q->nb_channels;
}

> +    }
> +}
> diff -upN -x 'ff*' coef/cook_float.h fixpoint/cook_float.h
> --- coef/cook_float.h	1970-01-01 01:00:00.000000000 +0100
> +++ fixpoint/cook_float.h	2007-03-08 10:36:24.000000000 +0100

this should have been diffed against cook.c, it cannot be reviewed otherwise
as the code is from cook.c

[not reviewed code]

> +
> +
> +/**
> + * Final converion from floating point values to
> + * signed, 16 bit sound samples. Round and clip.
> + *
> + * @param q                 pointer to the COOKContext
> + * @param out               pointer to the output buffer
> + * @param chan              0: left or single channel, 1: right channel
> + */
> +static inline void output_math(COOKContext *q, int16_t *out, int chan)
> +{
> +    int j;
> +
> +    for (j = 0; j < q->samples_per_channel; j++) {
> +        out[chan + q->nb_channels * j] =
> +          av_clip(lrintf(q->mono_mdct_output[j]), -32768, 32767);
> +    }
> +}

DSPContext.float_to_int16()

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070308/da4e38b9/attachment.pgp>