[Ffmpeg-devel] Fixed point arithmetic RealAudio G2 (cook) decoder
Michael Niedermayer
michaelni
Thu Mar 8 22:28:18 CET 2007
Hi
On Thu, Mar 08, 2007 at 02:16:28PM +0100, Ian Braithwaite wrote:
> Hi,
>
>
> OK - here's my fixed point patch for the cook audio decoder.
>
>
> >> I know that Rich is always interested in integer implementation for
> >> his slow K6-III+. You're gonna make at least one user happy ;-)
> >
> > not only rich, 25x speedup for ARM is interresting, though we of course
> > cannot unconditionally replace the float decoder so the patch must be
> > clean to be accepted in svn, clean hear means that common code
> > is not duplicated and that both float and integer can be selected by the
> > users (at compile time is ok)
>
> Just to be clear - the fixed point decoder is _slower_ than the floating
> point version, by about a factor of two, on my Athlon.
> So, no, replacing the float decoder isn't an option.
>
> The 25x speedup on ARM is perhaps not so suprising, since to run the float
> version, it has to use a software FPU emulator.
>
>
> To avoid duplication of common code, here's what I did.
>
> The current decoder consists of two files, "cook.c" and "cookdata.h".
> I factored out the floating point math from "cook.c", and put it into
> inline functions in a new file, "cook_float.h".
> Then I implemented fixed point versions, and put them in "cook_fixpoint.h".
> Then I repeated the same trick for "cookdata.h".
> Finally, I conditionally include the appropriate two files in "cook.c".
> There's a define in "cook.c" to choose between float and fixed -
> presumably this could be integrated into the configuration system.
>
>
> Some of the patch isn't directly related to the fixed point implementation,
> so I've split it up into 4 (in my opinion!) improvements, and then the
> patch itself. The improvements came about by me figuring out how
> the whole thing works.
>
> Here's a run down of the 5 patches.
first id like to say that Benjamin Larsson is cook maintainer and his oppinion
matters not mine, none the less my comments are below
>
>
> Patch 1.
> Don't output the first two frames, since they don't contain valid audio.
> This also eases comparison of decoded output with Real's binary decoder.
>
> cook.c | 3 +++
> 1 file changed, 3 insertions(+)
looks ok
[...]
> Patch 2.
> Simplify gain block handling.
> No effect on output.
>
> cook.c | 152
> ++++++++++++++++++++++-------------------------------------------
> 1 file changed, 52 insertions(+), 100 deletions(-)
looks ok
[...]
>
>
> Patch 3.
> Replace custom modified discrete cosine transform with ffmpeg's own.
> This does alter the decoded output, although not to my ears.
> The new output is closer to that produced by Real's "official" decoder,
> and the decoder runs slightly faster.
>
> cook.c | 98
> +++++++++++++++++++----------------------------------------------
> 1 file changed, 30 insertions(+), 68 deletions(-)
[...]
> +static void cook_imlt(COOKContext *q, float* inbuffer, float* outbuffer)
> +{
> + float gain = sqrt(2.0 / q->samples_per_channel);
> int i;
>
[...]
> + ff_imdct_calc(&q->mdct_ctx, outbuffer, inbuffer, q->mdct_tmp);
> +
> + for(i = 0; i < q->samples_per_channel; i++){
> + float tmp = outbuffer[i];
> +
> + outbuffer[i] =
> + q->mlt_window[i] * gain * outbuffer[q->samples_per_channel + i];
> + outbuffer[q->samples_per_channel + i] =
> + q->mlt_window[q->samples_per_channel - 1 - i] * gain * -tmp;
> }
the multiplication with gain can be merged into the window (or possibly also
into some other thing ...)
[...]
>
>
> Patch 4.
> Simplify subband coefficient handling.
> No effect on output.
>
> cook.c | 95
> +++++++++++++++++++++++++++++------------------------------------
> 1 file changed, 43 insertions(+), 52 deletions(-)
[...]
> static int unpack_SQVH(COOKContext *q, int category, int* subband_coef_index,
> - int* subband_coef_noise) {
> + int* subband_coef_sign) {
> int i,j;
> int vlc, vd ,tmp, result;
> int ub;
> @@ -600,13 +593,13 @@ static int unpack_SQVH(COOKContext *q, i
> for(j=0 ; j<vd ; j++){
> if (subband_coef_index[i*vd + j]) {
> if(get_bits_count(&q->gb) < q->bits_per_subpacket){
> - subband_coef_noise[i*vd+j] = get_bits1(&q->gb);
> + subband_coef_sign[i*vd+j] = get_bits1(&q->gb);
> } else {
> result=1;
> - subband_coef_noise[i*vd+j]=0;
> + subband_coef_sign[i*vd+j]=0;
> }
> } else {
> - subband_coef_noise[i*vd+j]=0;
> + subband_coef_sign[i*vd+j]=0;
> }
cosmetic change
[...]
>
>
> Patch 5.
> Fixed point decoder.
> A define in "cook.c" chooses between fixed and floating point
> implementations.
> With floating point, no efect on output.
> With fixed point, output difference is about 0.25rms, max +-1, at 16bit.
>
> cook.c | 279 +++++---------------------
> cook_fixp_mdct.h | 545
> ++++++++++++++++++++++++++++++++++++++++++++++++++++
> cook_fixpoint.h | 243 +++++++++++++++++++++++
> cook_float.h | 226 +++++++++++++++++++++
> cookdata.h | 66 ------
> cookdata_fixpoint.h | 433 +++++++++++++++++++++++++++++++++++++++++
> cookdata_float.h | 113 ++++++++++
> 7 files changed, 1614 insertions(+), 291 deletions(-)
[...]
> +static const FIXPU sincos_lookup[2050] = {
> + /* x_i = 2^16 sin(i 2pi/8192), 2^16 cos(i 2pi/8192); i=0..1024 */
i think such large tables should be generated instead of hardcoded
[...]
> diff -upN -x 'ff*' coef/cookdata_float.h fixpoint/cookdata_float.h
> --- coef/cookdata_float.h 1970-01-01 01:00:00.000000000 +0100
> +++ fixpoint/cookdata_float.h 2007-03-08 13:28:17.000000000 +0100
this must be diffed against cookdata.h or whereever the tables came from
[not reviewed code]
[...]
> diff -upN -x 'ff*' coef/cook_fixp_mdct.h fixpoint/cook_fixp_mdct.h
> --- coef/cook_fixp_mdct.h 1970-01-01 01:00:00.000000000 +0100
> +++ fixpoint/cook_fixp_mdct.h 2007-03-08 11:10:13.000000000 +0100
adding a generic fixpoint mdct to lavc MUST be a seperate patch&mail&commit
(this part also doesnt fall under benjamins maintainership but mine)
[...]
> +/**
> + * Fixed point multiply by power of two.
> + *
> + * @param x fix point value
> + * @param i integer power-of-two, -31..+31
> + */
> +static inline FIXP fixp_pow2(FIXP x, int i)
> +{
> + if (i < 0)
> + return (x >> -i) + ((x >> (-i-1)) & 1);
> + else
> + return x << i; /* no check for overflow */
> +}
for left shift id simply use << not fixp_pow2()
for right shift with rounding id use (x + (1<<(i-1)))>>i
the advantage here is that if i is known this simplifies to (x + C)>>C2
or is there any case where the direction of the shift is not known?
> +
> +/**
> + * Fixed point multiply by fraction.
> + *
> + * @param a fix point value
> + * @param b fix point fraction, 0 <= b < 1
> + */
> +static inline FIXP fixp_mult_su(FIXP a, FIXPU b)
> +{
> + int32_t hb = (a >> 16) * b;
> + uint32_t lb = (a & 0xffff) * b;
> +
> + return hb + (lb >> 16) + ((lb & 0x8000) >> 15);
return ((int64_t)a * b + 0x8000) >> 16;
(its the compiers job to optimize this and if it fails such code
should be in ASM under appropriate #ifdef, the above may be optimal
for ARM but certainly not for every CPU)
also its probably worth trying to do this without correct rounding ...
> +}
> +
> +
> +/**
> + * The real requantization of the mltcoefs
> + *
> + * @param q pointer to the COOKContext
> + * @param index index
> + * @param quant_index quantisation index for this band
> + * @param subband_coef_index array of indexes to quant_centroid_tab
> + * @param subband_coef_sign use random noise instead of predetermined value
> + * @param mlt_ptr pointer to the mlt coefficients
> + */
> +static void scalar_dequant_math(COOKContext *q, int index,
> + int quant_index, int* subband_coef_index,
> + int* subband_coef_sign, REAL_T *mlt_p)
> +{
> + /* Num. half bits to right shift */
> + const int s = 33 - quant_index + av_log2(q->samples_per_channel);
> + const FIXP *table = quant_tables[s & 1][index];
> + FIXP f;
> + int i;
> +
> + for(i=0 ; i<SUBBAND_SIZE ; i++) {
> + f = table[subband_coef_index[i]];
> + /* noise coding if subband_coef_index[i] == 0 */
> + if (((subband_coef_index[i] == 0) && cook_random(q)) ||
> + ((subband_coef_index[i] != 0) && subband_coef_sign[i]))
> + f = -f;
> + mlt_p[i] = (s >= 64) ? 0 : fixp_pow2(f, -(s/2));
> + }
if(s>=64){
memset(mlt_p, 0, ...)
return;
}
assuming this case is possible at all
[...]
> +/**
> + * the actual requantization of the timedomain samples
> + *
> + * @param q pointer to the COOKContext
> + * @param buffer pointer to the timedomain buffer
> + * @param gain_index index for the block multiplier
> + * @param gain_index_next index for the next block multiplier
> + */
> +static inline void
> +interpolate_math(COOKContext *q, FIXP* buffer,
> + int gain_index, int gain_index_next)
> +{
> + int i;
> + int gain_size_factor = q->samples_per_channel / 8;
> +
> + if(gain_index == gain_index_next){ //static gain
> + for(i = 0; i < gain_size_factor; i++) {
> + buffer[i] = fixp_pow2(buffer[i], gain_index);
> + }
> + } else { //smooth gain
> + int step = (gain_index_next - gain_index)
> + << (7 - av_log2(gain_size_factor));
> + int x = 0;
> +
> + for(i = 0; i < gain_size_factor; i++) {
> + buffer[i] = fixp_mult_su(buffer[i], pow128_tab[x]);
> + buffer[i] = fixp_pow2(buffer[i], gain_index+1);
> +
> + x += step;
> + gain_index += (x + 128) / 128 - 1;
> + x = (x + 128) % 128;
> + }
gain_index += x>>7;
x&= 127
also fixp_mult_su() already does a shift so the fixp_pow2 can maybe
be merged into that
[...]
> +/**
> + * Final converion from floating point values to
> + * signed, 16 bit sound samples. Round and clip.
> + *
> + * @param q pointer to the COOKContext
> + * @param out pointer to the output buffer
> + * @param chan 0: left or single channel, 1: right channel
> + */
> +static inline void output_math(COOKContext *q, int16_t *out, int chan)
> +{
> + int j;
> +
> + for (j = 0; j < q->samples_per_channel; j++) {
> + out[chan + q->nb_channels * j] =
> + av_clip(fixp_pow2(q->mono_mdct_output[j], -11), -32768, 32767);
for(j = 0; j < q->samples_per_channel; j++) {
int v= q->mono_mdct_output[j] >> 11;
if(v+32768U > 65535U) v= (v>>31) ^ 0x7FFF;
*out= v;
out += q->nb_channels;
}
> + }
> +}
> diff -upN -x 'ff*' coef/cook_float.h fixpoint/cook_float.h
> --- coef/cook_float.h 1970-01-01 01:00:00.000000000 +0100
> +++ fixpoint/cook_float.h 2007-03-08 10:36:24.000000000 +0100
this should have been diffed against cook.c, it cannot be reviewed otherwise
as the code is from cook.c
[not reviewed code]
> +
> +
> +/**
> + * Final converion from floating point values to
> + * signed, 16 bit sound samples. Round and clip.
> + *
> + * @param q pointer to the COOKContext
> + * @param out pointer to the output buffer
> + * @param chan 0: left or single channel, 1: right channel
> + */
> +static inline void output_math(COOKContext *q, int16_t *out, int chan)
> +{
> + int j;
> +
> + for (j = 0; j < q->samples_per_channel; j++) {
> + out[chan + q->nb_channels * j] =
> + av_clip(lrintf(q->mono_mdct_output[j]), -32768, 32767);
> + }
> +}
DSPContext.float_to_int16()
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070308/da4e38b9/attachment.pgp>
More information about the ffmpeg-devel
mailing list