[FFmpeg-devel] [PATCH] AAC decoder

Sun May 25 20:42:04 CEST 2008

On Sun, May 25, 2008 at 07:27:31PM +0100, Robert Swain wrote:
> 2008/5/25 Ivan Kalvachev <ikalvachev at gmail.com>:
> > On 5/25/08, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> On Sun, May 25, 2008 at 02:55:07PM +0100, Robert Swain wrote:
> >>> 2008/5/24 Michael Niedermayer <michaelni at gmx.at>:
> >>> > On Sat, May 24, 2008 at 06:35:37PM +0100, Robert Swain wrote:
> >>> >> 2008/5/23 Michael Niedermayer <michaelni at gmx.at>:
> >>> >> > On Fri, May 23, 2008 at 01:59:41PM +0100, Robert Swain wrote:
> >>> >> >> Index: aac.c
> >>> >> >> ===================================================================
> >>> >> >> --- aac.c     (revision 2185)
> >>> >> >> +++ aac.c     (working copy)
> >>> >> >> @@ -366,7 +366,7 @@
> >>> >> >>      DECLARE_ALIGNED_16(float, sine_short_128[128]);
> >>> >> >>      DECLARE_ALIGNED_16(float, pow2sf_tab[256]);
> >>> >> >>      DECLARE_ALIGNED_16(float, intensity_tab[256]);
> >>> >> >> -    DECLARE_ALIGNED_16(float, ivquant_tab[256]);
> >>> >> >> +    DECLARE_ALIGNED_16(float, ivquant_tab[128]);
> >>> >> >>      MDCTContext mdct;
> >>> >> >>      MDCTContext mdct_small;
> >>> >> >>      MDCTContext *mdct_ltp;
> >>> >> >> @@ -890,8 +890,11 @@
> >>> >> >>      // BIAS method instead needs values -1<x<1
> >>> >> >>      for (i = 0; i < 256; i++)
> >>> >> >>          ac->intensity_tab[i] = pow(0.5, (i - 100) / 4.);
> >>> >> >> -    for (i = 0; i < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]); i++)
> >>> >> >> -        ac->ivquant_tab[i] = pow(i, 4./3);
> >>> >> >> +    for (i = 0; i < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1); i++) {
> >>> >> >> +        int idx = i<<1;
> >>> >> >> +        ac->ivquant_tab[idx]     =  pow(i, 4./3);
> >>> >> >> +        ac->ivquant_tab[idx + 1] = -ac->ivquant_tab[idx];
> >>> >> >> +    }
> >>> >> >>
> >>> >> >>      if(ac->dsp.float_to_int16 == ff_float_to_int16_c) {
> >>> >> >>          ac->add_bias = 385.0f;
> >>> >> >
> >>> >> >> @@ -1035,13 +1038,12 @@
> >>> >> >>  }
> >>> >> >>
> >>> >> >>  static inline float ivquant(AACContext * ac, int a) {
> >>> >> >
> >>> >> >> -    static const float sign[2] = { -1., 1. };
> >>> >> >>      int tmp = (a>>31);
> >>> >> >>      int abs_a = (a^tmp)-tmp;
> >>> >> >> -    if (abs_a < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]))
> >>> >> >> -        return sign[tmp+1] * ac->ivquant_tab[abs_a];
> >>> >> >> +    if (abs_a < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1))
> >>> >> >> +        return ac->ivquant_tab[(abs_a<<1) + !!tmp];
> >>> >> >
> >>> >> > ehh... this should be:
> >>> >> >
> >>> >> > if(a + 127U < 255U)
> >>> >> >    return ivquant_tab[a + 127U];
> >>> >> >
> >>> >> > (or other constants depending on what table size is best ...)
> >>> >> >
> >>> >> >
> >>> >> >>      else
> >>> >> >> -        return sign[tmp+1] * pow(abs_a, 4./3);
> >>> >> >> +        return (2 * tmp + 1) * pow(abs_a, 4./3);
> >>> >> >
> >>> >> > pow(fabs(a), 1./3) * a;
> >>> >>
> >>> >> With those suggestions it is much faster. The alternating-sign
> >>> >> construction for the table wasn't my idea, but I won't name names. :)
> >>> >> Anyway, see attached. Benchmarks on the same FAAC-encoded South Park
> >>> >> episode:
> >>> >>
> >>> >> old size 256
> >>> > [...]
> >>> >> 3956 dezicycles in ivquant, 2096816 runs, 336 skips
> >>> >>
> >>> >> new size 8
> >>> > [...]
> >>> >> 4840 dezicycles in ivquant, 2066668 runs, 30484 skips
> >>> >>
> >>> >> new size 16
> >>> > [...]
> >>> >> 3650 dezicycles in ivquant, 2093424 runs, 3728 skips
> >>> >>
> >>> >> new size 32
> >>> > [...]
> >>> >> 3438 dezicycles in ivquant, 2096888 runs, 264 skips
> >>> >>
> >>> >> new size 64
> >>> > [...]
> >>> >> 3447 dezicycles in ivquant, 2096915 runs, 237 skips
> >>> >>
> >>> >> new size 128
> >>> > [...]
> >>> >> 3431 dezicycles in ivquant, 2096918 runs, 234 skips
> >>> >>
> >>> >> new size 256
> >>> > [...]
> >>> >> 3431 dezicycles in ivquant, 2096953 runs, 199 skips
> >>> >>
> >>> >> new size 512
> >>> > [...]
> >>> >> 3438 dezicycles in ivquant, 2097093 runs, 59 skips
> >>> >>
> >>> >> It looks to me like there's little difference in performance when the
> >>> >> table is of size 32 or larger. Should I use size 32?
> >>> >
> >>> > From the numbers I see, yes, 32 seems the best choice.
> >>> >
> >>> > What bitrate did your test file have? High-bitrate files might be faster
> >>> > with larger tables, so if it was low bitrate, it might be worth retrying
> >>> > with a higher-bitrate file.
> >>>
> >>> Same audio source, but encoded at 320 kbps with QuickTime.
> >>
> >>> I included
> >>> the full listings as some table sizes seem to behave strangely based
> >>> on the number of calls.
> >>
> >> These are effects of skipping (some) pow() calls, I assume ...
> >>
> >>
> >>>
> >>> size 32
> >> [...]
> >>> 16429 dezicycles in ivquant, 4169262 runs, 25042 skips
> >>>
> >>> size 64
> >> [...]
> >>> 11718 dezicycles in ivquant, 4147408 runs, 46896 skips
> >>>
> >>> size 128
> >> [...]
> >>> 7687 dezicycles in ivquant, 4148129 runs, 46175 skips
> >>>
> >>> size 256
> >> [...]
> >>> 5174 dezicycles in ivquant, 4166995 runs, 27309 skips
> >>>
> >>> size 512
> >> [...]
> >>> 3826 dezicycles in ivquant, 4183674 runs, 10630 skips
> >>>
> >>> size 1024
> >> [...]
> >>> 3250 dezicycles in ivquant, 4191225 runs, 3079 skips
> >>>
> >>> size 2048
> >> [...]
> >>> 3109 dezicycles in ivquant, 4193283 runs, 1021 skips
> >>
> >> From these numbers, a table size of 1024 seems to be the lowest acceptable.
> >> I guess the 4 KB of space won't matter compared to the speed loss a small
> >> table would cause with such files.
> >>
> >>
> >>>
> >>> > [...]
> >>> >> +    for (i = 1; i < IVQUANT_SIZE/2; i++) {
> >>> >> +        ac->ivquant_tab[IVQUANT_SIZE/2 - 1 + i] =  pow(i, 4./3);
> >>> >> +        ac->ivquant_tab[IVQUANT_SIZE/2 - 1 - i] = -ac->ivquant_tab[IVQUANT_SIZE/2 - 1 + i];
> >>> >> +    }
> >>> >
> >>> > Can't that be simplified with pow(fabs(i), 1./3) * i as well?
> >
> > Isn't i^(1/3) actually the cube root? There is a C99 math.h function,
> > cbrt(), that calculates it; it may be a little faster.
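Written out as an identity (a sketch, with a the integer quantized value), this is why the pow() form and cbrt() compute the same thing. Since sign(a)|a| = a:

$$\operatorname{sign}(a)\,|a|^{4/3} = \operatorname{sign}(a)\,|a|\cdot|a|^{1/3} = a\,|a|^{1/3} = a\,\sqrt[3]{|a|}$$

so `pow(fabs(a), 1./3) * a` and `cbrt(fabs(a)) * a` agree mathematically. One small difference: the literal `1./3` is the double nearest to 1/3 rather than 1/3 itself, so pow() evaluates a very slightly different exponent, whereas cbrt() returns the real cube root.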
> 
> A good point.
> 
> > BTW, if the results are floats, why are functions that operate on doubles
> > used (vs. fabsf, powf, etc.)?
> 
> Another good point.
> 
> 
> current:
> 3237 dezicycles in ivquant, 4191580 runs, 2724 skips
> 
> cbrt:
> 3169 dezicycles in ivquant, 4193238 runs, 1066 skips
> 
> float funcs without cbrt:
> 3119 dezicycles in ivquant, 4193791 runs, 513 skips
> 
> float funcs with cbrtf:
> 3070 dezicycles in ivquant, 4194193 runs, 111 skips
> 
> 
> I'm not sure if it's the best method of testing, but I decoded the file
> to pcm_s16le using both faad and the code using the float funcs and cbrtf,
> and compared the two outputs using tiny_psnr:
> 
> stddev:  0.01 PSNR:136.17 bytes:232734720
> 
> Shall I commit with the alterations suggested (table size 1024,
> explicit casts to unsigned int where necessary) plus use of cbrtf and
> fabsf for these functions?

yes

> 
> Shall I also go through the other math calls that are using the double
> precision functions and change them to the float functions?

Where it makes a difference for speed, yes. The init code can keep using
doubles ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras