[FFmpeg-devel] [PATCH] AAC decoder

Robert Swain robert.swain
Sat May 24 19:35:37 CEST 2008


2008/5/23 Michael Niedermayer <michaelni at gmx.at>:
> On Fri, May 23, 2008 at 01:59:41PM +0100, Robert Swain wrote:
>> Index: aac.c
>> ===================================================================
>> --- aac.c     (revision 2185)
>> +++ aac.c     (working copy)
>> @@ -366,7 +366,7 @@
>>      DECLARE_ALIGNED_16(float, sine_short_128[128]);
>>      DECLARE_ALIGNED_16(float, pow2sf_tab[256]);
>>      DECLARE_ALIGNED_16(float, intensity_tab[256]);
>> -    DECLARE_ALIGNED_16(float, ivquant_tab[256]);
>> +    DECLARE_ALIGNED_16(float, ivquant_tab[128]);
>>      MDCTContext mdct;
>>      MDCTContext mdct_small;
>>      MDCTContext *mdct_ltp;
>> @@ -890,8 +890,11 @@
>>      // BIAS method instead needs values -1<x<1
>>      for (i = 0; i < 256; i++)
>>          ac->intensity_tab[i] = pow(0.5, (i - 100) / 4.);
>> -    for (i = 0; i < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]); i++)
>> -        ac->ivquant_tab[i] = pow(i, 4./3);
>> +    for (i = 0; i < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1); i++) {
>> +        int idx = i<<1;
>> +        ac->ivquant_tab[idx]     =  pow(i, 4./3);
>> +        ac->ivquant_tab[idx + 1] = -ac->ivquant_tab[idx];
>> +    }
>>
>>      if(ac->dsp.float_to_int16 == ff_float_to_int16_c) {
>>          ac->add_bias = 385.0f;
>
>> @@ -1035,13 +1038,12 @@
>>  }
>>
>>  static inline float ivquant(AACContext * ac, int a) {
>
>> -    static const float sign[2] = { -1., 1. };
>>      int tmp = (a>>31);
>>      int abs_a = (a^tmp)-tmp;
>> -    if (abs_a < sizeof(ac->ivquant_tab)/sizeof(ac->ivquant_tab[0]))
>> -        return sign[tmp+1] * ac->ivquant_tab[abs_a];
>> +    if (abs_a < sizeof(ac->ivquant_tab)/(sizeof(ac->ivquant_tab[0])<<1))
>> +        return ac->ivquant_tab[(abs_a<<1) + !!tmp];
>
> ehh... this should be:
>
> if(a + 127U < 255U)
>    return ivquant_tab[a + 127U];
>
> (or other constants depending on what table size is best ...)
>
>
>>      else
>> -        return sign[tmp+1] * pow(abs_a, 4./3);
>> +        return (2 * tmp + 1) * pow(abs_a, 4./3);
>
> pow(fabs(a), 1./3) * a;

With those suggestions it is much faster. The alternating sign
construction for the table wasn't my idea, but I won't name names. :)
Anyway, see attached. Benchmarks on the same FAAC-encoded South Park
episode:

old size 256

8690 dezicycles in ivquant, 1 runs, 0 skips
21835 dezicycles in ivquant, 2 runs, 0 skips
12072 dezicycles in ivquant, 4 runs, 0 skips
7095 dezicycles in ivquant, 8 runs, 0 skips
4826 dezicycles in ivquant, 16 runs, 0 skips
4554 dezicycles in ivquant, 32 runs, 0 skips
3968 dezicycles in ivquant, 64 runs, 0 skips
3599 dezicycles in ivquant, 127 runs, 1 skips
3483 dezicycles in ivquant, 255 runs, 1 skips
3447 dezicycles in ivquant, 511 runs, 1 skips
3391 dezicycles in ivquant, 1023 runs, 1 skips
3183 dezicycles in ivquant, 2046 runs, 2 skips
2957 dezicycles in ivquant, 4094 runs, 2 skips
3030 dezicycles in ivquant, 8190 runs, 2 skips
3270 dezicycles in ivquant, 16381 runs, 3 skips
3503 dezicycles in ivquant, 32759 runs, 9 skips
3685 dezicycles in ivquant, 65521 runs, 15 skips
3803 dezicycles in ivquant, 131050 runs, 22 skips
3946 dezicycles in ivquant, 262109 runs, 35 skips
3969 dezicycles in ivquant, 524225 runs, 63 skips
3947 dezicycles in ivquant, 1048457 runs, 119 skips
3956 dezicycles in ivquant, 2096816 runs, 336 skips

new size 8

42680 dezicycles in ivquant, 1 runs, 0 skips
22605 dezicycles in ivquant, 2 runs, 0 skips
12375 dezicycles in ivquant, 4 runs, 0 skips
7177 dezicycles in ivquant, 8 runs, 0 skips
4702 dezicycles in ivquant, 16 runs, 0 skips
4224 dezicycles in ivquant, 32 runs, 0 skips
3693 dezicycles in ivquant, 64 runs, 0 skips
3447 dezicycles in ivquant, 128 runs, 0 skips
3321 dezicycles in ivquant, 256 runs, 0 skips
3287 dezicycles in ivquant, 511 runs, 1 skips
3229 dezicycles in ivquant, 1023 runs, 1 skips
3208 dezicycles in ivquant, 2047 runs, 1 skips
3008 dezicycles in ivquant, 4094 runs, 2 skips
2833 dezicycles in ivquant, 8188 runs, 4 skips
2974 dezicycles in ivquant, 16356 runs, 28 skips
4107 dezicycles in ivquant, 32135 runs, 633 skips
4784 dezicycles in ivquant, 64099 runs, 1437 skips
4811 dezicycles in ivquant, 128628 runs, 2444 skips
4947 dezicycles in ivquant, 257736 runs, 4408 skips
4945 dezicycles in ivquant, 515855 runs, 8433 skips
4861 dezicycles in ivquant, 1033162 runs, 15414 skips
4840 dezicycles in ivquant, 2066668 runs, 30484 skips

new size 16

8030 dezicycles in ivquant, 1 runs, 0 skips
18810 dezicycles in ivquant, 2 runs, 0 skips
10257 dezicycles in ivquant, 4 runs, 0 skips
5912 dezicycles in ivquant, 8 runs, 0 skips
3863 dezicycles in ivquant, 16 runs, 0 skips
3475 dezicycles in ivquant, 32 runs, 0 skips
2992 dezicycles in ivquant, 64 runs, 0 skips
2813 dezicycles in ivquant, 127 runs, 1 skips
2693 dezicycles in ivquant, 255 runs, 1 skips
2645 dezicycles in ivquant, 511 runs, 1 skips
2601 dezicycles in ivquant, 1022 runs, 2 skips
3306 dezicycles in ivquant, 2046 runs, 2 skips
2978 dezicycles in ivquant, 4094 runs, 2 skips
2711 dezicycles in ivquant, 8190 runs, 2 skips
2630 dezicycles in ivquant, 16380 runs, 4 skips
3165 dezicycles in ivquant, 32634 runs, 134 skips
3325 dezicycles in ivquant, 65225 runs, 311 skips
3470 dezicycles in ivquant, 130627 runs, 445 skips
3632 dezicycles in ivquant, 261451 runs, 693 skips
3671 dezicycles in ivquant, 523011 runs, 1277 skips
3642 dezicycles in ivquant, 1046525 runs, 2051 skips
3650 dezicycles in ivquant, 2093424 runs, 3728 skips

new size 32

6820 dezicycles in ivquant, 1 runs, 0 skips
4840 dezicycles in ivquant, 2 runs, 0 skips
3492 dezicycles in ivquant, 4 runs, 0 skips
2681 dezicycles in ivquant, 8 runs, 0 skips
2447 dezicycles in ivquant, 16 runs, 0 skips
3086 dezicycles in ivquant, 32 runs, 0 skips
2975 dezicycles in ivquant, 64 runs, 0 skips
2927 dezicycles in ivquant, 128 runs, 0 skips
2942 dezicycles in ivquant, 256 runs, 0 skips
2983 dezicycles in ivquant, 512 runs, 0 skips
2973 dezicycles in ivquant, 1024 runs, 0 skips
2967 dezicycles in ivquant, 2048 runs, 0 skips
2884 dezicycles in ivquant, 4095 runs, 1 skips
3072 dezicycles in ivquant, 8190 runs, 2 skips
2978 dezicycles in ivquant, 16382 runs, 2 skips
3119 dezicycles in ivquant, 32762 runs, 6 skips
3213 dezicycles in ivquant, 65522 runs, 14 skips
3340 dezicycles in ivquant, 131044 runs, 28 skips
3441 dezicycles in ivquant, 262100 runs, 44 skips
3446 dezicycles in ivquant, 524217 runs, 71 skips
3430 dezicycles in ivquant, 1048438 runs, 138 skips
3438 dezicycles in ivquant, 2096888 runs, 264 skips

new size 64

7150 dezicycles in ivquant, 1 runs, 0 skips
21230 dezicycles in ivquant, 2 runs, 0 skips
11660 dezicycles in ivquant, 4 runs, 0 skips
6792 dezicycles in ivquant, 8 runs, 0 skips
4503 dezicycles in ivquant, 16 runs, 0 skips
4128 dezicycles in ivquant, 32 runs, 0 skips
3507 dezicycles in ivquant, 64 runs, 0 skips
3201 dezicycles in ivquant, 127 runs, 1 skips
3081 dezicycles in ivquant, 255 runs, 1 skips
3049 dezicycles in ivquant, 511 runs, 1 skips
3004 dezicycles in ivquant, 1023 runs, 1 skips
2982 dezicycles in ivquant, 2047 runs, 1 skips
2896 dezicycles in ivquant, 4095 runs, 1 skips
2776 dezicycles in ivquant, 8189 runs, 3 skips
2930 dezicycles in ivquant, 16380 runs, 4 skips
3082 dezicycles in ivquant, 32764 runs, 4 skips
3174 dezicycles in ivquant, 65523 runs, 13 skips
3287 dezicycles in ivquant, 131055 runs, 17 skips
3412 dezicycles in ivquant, 262118 runs, 26 skips
3429 dezicycles in ivquant, 524229 runs, 59 skips
3430 dezicycles in ivquant, 1048453 runs, 123 skips
3447 dezicycles in ivquant, 2096915 runs, 237 skips

new size 128

12430 dezicycles in ivquant, 1 runs, 0 skips
30525 dezicycles in ivquant, 2 runs, 0 skips
16665 dezicycles in ivquant, 4 runs, 0 skips
9116 dezicycles in ivquant, 8 runs, 0 skips
5568 dezicycles in ivquant, 16 runs, 0 skips
4327 dezicycles in ivquant, 32 runs, 0 skips
3351 dezicycles in ivquant, 64 runs, 0 skips
2871 dezicycles in ivquant, 128 runs, 0 skips
2655 dezicycles in ivquant, 256 runs, 0 skips
2566 dezicycles in ivquant, 512 runs, 0 skips
2903 dezicycles in ivquant, 1024 runs, 0 skips
3673 dezicycles in ivquant, 2048 runs, 0 skips
3348 dezicycles in ivquant, 4096 runs, 0 skips
2897 dezicycles in ivquant, 8192 runs, 0 skips
2722 dezicycles in ivquant, 16383 runs, 1 skips
2979 dezicycles in ivquant, 32766 runs, 2 skips
3262 dezicycles in ivquant, 65527 runs, 9 skips
3331 dezicycles in ivquant, 131061 runs, 11 skips
3435 dezicycles in ivquant, 262126 runs, 18 skips
3441 dezicycles in ivquant, 524236 runs, 52 skips
3423 dezicycles in ivquant, 1048459 runs, 117 skips
3431 dezicycles in ivquant, 2096918 runs, 234 skips

new size 256

14520 dezicycles in ivquant, 1 runs, 0 skips
25245 dezicycles in ivquant, 2 runs, 0 skips
13860 dezicycles in ivquant, 4 runs, 0 skips
7892 dezicycles in ivquant, 8 runs, 0 skips
5108 dezicycles in ivquant, 16 runs, 0 skips
4437 dezicycles in ivquant, 32 runs, 0 skips
3683 dezicycles in ivquant, 64 runs, 0 skips
3300 dezicycles in ivquant, 128 runs, 0 skips
3132 dezicycles in ivquant, 256 runs, 0 skips
3082 dezicycles in ivquant, 512 runs, 0 skips
3032 dezicycles in ivquant, 1023 runs, 1 skips
2952 dezicycles in ivquant, 2047 runs, 1 skips
2744 dezicycles in ivquant, 4095 runs, 1 skips
2666 dezicycles in ivquant, 8190 runs, 2 skips
2610 dezicycles in ivquant, 16382 runs, 2 skips
2937 dezicycles in ivquant, 32766 runs, 2 skips
3106 dezicycles in ivquant, 65533 runs, 3 skips
3257 dezicycles in ivquant, 131067 runs, 5 skips
3409 dezicycles in ivquant, 262125 runs, 19 skips
3432 dezicycles in ivquant, 524235 runs, 53 skips
3423 dezicycles in ivquant, 1048471 runs, 105 skips
3431 dezicycles in ivquant, 2096953 runs, 199 skips

new size 512

10010 dezicycles in ivquant, 1 runs, 0 skips
6435 dezicycles in ivquant, 2 runs, 0 skips
4235 dezicycles in ivquant, 4 runs, 0 skips
3066 dezicycles in ivquant, 8 runs, 0 skips
2688 dezicycles in ivquant, 16 runs, 0 skips
3217 dezicycles in ivquant, 32 runs, 0 skips
3045 dezicycles in ivquant, 64 runs, 0 skips
2963 dezicycles in ivquant, 128 runs, 0 skips
2830 dezicycles in ivquant, 256 runs, 0 skips
2775 dezicycles in ivquant, 512 runs, 0 skips
2730 dezicycles in ivquant, 1024 runs, 0 skips
2649 dezicycles in ivquant, 2046 runs, 2 skips
2539 dezicycles in ivquant, 4094 runs, 2 skips
2614 dezicycles in ivquant, 8190 runs, 2 skips
2669 dezicycles in ivquant, 16382 runs, 2 skips
2955 dezicycles in ivquant, 32764 runs, 4 skips
3116 dezicycles in ivquant, 65530 runs, 6 skips
3263 dezicycles in ivquant, 131065 runs, 7 skips
3405 dezicycles in ivquant, 262135 runs, 9 skips
3445 dezicycles in ivquant, 524269 runs, 19 skips
3429 dezicycles in ivquant, 1048544 runs, 32 skips
3438 dezicycles in ivquant, 2097093 runs, 59 skips

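It looks to me like there's little difference in performance when the
table is of size 32 or larger: at about 2 million runs, sizes 32
through 512 all settle at roughly 3430 to 3450 dezicycles, versus 3650
for size 16, 4840 for size 8 and 3956 for the old code. Should I use
size 32?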

Rob
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 20080524-1517-merge_sign_into_ivquant.diff
Type: text/x-diff
Size: 1729 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080524/ab42279e/attachment.diff>



More information about the ffmpeg-devel mailing list