[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll quantize_bands loop

Tue Mar 22 18:33:17 CET 2016

On Sat, Mar 19, 2016 at 2:36 AM, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
> On Sat, Mar 19, 2016 at 3:27 AM, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
>> Yields speedup in quantize_bands, and non-negligible speedup in aac encoding overall.
>>
>> Sample benchmark (Haswell, -march=native + GCC):
>> new:
>>     [...]
>>     553 decicycles in quantize_bands, 2097136 runs,     16 skips
>>     554 decicycles in quantize_bands, 4194266 runs,     38 skips
>>     559 decicycles in quantize_bands, 8388534 runs,     74 skips
>>
>> old:
>>     [...]
>>     711 decicycles in quantize_bands, 2097140 runs,     12 skips
>>     713 decicycles in quantize_bands, 4194277 runs,     27 skips
>>     715 decicycles in quantize_bands, 8388538 runs,     70 skips
>>
>> old:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.58s user 0.01s system 99% cpu 4.590 total
>>
>> new:
>> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.54s user 0.02s system 99% cpu 4.566 total
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
>> ---
>>  libavcodec/aacenc_utils.h | 33 +++++++++++++++++++++++++--------
>>  1 file changed, 25 insertions(+), 8 deletions(-)
>>
>> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
>> index 38636e5..0203b6e 100644
>> --- a/libavcodec/aacenc_utils.h
>> +++ b/libavcodec/aacenc_utils.h
>> @@ -62,18 +62,35 @@ static inline int quant(float coef, const float Q, const float rounding)
>>      return sqrtf(a * sqrtf(a)) + rounding;
>>  }
>>
>> +static inline float minf(float x, float y) {
>> +    return x < y ? x : y;
>> +}
>> +
>
> That's exactly what the FFMIN macro expands to; what's the reason for
> introducing this function?

The two versions compile differently; in particular, the minf version was faster.
I don't know why; maybe the macro leads to repeated evaluation of qc + rounding?
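
To make the repeated-evaluation guess concrete, here is a minimal sketch of the
source-level difference. MIN_MACRO is a local copy of the usual conditional-macro
shape (FFmpeg's FFMIN lives in libavutil/common.h), and counted() plus the two
evals_with_* helpers are hypothetical instrumentation, not part of the patch.
For a side-effect-free operand such as qc + rounding the compiler is free to
merge the duplicate, so this does not by itself explain the measured speedup.

```c
/* Local copy of the macro's shape, for illustration only. */
#define MIN_MACRO(a, b) ((a) > (b) ? (b) : (a))

/* The helper from the patch: arguments are evaluated once, at the call. */
static inline float minf(float x, float y)
{
    return x < y ? x : y;
}

/* Hypothetical instrumentation: counts how often an operand is evaluated. */
static int eval_count;

static float counted(float v)
{
    eval_count++;
    return v;
}

/* Number of times the first operand is evaluated when using the macro. */
static int evals_with_macro(float a, float b)
{
    eval_count = 0;
    volatile float r = MIN_MACRO(counted(a), b);
    (void)r;
    return eval_count;
}

/* Number of times the first operand is evaluated when using the function. */
static int evals_with_function(float a, float b)
{
    eval_count = 0;
    volatile float r = minf(counted(a), b);
    (void)r;
    return eval_count;
}
```

With the macro, the first operand appears twice in the expansion, so it is
evaluated a second time whenever it is the one selected; with the function it
is evaluated exactly once.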

>
>>  static inline void quantize_bands(int *out, const float *in, const float *scaled,
>>                                    int size, float Q34, int is_signed, int maxval,
>>                                    const float rounding)
>>  {
>> -    int i;
>> -    for (i = 0; i < size; i++) {
>> -        float qc = scaled[i] * Q34;
>> -        int tmp = (int)FFMIN(qc + rounding, (float)maxval);
>> -        if (is_signed && in[i] < 0.0f) {
>> -            tmp = -tmp;
>> -        }
>> -        out[i] = tmp;
>> +    for (int i = 0; i < size; i+=4) {
>> +        float qc0 = scaled[i  ] * Q34;
>> +        float qc1 = scaled[i+1] * Q34;
>> +        float qc2 = scaled[i+2] * Q34;
>> +        float qc3 = scaled[i+3] * Q34;
>> +        int tmp0 = minf(qc0 + rounding, maxval);
>> +        int tmp1 = minf(qc1 + rounding, maxval);
>> +        int tmp2 = minf(qc2 + rounding, maxval);
>> +        int tmp3 = minf(qc3 + rounding, maxval);
>> +        if (is_signed && in[i  ] < 0.0f)
>> +            tmp0 = -tmp0;
>> +        if (is_signed && in[i+1] < 0.0f)
>> +            tmp1 = -tmp1;
>> +        if (is_signed && in[i+2] < 0.0f)
>> +            tmp2 = -tmp2;
>> +        if (is_signed && in[i+3] < 0.0f)
>> +            tmp3 = -tmp3;
>> +        out[i  ] = tmp0;
>> +        out[i+1] = tmp1;
>> +        out[i+2] = tmp2;
>> +        out[i+3] = tmp3;
>>      }
>>  }
>>
>
> Is size always a multiple of 4?

As far as I can see, it is: size is passed in via num_coeffs, which is derived
from the swb_offset values, and those are all multiples of 4.
I also put in an assert and ran FATE to make sure.
If it helps, I can add an av_assert2 for this assumption.
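
For reference, a rough sketch of what the loop could look like with the
assumption made explicit. The name quantize_bands_checked is made up for this
example, and plain assert() stands in for av_assert2() so that the snippet
builds outside the FFmpeg tree; the body otherwise follows the patch.

```c
#include <assert.h>

static inline float minf(float x, float y)
{
    return x < y ? x : y;
}

/* Unrolled by 4 as in the patch; reads and writes exactly `size` elements,
 * which is only correct when size is a multiple of 4. */
static void quantize_bands_checked(int *out, const float *in, const float *scaled,
                                   int size, float Q34, int is_signed, int maxval,
                                   const float rounding)
{
    /* Stand-in for av_assert2(): document and check the size assumption. */
    assert(size % 4 == 0);

    for (int i = 0; i < size; i += 4) {
        /* float -> int conversion truncates toward zero, as in the original. */
        int tmp0 = minf(scaled[i    ] * Q34 + rounding, maxval);
        int tmp1 = minf(scaled[i + 1] * Q34 + rounding, maxval);
        int tmp2 = minf(scaled[i + 2] * Q34 + rounding, maxval);
        int tmp3 = minf(scaled[i + 3] * Q34 + rounding, maxval);

        /* Restore the sign of the input coefficient for signed codebooks. */
        if (is_signed && in[i    ] < 0.0f) tmp0 = -tmp0;
        if (is_signed && in[i + 1] < 0.0f) tmp1 = -tmp1;
        if (is_signed && in[i + 2] < 0.0f) tmp2 = -tmp2;
        if (is_signed && in[i + 3] < 0.0f) tmp3 = -tmp3;

        out[i    ] = tmp0;
        out[i + 1] = tmp1;
        out[i + 2] = tmp2;
        out[i + 3] = tmp3;
    }
}
```

Without the check, a size that is not a multiple of 4 would make the last
iteration read and write past the end of the buffers, so the assert mainly
serves to catch a future caller that breaks the assumption.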

>
> - Hendrik