[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll quantize_bands loop
Hendrik Leppkes
h.leppkes at gmail.com
Sat Mar 19 10:36:22 CET 2016
On Sat, Mar 19, 2016 at 3:27 AM, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
> Yields speedup in quantize_bands, and non-negligible speedup in aac encoding overall.
>
> Sample benchmark (Haswell, -march=native + GCC):
> new:
> [...]
> 553 decicycles in quantize_bands, 2097136 runs, 16 skips
> 554 decicycles in quantize_bands, 4194266 runs, 38 skips
> 559 decicycles in quantize_bands, 8388534 runs, 74 skips
>
> old:
> [...]
> 711 decicycles in quantize_bands, 2097140 runs, 12 skips
> 713 decicycles in quantize_bands, 4194277 runs, 27 skips
> 715 decicycles in quantize_bands, 8388538 runs, 70 skips
>
> old:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.58s user 0.01s system 99% cpu 4.590 total
>
> new:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.54s user 0.02s system 99% cpu 4.566 total
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
> ---
> libavcodec/aacenc_utils.h | 33 +++++++++++++++++++++++++--------
> 1 file changed, 25 insertions(+), 8 deletions(-)
>
> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
> index 38636e5..0203b6e 100644
> --- a/libavcodec/aacenc_utils.h
> +++ b/libavcodec/aacenc_utils.h
> @@ -62,18 +62,35 @@ static inline int quant(float coef, const float Q, const float rounding)
> return sqrtf(a * sqrtf(a)) + rounding;
> }
>
> +static inline float minf(float x, float y) {
> + return x < y ? x : y;
> +}
> +
That's exactly what the FFMIN macro expands to; what's the reason for
introducing this function?
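
For reference, a minimal side-by-side sketch of the two definitions. The
FFMIN definition below is reproduced on the assumption that it matches
libavutil/common.h, and ffmin_float is a hypothetical wrapper added only so
the macro can be called like a function. For ordinary, non-NaN inputs the
two compare equal; they only differ in which operand is returned when one
of them is NaN.

```c
/* Assumed to match the definition in libavutil/common.h. */
#define FFMIN(a, b) ((a) > (b) ? (b) : (a))

/* Helper as introduced by the patch. */
static inline float minf(float x, float y)
{
    return x < y ? x : y;
}

/* Hypothetical wrapper: evaluates the macro on two floats. */
static inline float ffmin_float(float x, float y)
{
    return FFMIN(x, y);
}
```
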
> static inline void quantize_bands(int *out, const float *in, const float *scaled,
> int size, float Q34, int is_signed, int maxval,
> const float rounding)
> {
> - int i;
> - for (i = 0; i < size; i++) {
> - float qc = scaled[i] * Q34;
> - int tmp = (int)FFMIN(qc + rounding, (float)maxval);
> - if (is_signed && in[i] < 0.0f) {
> - tmp = -tmp;
> - }
> - out[i] = tmp;
> + for (int i = 0; i < size; i+=4) {
> + float qc0 = scaled[i ] * Q34;
> + float qc1 = scaled[i+1] * Q34;
> + float qc2 = scaled[i+2] * Q34;
> + float qc3 = scaled[i+3] * Q34;
> + int tmp0 = minf(qc0 + rounding, maxval);
> + int tmp1 = minf(qc1 + rounding, maxval);
> + int tmp2 = minf(qc2 + rounding, maxval);
> + int tmp3 = minf(qc3 + rounding, maxval);
> + if (is_signed && in[i ] < 0.0f)
> + tmp0 = -tmp0;
> + if (is_signed && in[i+1] < 0.0f)
> + tmp1 = -tmp1;
> + if (is_signed && in[i+2] < 0.0f)
> + tmp2 = -tmp2;
> + if (is_signed && in[i+3] < 0.0f)
> + tmp3 = -tmp3;
> + out[i ] = tmp0;
> + out[i+1] = tmp1;
> + out[i+2] = tmp2;
> + out[i+3] = tmp3;
> }
> }
>
Is size always a multiple of 4?
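
If it is not, the unrolled loop as posted reads and writes past the end of
the arrays. A minimal sketch of one way to keep the unrolling while staying
safe for any size is below. This is not the submitted patch: quantize_one
and quantize_bands_safe are hypothetical names, and the scalar tail loop is
an assumption about how the remainder could be handled.

```c
static inline float minf(float x, float y)
{
    return x < y ? x : y;
}

/* Hypothetical helper: quantize a single coefficient. */
static inline int quantize_one(float in, float scaled, float Q34,
                               int is_signed, int maxval, float rounding)
{
    int tmp = (int)minf(scaled * Q34 + rounding, (float)maxval);
    return (is_signed && in < 0.0f) ? -tmp : tmp;
}

/* Hypothetical variant of quantize_bands: unrolled by 4 with a scalar tail. */
static void quantize_bands_safe(int *out, const float *in, const float *scaled,
                                int size, float Q34, int is_signed, int maxval,
                                const float rounding)
{
    int i = 0;

    /* Main loop: only runs while a full group of four elements remains. */
    for (; i + 4 <= size; i += 4) {
        out[i    ] = quantize_one(in[i    ], scaled[i    ], Q34, is_signed, maxval, rounding);
        out[i + 1] = quantize_one(in[i + 1], scaled[i + 1], Q34, is_signed, maxval, rounding);
        out[i + 2] = quantize_one(in[i + 2], scaled[i + 2], Q34, is_signed, maxval, rounding);
        out[i + 3] = quantize_one(in[i + 3], scaled[i + 3], Q34, is_signed, maxval, rounding);
    }

    /* Tail loop: handles the remaining 0 to 3 elements one at a time. */
    for (; i < size; i++)
        out[i] = quantize_one(in[i], scaled[i], Q34, is_signed, maxval, rounding);
}
```

If every caller can guarantee a multiple of 4, the tail loop could instead
be replaced by an assertion on size.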
- Hendrik