[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll quantize_bands loop

Ganesh Ajjanagadde gajjanag at gmail.com
Sat Mar 19 03:27:46 CET 2016


Yields a roughly 22% speedup in quantize_bands, and a small (under 1%) speedup in AAC encoding overall.

Sample benchmark (Haswell, -march=native + GCC):
new:
    [...]
    553 decicycles in quantize_bands, 2097136 runs,     16 skips
    554 decicycles in quantize_bands, 4194266 runs,     38 skips
    559 decicycles in quantize_bands, 8388534 runs,     74 skips

old:
    [...]
    711 decicycles in quantize_bands, 2097140 runs,     12 skips
    713 decicycles in quantize_bands, 4194277 runs,     27 skips
    715 decicycles in quantize_bands, 8388538 runs,     70 skips

old:
ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.58s user 0.01s system 99% cpu 4.590 total

new:
ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac  4.54s user 0.02s system 99% cpu 4.566 total

Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
---
 libavcodec/aacenc_utils.h | 33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
index 38636e5..0203b6e 100644
--- a/libavcodec/aacenc_utils.h
+++ b/libavcodec/aacenc_utils.h
@@ -62,18 +62,35 @@ static inline int quant(float coef, const float Q, const float rounding)
     return sqrtf(a * sqrtf(a)) + rounding;
 }
 
+static inline float minf(float x, float y) {
+    return x < y ? x : y;
+}
+
 static inline void quantize_bands(int *out, const float *in, const float *scaled,
                                   int size, float Q34, int is_signed, int maxval,
                                   const float rounding)
 {
-    int i;
-    for (i = 0; i < size; i++) {
-        float qc = scaled[i] * Q34;
-        int tmp = (int)FFMIN(qc + rounding, (float)maxval);
-        if (is_signed && in[i] < 0.0f) {
-            tmp = -tmp;
-        }
-        out[i] = tmp;
+    for (int i = 0; i < size; i+=4) {
+        float qc0 = scaled[i  ] * Q34;
+        float qc1 = scaled[i+1] * Q34;
+        float qc2 = scaled[i+2] * Q34;
+        float qc3 = scaled[i+3] * Q34;
+        int tmp0 = minf(qc0 + rounding, maxval);
+        int tmp1 = minf(qc1 + rounding, maxval);
+        int tmp2 = minf(qc2 + rounding, maxval);
+        int tmp3 = minf(qc3 + rounding, maxval);
+        if (is_signed && in[i  ] < 0.0f)
+            tmp0 = -tmp0;
+        if (is_signed && in[i+1] < 0.0f)
+            tmp1 = -tmp1;
+        if (is_signed && in[i+2] < 0.0f)
+            tmp2 = -tmp2;
+        if (is_signed && in[i+3] < 0.0f)
+            tmp3 = -tmp3;
+        out[i  ] = tmp0;
+        out[i+1] = tmp1;
+        out[i+2] = tmp2;
+        out[i+3] = tmp3;
     }
 }
 
-- 
2.7.3
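
Note on the unrolled loop: as written it reads and writes four elements per
iteration, so it relies on size being a multiple of 4. The following is a
minimal standalone sketch, not part of the patch, of the same 4x unrolling
with a scalar tail for any leftover elements. The names min_float and
quantize_bands_unrolled are hypothetical and chosen only for illustration;
whether the tail is needed depends on the sizes the callers actually pass.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the patch's minf(): plain float minimum. */
static inline float min_float(float x, float y)
{
    return x < y ? x : y;
}

/* Sketch only: 4x unrolled main loop plus a scalar tail loop.
 * Assumes out, in and scaled each hold at least `size` elements. */
static void quantize_bands_unrolled(int *out, const float *in,
                                    const float *scaled, int size, float Q34,
                                    int is_signed, int maxval, float rounding)
{
    int i;
    const int body = size & ~3;      /* largest multiple of 4 <= size */
    const float maxf = (float)maxval;

    for (i = 0; i < body; i += 4) {
        /* Four independent computations per iteration. */
        int tmp0 = (int)min_float(scaled[i    ] * Q34 + rounding, maxf);
        int tmp1 = (int)min_float(scaled[i + 1] * Q34 + rounding, maxf);
        int tmp2 = (int)min_float(scaled[i + 2] * Q34 + rounding, maxf);
        int tmp3 = (int)min_float(scaled[i + 3] * Q34 + rounding, maxf);
        if (is_signed && in[i    ] < 0.0f) tmp0 = -tmp0;
        if (is_signed && in[i + 1] < 0.0f) tmp1 = -tmp1;
        if (is_signed && in[i + 2] < 0.0f) tmp2 = -tmp2;
        if (is_signed && in[i + 3] < 0.0f) tmp3 = -tmp3;
        out[i    ] = tmp0;
        out[i + 1] = tmp1;
        out[i + 2] = tmp2;
        out[i + 3] = tmp3;
    }

    /* Scalar tail: handles the 0 to 3 elements left when size % 4 != 0. */
    for (; i < size; i++) {
        int tmp = (int)min_float(scaled[i] * Q34 + rounding, maxf);
        if (is_signed && in[i] < 0.0f)
            tmp = -tmp;
        out[i] = tmp;
    }
}
```

If every caller passes a size that is a multiple of 4, the tail loop never
runs and the function behaves like the unrolled loop in the patch.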


