[FFmpeg-devel] [PATCH] lavc/aacenc_utils: unroll abs_pow34_v loop
Clément Bœsch
u at pkh.me
Sat Mar 19 12:42:09 CET 2016
On Fri, Mar 18, 2016 at 10:12:14PM -0700, Ganesh Ajjanagadde wrote:
> It seems like in all usages, size is a multiple of 4. This is documented
> as an assert.
>
> Yields speedup in this function, and small speedup for aac encoding overall.
>
> Sample benchmark (Haswell, -march=native + GCC):
> old:
> [...]
> 1390 decicycles in abs_pow34_v, 127138 runs, 3934 skips63.1x
> 1385 decicycles in abs_pow34_v, 254191 runs, 7953 skips64.4x
> 1383 decicycles in abs_pow34_v, 508305 runs, 15983 skips65.3x
>
> new:
> [...]
> 1109 decicycles in abs_pow34_v, 127122 runs, 3950 skips61.2x
> 1107 decicycles in abs_pow34_v, 254177 runs, 7967 skips63.5x
> 1106 decicycles in abs_pow34_v, 508292 runs, 15996 skips65.3x
>
> old:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.55s user 0.03s system 99% cpu 4.581 total
> new:
> ffmpeg -f lavfi -i anoisesrc -t 300 -y sin_new.aac 4.50s user 0.04s system 99% cpu 4.537 total
>
> Signed-off-by: Ganesh Ajjanagadde <gajjanag at gmail.com>
> ---
> libavcodec/aacenc_utils.h | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/libavcodec/aacenc_utils.h b/libavcodec/aacenc_utils.h
> index 0203b6e..800b78f 100644
> --- a/libavcodec/aacenc_utils.h
> +++ b/libavcodec/aacenc_utils.h
> @@ -37,20 +37,26 @@
> #define ROUND_TO_ZERO 0.1054f
> #define C_QUANT 0.4054f
>
> -static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
> -{
> - int i;
> - for (i = 0; i < size; i++) {
> - float a = fabsf(in[i]);
> - out[i] = sqrtf(a * sqrtf(a));
> - }
> -}
> -
> static inline float pos_pow34(float a)
> {
> return sqrtf(a * sqrtf(a));
> }
>
> +static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
> +{
> + av_assert2(!(size % 4));
> + for (int i = 0; i < size; i+=4) {
> + float a0 = fabsf(in[i]);
> + float a1 = fabsf(in[i+1]);
> + float a2 = fabsf(in[i+2]);
> + float a3 = fabsf(in[i+3]);
> + out[i ] = pos_pow34(a0);
> + out[i+1] = pos_pow34(a1);
> + out[i+2] = pos_pow34(a2);
> + out[i+3] = pos_pow34(a3);
> + }
> +}
> +
I'm curious (and lazy), is GCC able to unroll by itself if you hint it
with a loop such as:
    int i;
    for (i = 0; i < (size & ~3); i++) {
        float a = fabsf(in[i]);
        out[i] = sqrtf(a * sqrtf(a));
    }
--
Clément B.