[FFmpeg-devel] [PATCH] aacenc_utils: unroll loops to allow compiler to use SIMD.

Sun Mar 6 21:22:55 CET 2016

On Sun, Mar 06, 2016 at 04:46:08PM -0300, James Almer wrote:
> On 3/6/2016 4:14 PM, Reimar Döffinger wrote:
> > On Sun, Mar 06, 2016 at 03:49:00PM -0300, James Almer wrote:
> >> On 3/6/2016 3:35 PM, Reimar Döffinger wrote:
> >> Are you sure this wasn't vectorized already? I remember I checked and it mostly
> >> was, at least on gcc 5.3 mingw-w64 with default settings.
> > 
> > Then it would hardly get 10% faster, would it (though
> > I admit I didn't test the two parts separately)?
> > But I am fairly sure that before the patch it only
> > used sqrtss instructions and not sqrtps.
> 
> Without your patch, GCC 5.3 mingw-w64 x86_64 default settings.
> 
[...]
> 
> I didn't benchmark it, but it seems to help GCC vectorize more efficiently, so
> this patch is probably OK, especially if in your case it allowed your compiler
> to vectorize at all.

Actually, I retract that patch.
It might cause a very minor speedup (maybe 1.5%) due to what you saw,
which is basically that gcc now also uses SIMD in the unaligned
input path.
However, the big speedup comes from a different change
that I accidentally mixed into this one.