[FFmpeg-devel] [PATCH 04/10] x86: synth filter float: implement SSE2 version
Michael Niedermayer
michaelni at gmx.at
Fri Feb 28 20:54:00 CET 2014
On Fri, Feb 14, 2014 at 04:00:48PM +0000, Christophe Gisquet wrote:
> Timings for Arrandale:
>            C    SSE
> win32:  2108    334
> win64:  1152    322
>
> Factorizing the inner loop with a call/jmp costs more than 15 cycles, even
> with the jmp destination aligned.
>
> Unrolling for ARCH_X86_64 gains 20 cycles.
applied
thanks
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire