[FFmpeg-devel] [PATCH 2/2] mpegaudiodec: add SSE-optimized imdct36()
Loren Merritt
lorenm at u.washington.edu
Sat Aug 27 07:33:56 CEST 2011
> On Sun, Aug 21, 2011 at 04:53:19PM +0200, Vitor Sessak wrote:
> %macro BUTTERF 3
> movhlps %2, %1
> movlhps %2, %1
A single pshufd would reduce the number of uops, although I haven't
checked what it would do to latency or to the number of uops on the
bottlenecked execution unit(s).
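To illustrate the pshufd remark, here is a small scalar C model of the lane movements, not the asm itself. The vec4 type and the helper names are made up for this sketch, and lane 0 is taken to be the lowest 32 bits of the register. It only checks that the movhlps + movlhps pair and one pshufd with immediate 0x4e produce the same permutation (a swap of the two 64-bit halves); it says nothing about uop counts or latency.

```c
#include <assert.h>
#include <string.h>

/* Scalar model of a 4-lane XMM register; lane 0 is the lowest 32 bits. */
typedef struct { float f[4]; } vec4;

/* movhlps dst, src: low half of dst <- high half of src; high half of dst kept. */
static vec4 movhlps(vec4 dst, vec4 src)
{
    dst.f[0] = src.f[2];
    dst.f[1] = src.f[3];
    return dst;
}

/* movlhps dst, src: high half of dst <- low half of src; low half of dst kept. */
static vec4 movlhps(vec4 dst, vec4 src)
{
    dst.f[2] = src.f[0];
    dst.f[3] = src.f[1];
    return dst;
}

/* pshufd dst, src, imm8: lane i of dst <- lane ((imm8 >> 2*i) & 3) of src. */
static vec4 pshufd(vec4 src, unsigned imm8)
{
    vec4 dst;
    for (int i = 0; i < 4; i++)
        dst.f[i] = src.f[(imm8 >> (2 * i)) & 3];
    return dst;
}

/* Returns 1 if the two-instruction sequence and the single pshufd agree.
 * 'stale' stands for whatever %2 held before the sequence started. */
static int swap_halves_match(vec4 a, vec4 stale)
{
    vec4 two_step = movlhps(movhlps(stale, a), a);
    vec4 one_step = pshufd(a, 0x4e);
    return memcmp(&two_step, &one_step, sizeof(vec4)) == 0;
}

/* Self-check on one sample input. */
static void check_swap_halves(void)
{
    vec4 a     = {{1.0f, 2.0f, 3.0f, 4.0f}};
    vec4 stale = {{9.0f, 9.0f, 9.0f, 9.0f}};
    assert(swap_halves_match(a, stale));
}
```

Note that the two-instruction form fully overwrites %2, so the old contents of %2 never reach the result in either version.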
> xorps %2, [ps_p1p1m1m1]
Can you xorps %1 instead, to shorten the dependency chain?
> addps %1, %2
> mulps %1, %3
> mova %2, %1
> shufps %1, %1, 0xb1
pshufd again, in place of the mova + shufps.
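A scalar C sketch of this second stage, for illustration only. The vec4 type and function names are hypothetical, and ps_p1m1p1m1 is assumed to be a sign mask that negates lanes 1 and 3 (lane 0 being the lowest). It compares the quoted sequence (mova, shufps, xorps, addps) with one possible pshufd variant in which pshufd writes the shuffled copy to %2 and the xorps is applied to %1.

```c
#include <assert.h>

/* Scalar model of a 4-lane XMM register; lane 0 is the lowest 32 bits. */
typedef struct { float f[4]; } vec4;

/* Assumed lane layout of ps_p1m1p1m1: nonzero means "flip the sign bit". */
static const int P1M1P1M1[4] = {0, 1, 0, 1};

/* Models both "shufps x, x, imm8" (same register twice) and "pshufd". */
static vec4 shuffle(vec4 src, unsigned imm8)
{
    vec4 dst;
    for (int i = 0; i < 4; i++)
        dst.f[i] = src.f[(imm8 >> (2 * i)) & 3];
    return dst;
}

/* Models xorps with a sign-bit mask: negate the selected lanes. */
static vec4 flip_signs(vec4 v, const int negate[4])
{
    for (int i = 0; i < 4; i++)
        if (negate[i])
            v.f[i] = -v.f[i];
    return v;
}

/* Models addps: lane-wise addition. */
static vec4 addps(vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++)
        a.f[i] += b.f[i];
    return a;
}

/* Quoted code: mova %2,%1 ; shufps %1,%1,0xb1 ; xorps %2,mask ; addps %1,%2 */
static vec4 stage2_original(vec4 r1)
{
    vec4 r2 = r1;                      /* mova   */
    r1 = shuffle(r1, 0xb1);            /* shufps */
    r2 = flip_signs(r2, P1M1P1M1);     /* xorps  */
    return addps(r1, r2);              /* addps  */
}

/* Variant: pshufd %2,%1,0xb1 ; xorps %1,mask ; addps %1,%2 */
static vec4 stage2_pshufd(vec4 r1)
{
    vec4 r2 = shuffle(r1, 0xb1);       /* pshufd copies and shuffles */
    r1 = flip_signs(r1, P1M1P1M1);     /* xorps  */
    return addps(r1, r2);              /* addps  */
}

/* Returns 1 if both sequences give identical lanes for the input. */
static int stage2_variants_match(vec4 a)
{
    vec4 x = stage2_original(a);
    vec4 y = stage2_pshufd(a);
    for (int i = 0; i < 4; i++)
        if (x.f[i] != y.f[i])
            return 0;
    return 1;
}
```

Under these assumptions both forms compute {a0+a1, a0-a1, a2+a3, a2-a3}; the variant just drops the separate register copy.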
> xorps %2, [ps_p1m1p1m1]
> addps %1, %2
> %endmacro
> %macro SWAP_64BITS 2
> %ifdef ARCH_X86_64
> SWAP %1, %2
> %endif
> %endmacro
What good is this doing? There's no %else, so the code must also work
(with no extra instructions) if you don't swap...?
A bunch of the mova (maybe all of them) could be eliminated with AVX.
On Sat, 27 Aug 2011, Michael Niedermayer wrote:
> The main optimization i see is to interleave a few blocks so as to
> simplify the shuffling of data
Agreed.
--Loren Merritt