[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)
Loren Merritt
lorenm
Thu Aug 30 15:05:25 CEST 2007
On Thu, 30 Aug 2007, Reimar Döffinger wrote:
> On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
>> "packuswb %%xmm4, %%xmm0 \n\t"
>> "movq %%xmm0, (%%"REG_d") \n\t"
>> "movhpd %%xmm0, (%%"REG_d",%%"REG_c") \n\t"
>
> As I understand the documentation this instruction does nothing float
> specific. But that would mean that movhps does exactly the same - but it
> has a different opcode (one byte smaller!).
> Can someone explain that to me? I guess it makes more sense to just use
> movhps? Or should I avoid these completely and use a second packuswb
> like the old code did? Or something completely different?
Yes, movhpd and movhps have identical effects, just different opcodes. Some
CPUs might theoretically optimize them differently. E.g., if you
store and load data at the same address but with different sizes,
that interferes with store-to-load forwarding. I don't specifically know of
any CPUs that care whether your 64 bits of data are 2 floats vs 1 double vs
8 chars, but it's possible.
Another example of instructions with the same effect but different uses is
movdqu and lddqu.
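
To make the movhps/movhpd point concrete, here is a minimal sketch in C using the SSE/SSE2 intrinsics that normally map to those two stores (_mm_storeh_pi and _mm_storeh_pd). It is not taken from the snow code; the function names store_high_ps, store_high_pd and high_stores_match are hypothetical, and it assumes an x86 target with SSE2 enabled. It only checks that both stores write the same 8 bytes; it says nothing about how a given CPU schedules them, and a compiler is free to pick either encoding.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the high 64 bits of v with the
 * movhps-style intrinsic. */
static void store_high_ps(uint64_t *dst, __m128i v)
{
    _mm_storeh_pi((__m64 *)dst, _mm_castsi128_ps(v));
}

/* Hypothetical helper: store the high 64 bits of v with the
 * movhpd-style intrinsic. */
static void store_high_pd(double *dst, __m128i v)
{
    _mm_storeh_pd(dst, _mm_castsi128_pd(v));
}

/* Returns 1 if both stores wrote exactly the upper 8 bytes of the
 * source register, 0 otherwise. */
static int high_stores_match(void)
{
    uint8_t src[16];
    uint64_t via_ps = 0;
    double via_pd = 0.0;
    int i;

    /* Arbitrary byte pattern, standing in for packuswb output. */
    for (i = 0; i < 16; i++)
        src[i] = (uint8_t)(i * 17);

    __m128i v = _mm_loadu_si128((const __m128i *)src);
    store_high_ps(&via_ps, v);
    store_high_pd(&via_pd, v);

    /* Compare raw bytes so no float interpretation is involved. */
    return memcmp(&via_ps, &via_pd, 8) == 0 &&
           memcmp(&via_ps, src + 8, 8) == 0;
}
```

Calling high_stores_match() should return 1 on any SSE2-capable build, which is the "does nothing float specific" observation from the quoted question expressed as a check.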
--Loren Merritt