[FFmpeg-devel] [RFC] snow SSE2 optimizations (was: Re: [FFmpeg-cvslog] r10223 - in trunk/libavcodec/i386: dsputil_mmx.c snowdsp_mmx.c)

Thu Aug 30 15:05:25 CEST 2007

On Thu, 30 Aug 2007, Reimar Döffinger wrote:
> On Tue, Aug 28, 2007 at 12:07:02AM +0200, Reimar Döffinger wrote:
>>         "packuswb %%xmm4, %%xmm0                 \n\t"
>>         "movq   %%xmm0, (%%"REG_d")              \n\t"
>>         "movhpd %%xmm0, (%%"REG_d",%%"REG_c")    \n\t"
>
> As I understand the documentation this instruction does nothing float
> specific. But that would mean that movhps does exactly the same - but it
> has a different opcode (one byte smaller!).
> Can someone explain that to me? I guess it makes more sense to just use
> movhps? Or should I avoid these completely and use a second packuswb
> like the old code did? Or something completely different?

Yes, movhpd and movhps are identical, but with different opcodes. Some
CPUs might theoretically optimize them differently. For example, if you
store and load data at the same address but of different sizes,
that interferes with store-load forwarding. I don't specifically know of
any CPUs that care whether your 64 bits of data is 2 floats vs 1 double vs
8 chars, but it's possible.
Another example of instructions with the same effect but different uses is 
movdqu and lddqu.
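
To illustrate the point about movhps and movhpd, here is a minimal sketch
using the SSE/SSE2 intrinsics that normally map to those two instructions
(_mm_storeh_pi and _mm_storeh_pd). The function name and its parameters are
made up for this example, and the compiler is free to pick either encoding
for either call, so this only demonstrates that the stored bytes are the
same, not which opcode gets emitted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE ones */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the upper 8 bytes of a 16-byte vector twice,
 * once through the "packed single" form (movhps) and once through the
 * "packed double" form (movhpd). Returns 1 if both stores wrote the same
 * bytes, 0 otherwise. */
static int high_half_stores_match(const uint8_t src[16],
                                  uint8_t out_ps[8], uint8_t out_pd[8])
{
    __m128i  v = _mm_loadu_si128((const __m128i *)src);
    uint64_t hi_ps;  /* destination for the movhps-style store */
    double   hi_pd;  /* destination for the movhpd-style store */

    _mm_storeh_pi((__m64 *)&hi_ps, _mm_castsi128_ps(v));  /* usually movhps */
    _mm_storeh_pd(&hi_pd, _mm_castsi128_pd(v));           /* usually movhpd */

    /* Copy the raw bytes out so the caller can inspect them. */
    memcpy(out_ps, &hi_ps, 8);
    memcpy(out_pd, &hi_pd, 8);
    return memcmp(out_ps, out_pd, 8) == 0;
}
```

Both output buffers should end up holding bytes 8..15 of src, whatever the
data is meant to represent. The casts between integer, float and double
vector types generate no instructions.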

--Loren Merritt