[PATCH] New rgb32tobgr32 (was: Re: [Ffmpeg-devel] [PATCH] have cs_test check for sigsegv at smaller widths and sigill)

Sat Apr 14 14:27:35 CEST 2007

Hi,

On Saturday 14 April 2007 13:12, Michael Niedermayer wrote:
> On Sat, Apr 14, 2007 at 12:55:46PM +0200, Ivo wrote:
> > On Saturday 14 April 2007 02:14, Michael Niedermayer wrote:
> > > On Fri, Apr 13, 2007 at 10:40:12PM +0200, Ivo wrote:
> > > > On Friday 13 April 2007 19:19, Ivo wrote:
> > > > Okay, let's do one at a time. Here's a new rgb32tobgr32.
> > > >
> > > > Old C code:
> > > > [..]
> > > > Avg: 71106977
> > > >
> > > > Old MMX code:
> > > > [..]
> > > > Avg: 68040665
> > New C Code:
> [...]
> > Avg: 68333921
> >
> > New MMX Code:
> [...]
> > Avg: 66930615
> >
> > New MMX2 Code:
> [...]
> > Avg: 66258605
>
[..]
> > +	__asm __volatile(
> > +		"	"PREFETCH" (%1)			\n"
> > +		"	movq %3, %%mm7			\n"
> > +		"	pxor %4, %%mm7			\n"
> > +		"	movq %%mm7, %%mm6		\n"
> > +		"	pxor %5, %%mm7			\n"
> > +		"	jmp 2f				\n"
> > +			ASMALIGN(4)
> > +		"1:					\n"
> > +		"	"PREFETCH" 32(%1)		\n"
> > +		"	movq (%1), %%mm0		\n"
> > +		"	movq 8(%1), %%mm1		\n"
>
> is moving the prefetch after the memory reads faster?

I didn't notice any change in the timings, but perhaps somebody with an
older CPU could test it?

> > +		"	movq %%mm0, %%mm2		\n"
> > +		"	movq %%mm1, %%mm4		\n"
> > +		"	pand %%mm7, %%mm0		\n"
> > +		"	pand %%mm6, %%mm2		\n"
> > +		"	pand %%mm7, %%mm1		\n"
> > +		"	pand %%mm6, %%mm4		\n"
> > +# ifdef HAVE_MMX2
> > +		"	pshufw $177, %%mm2, %%mm3	\n"
> > +		"	pshufw $177, %%mm4, %%mm5	\n"
> > +		"	por %%mm3, %%mm0		\n"
> > +		"	por %%mm5, %%mm1		\n"
>
> you can still avoid 2 movq here, that is
> read X
> pshufw X,Y
> pand C0,X
> pand C1,Y
> por  X,Y
> store Y

Done.
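
For anyone following along without the attachment, here is a rough scalar
sketch of what that read/pshufw/pand/pand/por/store sequence computes for a
single 32-bit pixel. It is not the code from the patch: the function name is
made up for illustration, and the two mask values are my assumption of what
C0 and C1 would be for swapping bytes 0 and 2 of each pixel. pshufw with
immediate 177 (0xB1) swaps adjacent 16-bit words, which for one 32-bit lane
is the same as a rotate by 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: scalar equivalent of one
 * 32-bit lane of the suggested MMX2 sequence. */
void rgb32tobgr32_sketch(const uint32_t *src, uint32_t *dst, size_t pixels)
{
    /* Assumed mask values: C0 keeps bytes 1 and 3 in place, C1 selects
     * bytes 0 and 2, which trade places. */
    const uint32_t c0 = 0xFF00FF00u;
    const uint32_t c1 = 0x00FF00FFu;

    for (size_t i = 0; i < pixels; i++) {
        uint32_t x = src[i];                 /* read X                     */
        uint32_t y = (x >> 16) | (x << 16);  /* pshufw X,Y: swap 16-bit
                                                halves of the pixel        */
        x &= c0;                             /* pand C0,X                  */
        y &= c1;                             /* pand C1,Y                  */
        dst[i] = x | y;                      /* por X,Y; store Y           */
    }
}
```

With these masks a pixel 0xAARRGGBB comes out as 0xAABBGGRR. The point of
the reordering is that the shuffle reads X directly, so no extra register
copy is needed before the masking.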

--Ivo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgb32tobgr32.new.patch
Type: text/x-diff
Size: 3039 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070414/6117af71/attachment.patch>