[PATCH] New rgb32tobgr32 (was: Re: [Ffmpeg-devel] [PATCH] have cs_test check for sigsegv at smaller widths and sigill)

Sat Apr 14 13:12:59 CEST 2007

Hi

On Sat, Apr 14, 2007 at 12:55:46PM +0200, Ivo wrote:
> On Saturday 14 April 2007 02:14, Michael Niedermayer wrote:
> > On Fri, Apr 13, 2007 at 10:40:12PM +0200, Ivo wrote:
> > > On Friday 13 April 2007 19:19, Ivo wrote:
> > > Okay, let's do one at a time. Here's a new rgb32tobgr32.
> > >
> > > Old C code:
> > > [..]
> > > Avg: 71106977
> > >
> > > New C code:
> > > [..]
> > > Avg: 67607306
> > >
> > > Old MMX code:
> > > [..]
> > > Avg: 68040665
> > >
> > > New MMX code:
> > > [..]
> > > Avg: 67486036
> > >
> > > My CPU is an AMD Sempron 2400+.
[...]
> > > [..]
> > > +    for (; s<end; s+=4, d+=4) {
> > > +        int v = *(uint32_t *)s;
> > > +        int r = v & 0xff, g = (v>>8) & 0xff, b = (v>>16) & 0xff;
> > > +        *(uint32_t *)d = b + (g<<8) + (r<<16);
> >
> > int v = *(uint32_t *)s;
> > int g = v&0xFF00;
> > v &= 0xFF00FF;
> > *(uint32_t *)d = (v>>16) + (v<<16) + g;
> >
> > 2 fewer shifts
> > 1 fewer and
> >
> > the same trick can be done with the mmx code to avoid one pand
> > also all the shifts and register-register movq can be replaced
> > by a pshufw on mmx2
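
For reference, the mask-and-swap suggestion quoted above can be written as a standalone function. This is only a sketch: the name swap_r_b is hypothetical, and it uses uint32_t rather than int for v so that the left shift is well defined. Like the loop it replaces, it drops the top (alpha) byte.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: swaps the low byte and the
 * third byte of a 32-bit pixel, keeps the second byte, clears the top byte. */
static uint32_t swap_r_b(uint32_t v)
{
    uint32_t g = v & 0xFF00;   /* middle channel stays where it is */
    v &= 0xFF00FF;             /* keep only the two bytes that trade places */
    /* v>>16 moves the third byte down, v<<16 moves the low byte up;
     * the bits shifted past bit 31 are discarded. */
    return (v >> 16) + (v << 16) + g;
}
```

For example, swap_r_b(0xAA112233) gives 0x00332211: one mask for g, one mask for the other two bytes, and two shifts.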
> 
> How's the following patch?
> 
> New C Code:
[...]
> Avg: 68333921
> 
> New MMX Code:
[...]
> Avg: 66930615
> 
> New MMX2 Code:
[...]
> Avg: 66258605
[...]
> +	__asm __volatile(
> +		"	"PREFETCH" (%1)			\n"
> +		"	movq %3, %%mm7			\n"
> +		"	pxor %4, %%mm7			\n"
> +		"	movq %%mm7, %%mm6		\n"
> +		"	pxor %5, %%mm7			\n"
> +		"	jmp 2f				\n"
> +			ASMALIGN(4)
> +		"1:					\n"
> +		"	"PREFETCH" 32(%1)		\n"
> +		"	movq (%1), %%mm0		\n"
> +		"	movq 8(%1), %%mm1		\n"

is moving the prefetch after the memory reads faster?

> +		"	movq %%mm0, %%mm2		\n"
> +		"	movq %%mm1, %%mm4		\n"
> +		"	pand %%mm7, %%mm0		\n"
> +		"	pand %%mm6, %%mm2		\n"
> +		"	pand %%mm7, %%mm1		\n"
> +		"	pand %%mm6, %%mm4		\n"
> +# ifdef HAVE_MMX2
> +		"	pshufw $177, %%mm2, %%mm3	\n"
> +		"	pshufw $177, %%mm4, %%mm5	\n"
> +		"	por %%mm3, %%mm0		\n"
> +		"	por %%mm5, %%mm1		\n"

you can still avoid 2 movq here, that is
read X
pshufw X,Y
pand C0,X
pand C1,Y
por  X,Y
store Y
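
To make the sequence above concrete, here is a plain C model of it on one 64-bit word (two pixels). It is a sketch under assumptions, not the patch's code: the function names are hypothetical, and the mask values c0 and c1 are my guess at what C0 and C1 would hold if the alpha byte is to be preserved. pshufw with immediate 177 (binary 10 11 00 01) swaps the two 16-bit words inside each 32-bit half, which is what shuffle_177 emulates.

```c
#include <stdint.h>

/* Models pshufw $177: swap the 16-bit words within each 32-bit half. */
static uint64_t shuffle_177(uint64_t x)
{
    return ((x & 0x0000FFFF0000FFFFULL) << 16) |
           ((x & 0xFFFF0000FFFF0000ULL) >> 16);
}

/* Hypothetical model of: read X; pshufw X,Y; pand C0,X; pand C1,Y; por X,Y */
static uint64_t swap_r_b_x2(uint64_t x)
{
    const uint64_t c0 = 0xFF00FF00FF00FF00ULL; /* assumed: bytes that stay put */
    const uint64_t c1 = 0x00FF00FF00FF00FFULL; /* assumed: bytes that swap */
    uint64_t y = shuffle_177(x); /* pshufw X,Y: no register copy needed first */
    x &= c0;                     /* pand C0,X */
    y &= c1;                     /* pand C1,Y */
    return x | y;                /* por X,Y, then store Y */
}
```

Because pshufw writes its result to a separate destination register, the shuffle doubles as the copy, which is where the two saved movq come from.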

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus