[PATCH] New rgb32tobgr32 (was: Re: [Ffmpeg-devel] [PATCH] have cs_test check for sigsegv at smaller widths and sigill)
Michael Niedermayer
michaelni
Sat Apr 14 13:12:59 CEST 2007
Hi
On Sat, Apr 14, 2007 at 12:55:46PM +0200, Ivo wrote:
> On Saturday 14 April 2007 02:14, Michael Niedermayer wrote:
> > On Fri, Apr 13, 2007 at 10:40:12PM +0200, Ivo wrote:
> > > On Friday 13 April 2007 19:19, Ivo wrote:
> > > Okay, let's do one at a time. Here's a new rgb32tobgr32.
> > >
> > > Old C code:
> > > [..]
> > > Avg: 71106977
> > >
> > > New C code:
> > > [..]
> > > Avg: 67607306
> > >
> > > Old MMX code:
> > > [..]
> > > Avg: 68040665
> > >
> > > New MMX code:
> > > [..]
> > > Avg: 67486036
> > >
> > > My CPU is an AMD Sempron 2400+.
[...]
> > > [..]
> > > + for (; s<end; s+=4, d+=4) {
> > > + int v = *(uint32_t *)s;
> > > + int r = v & 0xff, g = (v>>8) & 0xff, b = (v>>16) & 0xff;
> > > + *(uint32_t *)d = b + (g<<8) + (r<<16);
> >
> > int v = *(uint32_t *)s;
> > int g = v&0xFF00;
> > v &= 0xFF00FF;
> > *(uint32_t *)d = (v>>16) + (v<<16) + g;
> >
> > 2 fewer shifts
> > 1 fewer and
> >
> > the same trick can be applied to the mmx code to avoid one pand.
> > also, all the shifts and register-register movq can be replaced
> > by a pshufw on mmx2
>
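For reference, the masked swap quoted above can be written out as a small standalone C sketch. The names swap_rb_pixel and rgb32tobgr32_sketch are hypothetical and not taken from the patch. The sketch uses uint32_t instead of int so that the left shift cannot overflow a signed value, and memcpy instead of pointer casts to avoid alignment and aliasing problems. Like the quoted code, it clears the top byte of each pixel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap bits 0-7 with bits 16-23 of one packed
 * 32-bit pixel, keep bits 8-15 in place, clear bits 24-31. */
static inline uint32_t swap_rb_pixel(uint32_t v)
{
    uint32_t g = v & 0x0000FF00u;  /* middle component stays where it is */
    v &= 0x00FF00FFu;              /* keep only the two components to swap */
    /* In 32-bit unsigned arithmetic the bits shifted past bit 31 are
     * discarded, so one mask serves both shifts. */
    return (v >> 16) | (v << 16) | g;
}

/* Hypothetical wrapper: convert src_size bytes, four at a time.
 * Trailing bytes that do not form a whole pixel are left untouched. */
static void rgb32tobgr32_sketch(const uint8_t *src, uint8_t *dst,
                                size_t src_size)
{
    for (size_t i = 0; i + 4 <= src_size; i += 4) {
        uint32_t v;
        memcpy(&v, src + i, 4);   /* safe unaligned load */
        v = swap_rb_pixel(v);
        memcpy(dst + i, &v, 4);   /* safe unaligned store */
    }
}
```

Compared with extracting r, g and b separately, this needs two masks and two shifts per pixel, which is the saving counted above.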
> How's the following patch?
>
> New C Code:
[...]
> Avg: 68333921
>
> New MMX Code:
[...]
> Avg: 66930615
>
> New MMX2 Code:
[...]
> Avg: 66258605
[...]
> + __asm __volatile(
> + " "PREFETCH" (%1) \n"
> + " movq %3, %%mm7 \n"
> + " pxor %4, %%mm7 \n"
> + " movq %%mm7, %%mm6 \n"
> + " pxor %5, %%mm7 \n"
> + " jmp 2f \n"
> + ASMALIGN(4)
> + "1: \n"
> + " "PREFETCH" 32(%1) \n"
> + " movq (%1), %%mm0 \n"
> + " movq 8(%1), %%mm1 \n"
would moving the prefetch after the memory reads be faster?
> + " movq %%mm0, %%mm2 \n"
> + " movq %%mm1, %%mm4 \n"
> + " pand %%mm7, %%mm0 \n"
> + " pand %%mm6, %%mm2 \n"
> + " pand %%mm7, %%mm1 \n"
> + " pand %%mm6, %%mm4 \n"
> +# ifdef HAVE_MMX2
> + " pshufw $177, %%mm2, %%mm3 \n"
> + " pshufw $177, %%mm4, %%mm5 \n"
> + " por %%mm3, %%mm0 \n"
> + " por %%mm5, %%mm1 \n"
you can still avoid 2 movq here, that is:
read X
pshufw X,Y
pand C0,X
pand C1,Y
por X,Y
store Y
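To make the sequence above easier to check, here is a portable scalar model of it in C. This is not MMX code, only a sketch of what the instructions compute on one 64-bit quantity (two pixels). The function names and the mask values C0 and C1 are assumptions; the patch builds its own constants in mm6 and mm7, and which mask goes with which register is a guess here. With these masks the top byte of each pixel is kept rather than cleared.

```c
#include <stdint.h>

/* Scalar model of "pshufw $imm, src, dst": each 2-bit field of imm
 * selects which 16-bit word of src lands in the matching word of dst. */
static uint64_t pshufw_model(uint64_t src, unsigned imm)
{
    uint64_t dst = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm >> (2 * i)) & 3u;
        uint64_t word = (src >> (16 * sel)) & 0xFFFFu;
        dst |= word << (16 * i);
    }
    return dst;
}

/* Hypothetical model of the read/pshufw/pand/pand/por/store sequence.
 * Immediate 177 (0xB1) swaps the two 16-bit halves of each 32-bit pixel. */
static uint64_t swap_rb_two_pixels(uint64_t x)
{
    const uint64_t c0 = 0xFF00FF00FF00FF00ULL; /* assumed C0: bytes 1 and 3 */
    const uint64_t c1 = 0x00FF00FF00FF00FFULL; /* assumed C1: bytes 0 and 2 */
    uint64_t y = pshufw_model(x, 177);  /* pshufw X,Y */
    x &= c0;                            /* pand C0,X */
    y &= c1;                            /* pand C1,Y */
    return x | y;                       /* por X,Y, then store Y */
}
```

Because pshufw writes its result to a separate destination register, no copy of X is needed before the masking, which is where the two movq are saved.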
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus