[PATCH] New rgb32tobgr32 (was: Re: [Ffmpeg-devel] [PATCH] have cs_test check for sigsegv at smaller widths and sigill)
Ivo
ivop
Sat Apr 14 14:27:35 CEST 2007
Hi,
On Saturday 14 April 2007 13:12, Michael Niedermayer wrote:
> On Sat, Apr 14, 2007 at 12:55:46PM +0200, Ivo wrote:
> > On Saturday 14 April 2007 02:14, Michael Niedermayer wrote:
> > > On Fri, Apr 13, 2007 at 10:40:12PM +0200, Ivo wrote:
> > > > On Friday 13 April 2007 19:19, Ivo wrote:
> > > > Okay, let's do one at a time. Here's a new rgb32tobgr32.
> > > >
> > > > Old C code:
> > > > [..]
> > > > Avg: 71106977
> > > >
> > > > Old MMX code:
> > > > [..]
> > > > Avg: 68040665
> > New C Code:
> [...]
> > Avg: 68333921
> >
> > New MMX Code:
> [...]
> > Avg: 66930615
> >
> > New MMX2 Code:
> [...]
> > Avg: 66258605
>
[..]
> > + __asm __volatile(
> > + " "PREFETCH" (%1) \n"
> > + " movq %3, %%mm7 \n"
> > + " pxor %4, %%mm7 \n"
> > + " movq %%mm7, %%mm6 \n"
> > + " pxor %5, %%mm7 \n"
> > + " jmp 2f \n"
> > + ASMALIGN(4)
> > + "1: \n"
> > + " "PREFETCH" 32(%1) \n"
> > + " movq (%1), %%mm0 \n"
> > + " movq 8(%1), %%mm1 \n"
>
> is moving the prefetch after the memory reads faster?
I didn't notice any change, but perhaps somebody with an older CPU could
test it?
> > + " movq %%mm0, %%mm2 \n"
> > + " movq %%mm1, %%mm4 \n"
> > + " pand %%mm7, %%mm0 \n"
> > + " pand %%mm6, %%mm2 \n"
> > + " pand %%mm7, %%mm1 \n"
> > + " pand %%mm6, %%mm4 \n"
> > +# ifdef HAVE_MMX2
> > + " pshufw $177, %%mm2, %%mm3 \n"
> > + " pshufw $177, %%mm4, %%mm5 \n"
> > + " por %%mm3, %%mm0 \n"
> > + " por %%mm5, %%mm1 \n"
>
> you can still avoid 2 movq here, that is
> read X
> pshufw X,Y
> pand C0,X
> pand C1,Y
> por X,Y
> store Y
Done.
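
For readers without the attachment, the read/shuffle/mask/or sequence quoted above can be sketched in plain C. This is only an illustration, not the code from the patch: the names swap_rb32 and rgb32tobgr32_sketch are hypothetical, the mask values are assumed from the goal of swapping bytes 0 and 2 of each pixel, and a 16-bit rotate of one 32-bit pixel stands in for what pshufw $177 does to each dword of an MMX register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swap bytes 0 and 2 of one packed 32-bit pixel,
 * leaving bytes 1 and 3 in place. */
static uint32_t swap_rb32(uint32_t x)
{
    /* Scalar stand-in for pshufw $177: exchange the two 16-bit halves. */
    uint32_t y = (x << 16) | (x >> 16);

    x &= 0xFF00FF00u;   /* pand C0,X: keep bytes 1 and 3 of the original   */
    y &= 0x00FF00FFu;   /* pand C1,Y: keep bytes 0 and 2 of the rotated copy */
    return x | y;       /* por X,Y */
}

/* Hypothetical wrapper: convert n_pixels pixels from src to dst.
 * Unlike the asm version it takes a pixel count, not a byte size. */
static void rgb32tobgr32_sketch(const uint32_t *src, uint32_t *dst,
                                size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++)
        dst[i] = swap_rb32(src[i]);
}
```

The rotated copy is made directly from the loaded value, so no extra register-to-register copy is needed before masking, which is the point of the suggestion.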
--Ivo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgb32tobgr32.new.patch
Type: text/x-diff
Size: 3039 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070414/6117af71/attachment.patch>