[PATCH] New rgb32tobgr32 (was: Re: [Ffmpeg-devel] [PATCH] have cs_test check for sigsegv at smaller widths and sigill)
Ivo
ivop
Sat Apr 14 12:55:46 CEST 2007
On Saturday 14 April 2007 02:14, Michael Niedermayer wrote:
> On Fri, Apr 13, 2007 at 10:40:12PM +0200, Ivo wrote:
> > On Friday 13 April 2007 19:19, Ivo wrote:
> > Okay, let's do one at the time. Here's a new rgb32tobgr32.
> >
> > Old C code:
> > [..]
> > Avg: 71106977
> >
> > New C code:
> > [..]
> > Avg: 67607306
> >
> > Old MMX code:
> > [..]
> > Avg: 68040665
> >
> > New MMX code:
> > [..]
> > Avg: 67486036
> >
> > My CPU is an AMD Sempron 2400+.
Which is a 32-bit Sempron BTW. Not many were made I believe.
> > + __asm __volatile(
> > + " "PREFETCH" (%1) \n"
> > + " movq %3, %%mm7 \n"
> > + " pxor %4, %%mm7 \n"
> > + " pxor %5, %%mm7 \n"
> >
> > + " movq %%mm7, %%mm6 \n"
>
> this is senseless, rather use the register for something usefull
> like avoiding reading %3 twice in the loop from memory
Originally it was meant to improve instruction pairing, as I didn't see any
drop in performance from reading from memory, but I suppose that is more
noticeable on lower-end CPUs. I changed the purpose of mm6 and now
avoid all reads from memory in the loop.
> > [..]
> > + for (; s<end; s+=4, d+=4) {
> > + int v = *(uint32_t *)s;
> > + int r = v & 0xff, g = (v>>8) & 0xff, b = (v>>16) & 0xff;
> > + *(uint32_t *)d = b + (g<<8) + (r<<16);
>
> int v = *(uint32_t *)s;
> int g = v&0xFF00;
> v &= 0xFF00FF;
> *(uint32_t *)d = (v>>16) + (v<<16) + g
>
> 2 shift less
> 1 and less
>
> the same trick can be done with the mmx code to avoid one pand
> also all the shifts and register-register movq can be replaced
> by a pshufw on mmx2
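For anyone who wants to compare the variants outside of the patch, here is a minimal standalone sketch in plain C. All function names are hypothetical and are not the ones used in rgb2rgb. The first two functions mirror the two quoted loops, using uint32_t instead of int so that the left shift cannot overflow a signed type. The third is only an assumption about how a word shuffle such as pshufw can be combined with two masks: it swaps the 16-bit halves of each pixel and merges, which also keeps the top byte rather than clearing it.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from the patch. */

/* Three masks, three shifts: mirrors the first quoted loop body. */
static uint32_t swap_rb_simple(uint32_t v)
{
    uint32_t r = v & 0xff, g = (v >> 8) & 0xff, b = (v >> 16) & 0xff;
    return b + (g << 8) + (r << 16);
}

/* Two masks, two shifts: mirrors the suggested replacement.
 * After masking, bits 8-15 and 24-31 of v are zero, so the two
 * shifted copies and g never overlap and '+' acts like '|'. */
static uint32_t swap_rb_masked(uint32_t v)
{
    uint32_t g = v & 0xFF00;
    v &= 0xFF00FF;
    return (v >> 16) + (v << 16) + g;
}

/* Assumed scalar analogue of a word-shuffle approach: swap the two
 * 16-bit halves, then take bytes 0 and 2 from the shuffled value and
 * bytes 1 and 3 from the original. Unlike the two functions above,
 * this preserves the top byte. */
static uint32_t swap_rb_keep_top(uint32_t v)
{
    uint32_t rot = (v >> 16) | (v << 16);
    return (v & 0xFF00FF00u) | (rot & 0x00FF00FFu);
}

/* Apply the masked variant to a buffer of 32-bit pixels. memcpy is
 * used for the loads and stores to avoid alignment and aliasing
 * problems. Which bytes in memory get swapped depends on the host
 * byte order, as in the quoted loops. */
static void rgb32tobgr32_sketch(const uint8_t *src, uint8_t *dst, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++) {
        uint32_t v;
        memcpy(&v, src + 4 * i, sizeof v);
        v = swap_rb_masked(v);
        memcpy(dst + 4 * i, &v, sizeof v);
    }
}
```

For the input 0x11223344 the first two functions both return 0x00443322, while the third returns 0x11443322.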
How's the following patch?
New C Code:
69985150 dezicycles in rgb32tobgr32, 1 runs, 0 skips
70566460 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67979870 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67129280 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67166970 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67337970 dezicycles in rgb32tobgr32, 1 runs, 0 skips
70481800 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66668770 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67293370 dezicycles in rgb32tobgr32, 1 runs, 0 skips
68729570 dezicycles in rgb32tobgr32, 1 runs, 0 skips
Avg: 68333921
New MMX Code:
66505730 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66386220 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64076890 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64582190 dezicycles in rgb32tobgr32, 1 runs, 0 skips
68187940 dezicycles in rgb32tobgr32, 1 runs, 0 skips
65565120 dezicycles in rgb32tobgr32, 1 runs, 0 skips
75394570 dezicycles in rgb32tobgr32, 1 runs, 0 skips
65170580 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67334190 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66102720 dezicycles in rgb32tobgr32, 1 runs, 0 skips
Avg: 66930615
New MMX2 Code:
66537630 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66355890 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64868640 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66130640 dezicycles in rgb32tobgr32, 1 runs, 0 skips
66320290 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67119610 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67560890 dezicycles in rgb32tobgr32, 1 runs, 0 skips
67194460 dezicycles in rgb32tobgr32, 1 runs, 0 skips
64999600 dezicycles in rgb32tobgr32, 1 runs, 0 skips
65498400 dezicycles in rgb32tobgr32, 1 runs, 0 skips
Avg: 66258605
I indented the ifdef for MMX2 for readability's sake.
--Ivo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rgb32tobgr32.new.patch
Type: text/x-diff
Size: 2927 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070414/e5ad22d7/attachment.patch>