[FFmpeg-devel] Mixed data type in SIMD code?
Zuxy Meng
zuxy.meng
Wed Mar 5 08:22:00 CET 2008
Hi,
2008/3/5, Zuxy Meng <zuxy.meng at gmail.com>:
> Hi,
>
> 2008/3/5, Loren Merritt <lorenm at u.washington.edu>:
> > On Tue, 4 Mar 2008, Michael Niedermayer wrote:
> > > On Mon, Mar 03, 2008 at 04:30:08PM -0700, Loren Merritt wrote:
> > >> On Mon, 3 Mar 2008, Michael Niedermayer wrote:
> > >>>
> > >>> Also i doubt we use or ever will use packed double.
> > >>
> > >> flac encoder does. Single isn't precise enough for a linear sum of up
> > >> to 16k elements. Reordering the sum to a tree made it half-way
> > >> decent precision, but also made it as slow as double.
> > >
> > > What about something like:
> > >
> > > for(i=0; i<16000;){
> > >     float sum=0;
> > >     do{
> > >         sum+= whatever[i++];
> > >     }while(i&127);
> > >     double_sum += sum;
> > > }
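For reference, a self-contained sketch of the blocked accumulation quoted above: each run of 128 elements is summed in single precision, and the run total is then added to a double accumulator. The function name blocked_sum, the BLOCK constant, and the handling of a trailing partial block are assumptions made for illustration; none of them are taken from the actual flac encoder code.

```c
#include <stddef.h>

/* Block length; 128 matches the (i & 127) test in the quoted loop. */
#define BLOCK 128

/* Sum n floats. Each block of up to BLOCK elements is accumulated in a
 * float, so the rounding error per block stays small, and the block
 * totals are accumulated in a double. Hypothetical helper, for
 * illustration only. */
double blocked_sum(const float *x, size_t n)
{
    double total = 0.0;
    size_t i = 0;

    while (i < n) {
        /* End of the current block, clamped to the array length. */
        size_t end = (n - i > BLOCK) ? i + BLOCK : n;
        float partial = 0.0f;

        for (; i < end; i++)
            partial += x[i];

        total += partial;
    }
    return total;
}
```

Unlike the quoted loop, this version also accepts lengths that are not a multiple of 128; the last block is simply shorter.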
> >
> > done.
> >
> > core2:
> > 2039632 dezicycles in autocorr_double_c, 65536 runs, 0 skips
> > 771026 dezicycles in autocorr_double_sse2, 65536 runs, 0 skips
> > 524713 dezicycles in autocorr_float_sse1, 65536 runs, 0 skips
> > 500609 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> > 432458 dezicycles in autocorr_float_ssse3, 65535 runs, 1 skips
> > overall: 4.8%
> >
> > k8:
> > 1776170 dezicycles in autocorr_double_c, 65534 runs, 2 skips
> > 1062022 dezicycles in autocorr_double_sse2, 65535 runs, 1 skips
> > 932452 dezicycles in autocorr_float_sse1, 65533 runs, 3 skips
> > 911259 dezicycles in autocorr_float_sse2, 65534 runs, 2 skips
> > overall: 2.5%
> >
>
> It looks to me that
>
> + OP2(movhlps, 6,0, 7,1)\
> + OP2(addsd, 6,0, 7,1)\
> + "movsd %%xmm0, %2 \n\t"\
> + "movsd %%xmm1, 8+%2 \n\t"\
>
> can be optimized to
>
> haddpd %%xmm7, %%xmm6\n\t
> movapd %%xmm6, %2\n\t
>
> when SSE3 is available.
Benchmarking only this piece of code (6 SSE instructions vs. 2 SSE3
instructions): on a K8 the SSE3 version is merely about 1% faster, but
on a Prescott it is 85% faster. I don't have access to a Core 2, though.
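To make the equivalence of the two sequences easy to check, here is a scalar C model of both. It assumes that OP2(op, a,b, c,d) expands to "op %%xmm<a>, %%xmm<b>" followed by "op %%xmm<c>, %%xmm<d>", which is an assumption about the macro and is not confirmed in this thread. The xmm_pd struct and both function names are hypothetical.

```c
/* Scalar model of a 128-bit XMM register holding two packed doubles. */
typedef struct { double lo, hi; } xmm_pd;

/* Model of the SSE2 sequence (6 instructions), assuming the OP2
 * expansion described above. */
void hsum_sse2_model(xmm_pd x6, xmm_pd x7, double out[2])
{
    double x0_lo = x6.hi;   /* movhlps %xmm6, %xmm0 : high half -> low half */
    double x1_lo = x7.hi;   /* movhlps %xmm7, %xmm1 */
    x0_lo += x6.lo;         /* addsd   %xmm6, %xmm0 : add low doubles */
    x1_lo += x7.lo;         /* addsd   %xmm7, %xmm1 */
    out[0] = x0_lo;         /* movsd   %xmm0, %2    */
    out[1] = x1_lo;         /* movsd   %xmm1, 8+%2  */
}

/* Model of the SSE3 sequence (2 instructions). */
void hsum_sse3_model(xmm_pd x6, xmm_pd x7, double out[2])
{
    /* haddpd %xmm7, %xmm6 : low = x6.lo + x6.hi, high = x7.lo + x7.hi */
    xmm_pd r = { x6.lo + x6.hi, x7.lo + x7.hi };
    out[0] = r.lo;          /* movapd %xmm6, %2 : stores both doubles */
    out[1] = r.hi;
}
```

One difference the model does not capture: movapd requires its memory operand to be 16-byte aligned, while the two movsd stores do not, so the SSE3 variant is only valid if %2 is suitably aligned.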
--
Zuxy
Beauty is truth,
While truth is beauty.
PGP KeyID: E8555ED6