[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions
Michael Niedermayer
michaelni
Sat Jan 3 00:57:50 CET 2009
On Fri, Jan 02, 2009 at 03:37:11PM -0500, Jason Garrett-Glaser wrote:
> > a random idea: (untested and ignore if slower)
> >
> > movd "block[ 0]", %%mm0 // 0 0 X D
> > punpcklwd "block[16]", %%mm0 // x X d D
> > paddsw "32", %%mm0
> > psraw $6, %%mm0
> > punpcklwd %%mm0, %%mm0 // d d D D
> > pxor %%mm1, %%mm1 // 0 0 0 0
> > psubw %%mm0, %%mm1 // -d-d-D-D
> > packuswb %%mm1, %%mm0 // -d-d-D-D d d D D
> > pshufw $0xFA, %%mm0, %%mm1 // -d-d-d-d-D-D-D-D
> > punpcklwd %%mm0, %%mm0 // d d d d D D D D
> >
> >
> > except that, patch ok
>
> 1.5 clocks faster in i16x16 idct... barely worth it, but still better,
> so I'll keep it.
>
> Patch attached.
looks good
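
For anyone following the register comments in the quoted snippet, a scalar
C model of what the sequence appears to compute may help. This is only a
sketch under assumptions: the helper names (adds16, sra6, sat_u8,
dc_to_bytes, add_dc) are hypothetical and not taken from the patch, and the
final paddusb/psubusb step is inferred from the layout of the +d and -d
byte vectors, since the quoted snippet does not show it.

```c
#include <stdint.h>

/* Saturating signed 16-bit add, modelling paddsw. */
static int16_t adds16(int16_t a, int16_t b)
{
    int s = (int)a + (int)b;
    if (s > INT16_MAX) s = INT16_MAX;
    if (s < INT16_MIN) s = INT16_MIN;
    return (int16_t)s;
}

/* Arithmetic shift right by 6 (rounds toward -infinity), modelling psraw $6.
 * Written without >> on negative values, which is implementation-defined in C. */
static int16_t sra6(int16_t x)
{
    return (int16_t)(x >= 0 ? x >> 6 : -((-(int)x + 63) >> 6));
}

/* Signed 16-bit to unsigned 8-bit with saturation, modelling packuswb. */
static uint8_t sat_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical helper: turn one DC coefficient into the two byte values
 * that the snippet broadcasts, sat(d) and sat(-d). One of them is always 0. */
static void dc_to_bytes(int16_t dc, uint8_t *plus, uint8_t *minus)
{
    int16_t d = sra6(adds16(dc, 32));   /* (dc + 32) >> 6 */
    *plus  = sat_u8(d);                 /* packuswb of  d */
    *minus = sat_u8(-d);                /* packuswb of 0 - d (psubw) */
}

/* Assumed consumer: unsigned saturating add of +d, then subtract of -d,
 * as paddusb followed by psubusb would do per pixel. */
static uint8_t add_dc(uint8_t pix, uint8_t plus, uint8_t minus)
{
    int v = pix + plus;
    if (v > 255) v = 255;               /* paddusb */
    v -= minus;
    if (v < 0) v = 0;                   /* psubusb */
    return (uint8_t)v;
}
```

Splitting d into a non-negative and a negated byte vector lets the pixel
update stay in unsigned 8-bit arithmetic, with clamping to the 0..255 range
coming from the saturating byte operations instead of a widening to 16 bits.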
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
He who knows, does not speak. He who speaks, does not know. -- Lao Tsu