[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions

Michael Niedermayer michaelni
Sat Jan 3 00:57:50 CET 2009


On Fri, Jan 02, 2009 at 03:37:11PM -0500, Jason Garrett-Glaser wrote:
> > a random idea: (untested and ignore if slower)
> >
> > movd      "block[ 0]", %%mm0    //  0 0 X D
> > punpcklwd "block[16]", %%mm0    //  x X d D
> > paddsw           "32", %%mm0
> > psraw              $6, %%mm0
> > punpcklwd       %%mm0, %%mm0    //  d d D D
> > pxor            %%mm1, %%mm1    //  0 0 0 0
> > psubw           %%mm0, %%mm1    // -d-d-D-D
> > packuswb        %%mm1, %%mm0    // -d-d-D-D d d D D
> > pshufw   $0xFA, %%mm0, %%mm1    // -d-d-d-d-D-D-D-D
> > punpcklwd       %%mm0, %%mm0    //  d d d d D D D D
> >
> >
> > except that, patch ok
> 
> 1.5 clocks faster in i16x16 idct... barely worth it, but still better,
> so I'll keep it.
> 
> Patch attached.

looks good
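
For readers following the quoted snippet: it ends with the rounded DC
value D (and d for the second block) broadcast as unsigned bytes in mm0,
and the saturated negation in mm1. The scalar C model below is only a
sketch of why that split is useful. It assumes the two registers are later
applied to the pixels with an unsigned saturating add followed by an
unsigned saturating subtract (paddusb/psubusb), which the snippet itself
does not show. The helper names sat_u8, addus8, subus8, dc_add_split,
dc_add_ref and count_mismatches are made up for this illustration and are
not FFmpeg or x264 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane model of packuswb: signed 16-bit value -> unsigned 8-bit, saturated. */
static uint8_t sat_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Per-byte model of paddusb (unsigned saturating add). */
static uint8_t addus8(uint8_t a, uint8_t b)
{
    int s = a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

/* Per-byte model of psubusb (unsigned saturating subtract). */
static uint8_t subus8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* DC-only add using the positive/negative split built by the snippet.
 * Assumes >> on a negative int is an arithmetic shift, like psraw. */
static uint8_t dc_add_split(uint8_t pixel, int16_t dc_coef)
{
    int d = (dc_coef + 32) >> 6;   /* paddsw 32 ; psraw $6        */
    uint8_t pos = sat_u8(d);       /* what packuswb leaves of  d  */
    uint8_t neg = sat_u8(-d);      /* what packuswb leaves of -d  */

    assert(pos == 0 || neg == 0);  /* at most one half is non-zero */
    return subus8(addus8(pixel, pos), neg);
}

/* Straightforward reference: clip(pixel + d) to 0..255. */
static uint8_t dc_add_ref(uint8_t pixel, int16_t dc_coef)
{
    int d = (dc_coef + 32) >> 6;
    return sat_u8(pixel + d);
}

/* Exhaustively compare both versions over every coefficient and pixel. */
static int count_mismatches(void)
{
    int bad = 0;
    for (int dc = -32768; dc <= 32767; dc++)
        for (int p = 0; p <= 255; p++)
            if (dc_add_split((uint8_t)p, (int16_t)dc) !=
                dc_add_ref((uint8_t)p, (int16_t)dc))
                bad++;
    return bad;
}
```

Because packuswb clamps negative words to 0, only one of the two bytes is
ever non-zero, so the add/subtract pair gives the same result as a signed
add with clipping while staying entirely in unsigned byte arithmetic.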

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu