[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions
Jason Garrett-Glaser
darkshikari
Fri Jan 2 21:37:11 CET 2009
> a random idea: (untested and ignore if slower)
>
> movd "block[ 0]", %%mm0 // 0 0 X D
> punpcklwd "block[16]", %%mm0 // x X d D
> paddsw "32", %%mm0
> psraw $6, %%mm0
> punpcklwd %%mm0, %%mm0 // d d D D
> pxor %%mm1, %%mm1 // 0 0 0 0
> psubw %%mm0, %%mm1 // -d-d-D-D
> packuswb %%mm1, %%mm0 // -d-d-D-D d d D D
> pshufw $0xFA, %%mm0, %%mm1 // -d-d-d-d-D-D-D-D
> punpcklwd %%mm0, %%mm0 // d d d d D D D D
>
> except that, patch ok
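For reference, here is a scalar C model of what the quoted DC-only trick computes, one lane at a time. The sketch assumes that block[0] and block[16] are the DC coefficients of two horizontally adjacent 4x4 blocks (16 coefficients per block). It also assumes that the two splatted registers are later applied to a row of 8 pixels with paddusb (positive part) and psubusb (negative part). Those assumptions and all helper names (dc_add_2x4_row, ref_pixel, etc.) are hypothetical and are not taken from the attached patch. The point it checks is that adding the saturated +dc bytes and then subtracting the saturated -dc bytes gives the same result as clamp(pixel + dc, 0, 255).

```c
#include <assert.h>
#include <stdint.h>

/* paddsw: signed 16-bit saturating add result */
static int16_t sat_s16(int v)
{
    return v > INT16_MAX ? INT16_MAX : v < INT16_MIN ? INT16_MIN : (int16_t)v;
}

/* packuswb / paddusb / psubusb: saturate to an unsigned byte */
static uint8_t sat_u8(int v)
{
    return v > 255 ? 255 : v < 0 ? 0 : (uint8_t)v;
}

/* psraw $6: arithmetic (floor) shift, written portably for negative values */
static int asr6(int v)
{
    return v >= 0 ? v >> 6 : -((-v + 63) >> 6);
}

/* Hypothetical model: low 4 bytes of the row get block[0]'s DC ("D"),
 * high 4 bytes get block[16]'s DC ("d"), matching "d d d d D D D D". */
static void dc_add_2x4_row(uint8_t row[8], const int16_t *block)
{
    const int16_t dcs[2] = { block[0], block[16] };
    for (int b = 0; b < 2; b++) {
        int dc = asr6(sat_s16(dcs[b] + 32)); /* paddsw "32"; psraw $6 */
        uint8_t pos = sat_u8(dc);            /* packuswb of  dc */
        uint8_t neg = sat_u8(-dc);           /* packuswb of -dc */
        for (int i = 0; i < 4; i++) {
            int p = row[4 * b + i];
            p = sat_u8(p + pos);             /* paddusb */
            p = sat_u8(p - neg);             /* psubusb */
            row[4 * b + i] = (uint8_t)p;
        }
    }
}

/* Reference: plain clamp(pixel + rounded DC), no pos/neg split. */
static uint8_t ref_pixel(uint8_t pix, int16_t coef)
{
    return sat_u8(pix + asr6(sat_s16(coef + 32)));
}

/* Sweep DC values and pixel values, checking the split add/sub
 * against the direct clamp. */
static void check_against_reference(void)
{
    for (int coef = -20000; coef <= 20000; coef += 37) {
        int16_t block[32] = { 0 };
        block[0] = (int16_t)coef;
        block[16] = (int16_t)-coef;
        for (int pix = 0; pix < 256; pix++) {
            uint8_t row[8];
            for (int i = 0; i < 8; i++)
                row[i] = (uint8_t)pix;
            dc_add_2x4_row(row, block);
            assert(row[0] == ref_pixel((uint8_t)pix, block[0]));
            assert(row[4] == ref_pixel((uint8_t)pix, block[16]));
        }
    }
}
```

Splitting the signed DC into two unsigned saturated byte vectors is what lets the add use only unsigned-saturating byte ops: at most one of pos/neg is nonzero, so the pair behaves like a single clamped signed add.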
It's 1.5 clocks faster in the i16x16 iDCT... barely worth it, but still an
improvement, so I'll keep it.
Patch attached.
Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264_idct.diff
Type: text/x-diff
Size: 13381 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090102/fa825555/attachment.diff>