[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions
Jason Garrett-Glaser
darkshikari
Sat Jan 3 01:46:26 CET 2009
On Fri, Jan 2, 2009 at 6:57 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Jan 02, 2009 at 03:37:11PM -0500, Jason Garrett-Glaser wrote:
>> > a random idea: (untested and ignore if slower)
>> >
>> > movd "block[ 0]", %%mm0 // 0 0 X D
>> > punpcklwd "block[16]", %%mm0 // x X d D
>> > paddsw "32", %%mm0
>> > psraw $6, %%mm0
>> > punpcklwd %%mm0, %%mm0 // d d D D
>> > pxor %%mm1, %%mm1 // 0 0 0 0
>> > psubw %%mm0, %%mm1 // -d-d-D-D
>> > packuswb %%mm1, %%mm0 // -d-d-D-D d d D D
>> > pshufw $0xFA, %%mm0, %%mm1 // -d-d-d-d-D-D-D-D
>> > punpcklwd %%mm0, %%mm0 // d d d d D D D D
>> >
>> >
>> > except that, patch ok
>>
>> 1.5 clocks faster in i16x16 idct... barely worth it, but still better,
>> so I'll keep it.
>>
>> Patch attached.
>
> looks good
applied.
Dark Shikari
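Below is a rough scalar C model of what the quoted MMX sequence appears to compute. It is a sketch under several assumptions. block[0] and block[16] are taken to be the DC coefficients of two horizontally adjacent 4x4 blocks. "32" is taken to stand for a vector of 16-bit 32s. The resulting 8-byte vectors are taken to be applied to 4 rows of 8 pixels with paddusb (+dc) followed by psubusb (-dc). The function names (round_dc, build_dc_splats, dc_add8, count_dc_add_mismatches) are hypothetical and are not FFmpeg's.

The point of the packuswb trick is that saturation does the sign split for free. A negative DC packs to 0 in the "+dc" vector, and a positive DC packs to 0 in the "-dc" vector. A saturating add followed by a saturating subtract therefore equals clip(pixel + dc, 0, 255).

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a signed value to an unsigned byte, as packuswb does per word. */
static uint8_t sat_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* (v + 32) >> 6 with a floor (arithmetic) shift, like paddsw + psraw $6.
 * Written out because >> on negative ints is implementation-defined in C.
 * The 16-bit saturation of paddsw is not modelled; it only changes inputs
 * near 32767, whose result clips to 255 either way. */
static int round_dc(int v)
{
    v += 32;
    return v >= 0 ? v >> 6 : -((-v + 63) >> 6);
}

/* Build the two splat vectors the asm leaves in mm0 (+dc) and mm1 (-dc):
 * low 4 bytes belong to the left block ("D"), high 4 to the right ("d"). */
void build_dc_splats(const int16_t *block, uint8_t pos[8], uint8_t neg[8])
{
    int dc0 = round_dc(block[0]);   /* "D": assumed left block  */
    int dc1 = round_dc(block[16]);  /* "d": assumed right block */
    for (int i = 0; i < 4; i++) {
        pos[i]     = sat_u8(dc0);
        pos[i + 4] = sat_u8(dc1);
        neg[i]     = sat_u8(-dc0);
        neg[i + 4] = sat_u8(-dc1);
    }
}

/* Apply the splats to 4 rows of 8 pixels: paddusb, then psubusb. */
void dc_add8(uint8_t *dst, int stride, const uint8_t pos[8], const uint8_t neg[8])
{
    assert(stride >= 8);
    for (int y = 0; y < 4; y++, dst += stride)
        for (int x = 0; x < 8; x++) {
            int v = dst[x] + pos[x];
            v = v > 255 ? 255 : v;              /* paddusb */
            v -= neg[x];
            dst[x] = (uint8_t)(v < 0 ? 0 : v);  /* psubusb */
        }
}

/* Compare dc_add8 with a plain clip(pixel + dc) over a few DC values,
 * including the int16 extremes; returns the number of differing pixels. */
int count_dc_add_mismatches(void)
{
    static const int16_t dcs[] = { -32768, -20000, -100, -33, 0, 31, 200, 20000, 32767 };
    const int n = (int)(sizeof dcs / sizeof dcs[0]);
    int mismatches = 0;

    for (int a = 0; a < n; a++)
        for (int b = 0; b < n; b++) {
            int16_t block[32] = { 0 };
            uint8_t pix[32], ref[32], pos[8], neg[8];
            block[0]  = dcs[a];
            block[16] = dcs[b];
            for (int i = 0; i < 32; i++) {
                pix[i] = (uint8_t)(i * 8 + 3);  /* arbitrary test pattern */
                int dc = round_dc(i % 8 < 4 ? block[0] : block[16]);
                ref[i] = sat_u8(pix[i] + dc);   /* reference: clip(p + dc) */
            }
            build_dc_splats(block, pos, neg);
            dc_add8(pix, 8, pos, neg);
            for (int i = 0; i < 32; i++)
                mismatches += pix[i] != ref[i];
        }
    return mismatches;
}
```

Calling count_dc_add_mismatches() from a main() should return 0. The add-then-subtract split is what lets the asm stay entirely in unsigned bytes instead of widening pixels to words before adding the DC.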