[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions

Jason Garrett-Glaser darkshikari
Sat Jan 3 01:46:26 CET 2009


On Fri, Jan 2, 2009 at 6:57 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Jan 02, 2009 at 03:37:11PM -0500, Jason Garrett-Glaser wrote:
>> > a random idea: (untested and ignore if slower)
>> >
>> > movd      "block[ 0]", %%mm0    //  0 0 X D
>> > punpcklwd "block[16]", %%mm0    //  x X d D
>> > paddsw           "32", %%mm0
>> > psraw              $6, %%mm0
>> > punpcklwd       %%mm0, %%mm0    //  d d D D
>> > pxor            %%mm1, %%mm1    //  0 0 0 0
>> > psubw           %%mm0, %%mm1    // -d-d-D-D
>> > packuswb        %%mm1, %%mm0    // -d-d-D-D d d D D
>> > pshufw   $0xFA, %%mm0, %%mm1    // -d-d-d-d-D-D-D-D
>> > punpcklwd       %%mm0, %%mm0    //  d d d d D D D D
>> >
>> >
>> > except that, patch ok
>>
>> 1.5 clocks faster in i16x16 idct... barely worth it, but still better,
>> so I'll keep it.
>>
>> Patch attached.
>
> looks good

applied.

Dark Shikari
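The quoted MMX idea builds two 8-byte vectors from a pair of DC coefficients: mm0 ends up holding the saturated positive DCs ("d d d d D D D D") and mm1 the saturated negated DCs ("-d-d-d-d-D-D-D-D"). The scalar C model below is a sketch of that data flow, not FFmpeg's code, and it rests on several assumptions. It assumes `block[0]` and `block[16]` are the DCs of two horizontally adjacent 4x4 blocks. It assumes the `"32"` operand stands for a memory constant of 32 in every word, used for H.264-style `(dc + 32) >> 6` rounding. It also assumes the vectors are later applied with a saturating unsigned add of mm0 and a saturating unsigned subtract of mm1 (`paddusb`/`psubusb`), which the quoted mail does not show. The names `sat_u8`, `round_dc`, `dc_splat_pair` and `dc_add_row8` are hypothetical.

```c
#include <stdint.h>

/* Unsigned byte saturation, as packuswb applies to each 16-bit lane. */
static uint8_t sat_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* paddsw with 32, then psraw by 6: a signed-saturating add followed by an
   arithmetic (floor) shift, written portably without shifting negatives. */
static int round_dc(int16_t coef)
{
    int s = coef + 32;
    if (s > INT16_MAX)
        s = INT16_MAX;
    return s >= 0 ? s / 64 : -((-s + 63) / 64);
}

/* Scalar model of the quoted sequence. Assumes block[0] and block[16] are
   the DCs of two horizontally adjacent 4x4 blocks. pos[] models mm0 and
   neg[] models mm1; index 0 is the lowest byte, i.e. the leftmost pixel,
   so bytes 0..3 carry block[0]'s DC and bytes 4..7 carry block[16]'s. */
static void dc_splat_pair(const int16_t *block, uint8_t pos[8], uint8_t neg[8])
{
    int D = round_dc(block[0]);
    int d = round_dc(block[16]);
    for (int i = 0; i < 4; i++) {
        pos[i]     = sat_u8(D);
        pos[i + 4] = sat_u8(d);
        neg[i]     = sat_u8(-D);
        neg[i + 4] = sat_u8(-d);
    }
}

/* Assumed consumer (not in the quoted mail): paddusb with pos, then psubusb
   with neg. In each lane at least one of the two is zero, so the pair
   clips pixel + dc to 0..255 without any signed arithmetic on bytes. */
static void dc_add_row8(uint8_t px[8], const uint8_t pos[8], const uint8_t neg[8])
{
    for (int i = 0; i < 8; i++) {
        int v = px[i] + pos[i];
        if (v > 255)
            v = 255;
        v -= neg[i];
        px[i] = (uint8_t)(v < 0 ? 0 : v);
    }
}
```

For example, with `block[0] = 640` and `block[16] = -1000` the rounded DCs are 10 and -16, so a row `{100, 250, 5, 0, 100, 10, 255, 3}` becomes `{110, 255, 15, 10, 84, 0, 239, 0}`. Splitting the DC into a positive and a negated byte vector is what lets the asm use unsigned saturating byte ops instead of widening the pixels to words.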



