[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Tue Apr 15 00:12:05 CEST 2008

On Apr 13, 2008, at 10:26 PM, Michael Niedermayer wrote:
> On Sun, Apr 13, 2008 at 10:10:21PM -0400, Alexander Strange wrote:
>> On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> [..]
>>>>>>>
>>>>>>> #ifdef ARCH_X86_64
>>>>>>> # define XMMS   "%%xmm12"
>>>>>>> #else
>>>>>>> # define XMMS   "%%xmm2"
>>>>>>> #endif
>>>>>>> s/%%xmm2/XMMS/
>>>>>>>
>>>>>>> #ifndef ARCH_X86_64
>>>>>>> "movdqa   %%xmm2, "spill"         \n\t" \
>>>>>>> #endif
>>>>>>> ...
>>>>>>> #ifndef ARCH_X86_64
>>>>>>> "movdqa  "spill", %%xmm2          \n\t" \
>>>>>>> #endif
>>>>>>>
>>>>>>> or a
>>>>>>> MOV_ONLY_ON32" %%xmm2, ...
>>>>>>>
>>>>>>>
>>>>>>> And I think something similar can be done with ROW*
>>>>>>
>>>>>> Done. The row part is already optimal on 64 since pshufhw handles it.
>>>>>
>>>>> I meant the
>>>>>>   "movdqa   "ROW2", %%xmm4          \n\t" \
>>>>>>   "movdqa   "ROW6", %%xmm6          \n\t" \
>>>>> [...]
>>>>>>   "movdqa   "ROW0", %%xmm4          \n\t" \
>>>>>>   "movdqa   "ROW4", %%xmm6          \n\t" \
>>>>>
>>>>> they are unneeded on 64.
>>>>
>>>> Oh, that. Done:
>>>
>>>
>>> [...]
>>>> ///IDCT pass on columns, assuming rows 4-6 are zero.
>>>                                           ^
>>> typo
>>
>> Fixed.
>>
>>> [...]
>>>>    iLLM_HEAD
>>>>    ASMALIGN(4)
>>>>    JNZ("%%ecx", "2f")
>>>>    JNZ("%%eax", "3f")
>>>>    JNZ("%%edx", "4f")
>>>>    JNZ("%%ebx", "5f")
>>>>    iLLM_PASS_SPARSE("%0")
>>>>    "jmp 6f                                                      \n\t"
>>>>    "2:                                                          \n\t"
>>>>    iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
>>>>    "3:                                                          \n\t"
>>>>    iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
>>>>    JZ("%%edx", "1f")
>>>>    "4:                                                          \n\t"
>>>>    iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
>>>>    JZ("%%ebx", "1f")
>>>>    "5:                                                          \n\t"
>>>>    iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
>>>>    iLLM_HEAD
>>>
>>> iLLM_HEAD is executed twice here
>>
>> That's intentional, it turned out to be the best way to handle it on
>> 32-bit. (call it a speculative prefetch)
>> But we can get rid of it for x86-64, so I did.
>>
>>>>    iLLM_PASS("%0")
>>>>    "6:                                                          \n\t"
>>>>    : "+r"(block)
>>>>    :
>>>>    : "%eax", "%ecx", "%edx", "%ebx", "memory");
>>>
>>> ebx + gcc + PIC -> problems
>>>
>>> Also the changes to existing code are missing this time ...
>>
>> changed to esi
>> The others hadn't changed and I didn't want to repost them every time...
>
> looks ok
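
For anyone following the ebx/PIC point above: on 32-bit x86, GCC keeps the GOT pointer in ebx when building PIC, so listing ebx as a clobber can fail to compile. A minimal sketch of the workaround (use esi as the scratch register and declare it clobbered) follows. The function name add_one_via_esi is made up for illustration, this is not code from the patch, and the plain C fallback is only there so the sketch builds on non-x86 targets.

```c
/* Hypothetical illustration, not taken from the patch: use esi instead of
 * ebx as a scratch register so the asm also builds with -fPIC on 32-bit x86. */
static int add_one_via_esi(int x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "movl %0, %%esi \n\t"   /* copy the operand into the scratch register */
        "addl $1, %%esi \n\t"   /* do the work in esi                         */
        "movl %%esi, %0 \n\t"   /* copy the result back to the operand        */
        : "+r"(x)
        :
        : "%esi", "memory");    /* esi is declared clobbered; ebx is untouched */
#else
    x += 1;                     /* portable fallback for other targets */
#endif
    return x;
}
```

Because esi is in the clobber list, the compiler will not pick it for operand %0, so the two moves never alias.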

Thanks. Here are all the patches again; could someone apply them?

I got a 3-10% improvement in total decode time vs. simple_idct_mmx on
Core 2. I could rerun the benchmarks if someone wants, but only after I
figure out how to disable SpeedStep.

sse2-permute.diff - new IDCT permutation for SSE2 IDCTs
libavcodec/i386/idct_sse2_xvid.c + sse2-xvid-idct.diff - new SSE2 IDCT
for Xvid
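
As a rough picture of what the JNZ chain quoted earlier is deciding, here is a plain C sketch: check whether any of rows 4-7 holds a nonzero coefficient, and only take the full column pass if one does. This is my simplified reading, not the control flow of the patch (the asm jumps into the middle of the row multiplies instead of building a mask). The names lower_rows_mask and choose_column_pass, and the row-major int16_t[64] layout, are assumptions for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: bit i of the result is set when row 4+i of an
 * 8x8 block (assumed row-major, 64 coefficients) has a nonzero entry. */
static unsigned lower_rows_mask(const int16_t block[64])
{
    unsigned mask = 0;
    for (int row = 4; row < 8; row++) {
        int16_t any = 0;
        for (int col = 0; col < 8; col++)
            any |= block[row * 8 + col];   /* OR all coefficients of the row */
        if (any)
            mask |= 1u << (row - 4);
    }
    return mask;
}

enum column_pass { PASS_SPARSE, PASS_FULL };

/* Choose the cheaper column pass when rows 4-7 are all zero. */
static enum column_pass choose_column_pass(const int16_t block[64])
{
    return lower_rows_mask(block) ? PASS_FULL : PASS_SPARSE;
}
```

The point of the split is that typical blocks have only low-frequency coefficients, so the sparse path is the common case.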

I added a period at the end of "Originally from..." in the comments.
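
For reference, the XMMS / MOV_ONLY_ON32 idea from the review can be sketched as below. The asm template is only built as a C string here and never assembled, so this compiles anywhere. DEMO_ARCH_X86_64 is a stand-in I made up for ARCH_X86_64, SPILL is a placeholder memory operand, and the paddw line is a dummy instruction; none of these are from the patch.

```c
#include <string.h>

/* Hypothetical stand-in for FFmpeg's ARCH_X86_64, derived from compiler macros. */
#if defined(__x86_64__) || defined(_M_X64)
#  define DEMO_ARCH_X86_64 1
#else
#  define DEMO_ARCH_X86_64 0
#endif

#if DEMO_ARCH_X86_64
#  define XMMS "%%xmm12"                 /* spare register, no spill needed */
#  define MOV_ONLY_ON32(src, dst) ""     /* expands to nothing on 64-bit    */
#else
#  define XMMS "%%xmm2"                  /* must share xmm2 with other code */
#  define MOV_ONLY_ON32(src, dst) "movdqa " src ", " dst " \n\t"
#endif

#define SPILL "16(%0)"                   /* placeholder spill slot */

/* Build the template text: spill, use XMMS as scratch, restore. */
static const char *demo_asm_template(void)
{
    return MOV_ONLY_ON32("%%xmm2", SPILL)
           "paddw %%xmm0, " XMMS " \n\t"
           MOV_ONLY_ON32(SPILL, "%%xmm2");
}

/* Returns 1 when the template contains the spill/restore moves. */
static int demo_template_spills(void)
{
    return strstr(demo_asm_template(), "movdqa") != NULL;
}
```

On a 64-bit build the two movdqa lines vanish from the template, which is the whole saving being discussed.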

-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-permute.diff
Type: application/octet-stream
Size: 1341 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080414/5047384f/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: idct_sse2_xvid.c
Type: application/octet-stream
Size: 15377 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080414/5047384f/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-xvid-idct.diff
Type: application/octet-stream
Size: 1827 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080414/5047384f/attachment-0002.obj>