[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Alexander Strange
astrange
Tue Apr 15 00:12:05 CEST 2008
On Apr 13, 2008, at 10:26 PM, Michael Niedermayer wrote:
> On Sun, Apr 13, 2008 at 10:10:21PM -0400, Alexander Strange wrote:
>> On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> [..]
>>>>>>>
>>>>>>> #ifdef ARCH_X86_64
>>>>>>> # define XMMS "%%xmm12"
>>>>>>> #else
>>>>>>> # define XMMS "%%xmm2"
>>>>>>> #endif
>>>>>>> s/%%xmm2/XMMS/
>>>>>>>
>>>>>>> #ifndef ARCH_X86_64
>>>>>>> "movdqa %%xmm2, "spill" \n\t" \
>>>>>>> #endif
>>>>>>> ...
>>>>>>> #ifndef ARCH_X86_64
>>>>>>> "movdqa "spill", %%xmm2 \n\t" \
>>>>>>> #endif
>>>>>>>
>>>>>>> or a
>>>>>>> MOV_ONLY_ON32" %%xmm2, ...
>>>>>>>
>>>>>>>
>>>>>>> And I think something similar can be done with ROW*
>>>>>>
>>>>>> Done. The row part is already optimal on 64 since pshufhw
>>>>>> handles it.
>>>>>
>>>>> I meant the
>>>>>> "movdqa "ROW2", %%xmm4 \n\t" \
>>>>>> "movdqa "ROW6", %%xmm6 \n\t" \
>>>>> [...]
>>>>>> "movdqa "ROW0", %%xmm4 \n\t" \
>>>>>> "movdqa "ROW4", %%xmm6 \n\t" \
>>>>>
>>>>> they are unneeded on 64.
>>>>
>>>> Oh, that. Done:
>>>
>>>
>>> [...]
>>>> ///IDCT pass on columns, assuming rows 4-6 are zero.
>>> ^
>>> typo
>>
>> Fixed.
>>
>>> [...]
>>>> iLLM_HEAD
>>>> ASMALIGN(4)
>>>> JNZ("%%ecx", "2f")
>>>> JNZ("%%eax", "3f")
>>>> JNZ("%%edx", "4f")
>>>> JNZ("%%ebx", "5f")
>>>> iLLM_PASS_SPARSE("%0")
>>>> "jmp 6f \n\t"
>>>> "2: \n\t"
>>>> iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
>>>> "3: \n\t"
>>>> iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
>>>> JZ("%%edx", "1f")
>>>> "4: \n\t"
>>>> iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
>>>> JZ("%%ebx", "1f")
>>>> "5: \n\t"
>>>> iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
>>>> iLLM_HEAD
>>>
>>> iLLM_HEAD is executed twice here
>>
>> That's intentional, it turned out to be the best way to handle it on
>> 32-bit. (call it a speculative prefetch)
>> But we can get rid of it for x86-64, so I did.
>>
>>>> iLLM_PASS("%0")
>>>> "6: \n\t"
>>>> : "+r"(block)
>>>> :
>>>> : "%eax", "%ecx", "%edx", "%ebx", "memory");
>>>
>>> ebx + gcc + PIC -> problems
>>>
>>> Also the changes to existing code are missing this time ...
>>
>> changed to esi
>> The others hadn't changed and I didn't want to repost them every
>> time...
>
> looks ok
Thanks. Here are all the patches again; could someone apply them?
I got a 3-10% improvement in total decode time vs. simple_idct_mmx on
Core 2; I could rerun the benchmarks if someone wants, but only after
I figure out how to disable SpeedStep.
sse2-permute - new IDCT permutation for SSE2 IDCTs
libavcodec/i386/idct_sse2_xvid.c + sse2-xvid-idct.diff - new SSE2 IDCT
for Xvid
I added a period at the end of "Originally from..." in the comments.
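
For anyone skimming the thread, the register-selection idea quoted at the
top can be sketched roughly as follows. This is only an illustration of the
string-pasting technique, not the code in the attached idct_sse2_xvid.c:
SPILL_SLOT, SAVE_XMM2, RESTORE_XMM2 and column_pass_template() are
hypothetical names, the instructions in the template are placeholders, and
the compiler's __x86_64__ macro stands in for ARCH_X86_64 so the snippet
builds outside the FFmpeg tree.

```c
#include <stdio.h>

/* Hypothetical scratch location; the real patch uses its own spill slot. */
#define SPILL_SLOT "8*16(%0)"

#if defined(__x86_64__)
/* xmm8-xmm15 exist on x86-64, so the value can live in a spare register
 * and the spill/reload pair collapses to nothing. */
# define XMMS         "%%xmm12"
# define SAVE_XMM2    ""
# define RESTORE_XMM2 ""
#else
/* Only xmm0-xmm7 are available: keep the value in xmm2 and spill it to
 * memory while xmm2 is reused as a scratch register. */
# define XMMS         "%%xmm2"
# define SAVE_XMM2    "movdqa %%xmm2, " SPILL_SLOT " \n\t"
# define RESTORE_XMM2 "movdqa " SPILL_SLOT ", %%xmm2 \n\t"
#endif

/* Returns the asm template as a plain string so it can be inspected.
 * Adjacent string literals are concatenated by the compiler, which is
 * what lets the macros drop in or vanish per architecture. The doubled
 * percent signs are what GCC extended asm expects for register names. */
const char *column_pass_template(void)
{
    return "paddsw %%xmm0, " XMMS " \n\t"   /* result kept in XMMS       */
           SAVE_XMM2                        /* 32-bit only: spill xmm2   */
           "pxor %%xmm2, %%xmm2 \n\t"       /* xmm2 reused as scratch    */
           RESTORE_XMM2                     /* 32-bit only: reload xmm2  */
           "psubsw " XMMS ", %%xmm1 \n\t";  /* consume the saved result  */
}
```

Printing column_pass_template() with puts() on each target shows the two
movdqa lines present on 32-bit and absent on x86-64, which is the point of
the change.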
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-permute.diff
Type: application/octet-stream
Size: 1341 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080414/5047384f/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: idct_sse2_xvid.c
Type: application/octet-stream
Size: 15377 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080414/5047384f/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: sse2-xvid-idct.diff
Type: application/octet-stream
Size: 1827 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080414/5047384f/attachment-0002.obj>