[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Michael Niedermayer
michaelni
Tue Apr 15 00:47:50 CEST 2008
On Mon, Apr 14, 2008 at 06:12:05PM -0400, Alexander Strange wrote:
>
> On Apr 13, 2008, at 10:26 PM, Michael Niedermayer wrote:
>> On Sun, Apr 13, 2008 at 10:10:21PM -0400, Alexander Strange wrote:
>>> On Sun, Apr 13, 2008 at 5:39 PM, Michael Niedermayer <michaelni at gmx.at>
>>> wrote:
>>>> [..]
>>>>>>>>
>>>>>>>> #ifdef ARCH_X86_64
>>>>>>>> # define XMMS "%%xmm12"
>>>>>>>> #else
>>>>>>>> # define XMMS "%%xmm2"
>>>>>>>> #endif
>>>>>>>> s/%%xmm2/XMMS/
>>>>>>>>
>>>>>>>> #ifndef ARCH_X86_64
>>>>>>>> "movdqa %%xmm2, "spill" \n\t" \
>>>>>>>> #endif
>>>>>>>> ...
>>>>>>>> #ifndef ARCH_X86_64
>>>>>>>> "movdqa "spill", %%xmm2 \n\t" \
>>>>>>>> #endif
>>>>>>>>
>>>>>>>> or a
>>>>>>>> MOV_ONLY_ON32" %%xmm2, ...
>>>>>>>>
>>>>>>>>
>>>>>>>> And I think something similar can be done with ROW*
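
The suggestion quoted above leans on C's concatenation of adjacent string literals: the scratch register name and the spill moves become macros that expand differently per architecture. A minimal sketch of that idea follows. DEMO_X86_64, SPILL and demo_asm_template() are hypothetical names used only for illustration (the patch itself keys off ARCH_X86_64), MOV_ONLY_ON32 is written here as a function-like macro, which may differ from what was intended in the thread, and the paddw line is just a placeholder instruction. The template is built as an ordinary string and never executed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the build system's ARCH_X86_64 define. */
#if defined(__x86_64__) || defined(_M_X64)
# define DEMO_X86_64 1
#else
# define DEMO_X86_64 0
#endif

#if DEMO_X86_64
/* 16 XMM registers: use a spare one as scratch, no spill needed. */
# define XMMS "%%xmm12"
# define MOV_ONLY_ON32(src, dst) ""
#else
/* 8 XMM registers: reuse xmm2 as scratch and save/restore it around the use. */
# define XMMS "%%xmm2"
# define MOV_ONLY_ON32(src, dst) "movdqa " src ", " dst " \n\t"
#endif

/* Hypothetical spill slot, addressed relative to asm operand 0. */
#define SPILL "16(%0)"

/* Returns the asm template text that the macros would produce. */
static const char *demo_asm_template(void)
{
    return
        MOV_ONLY_ON32("%%xmm2", SPILL)   /* save xmm2 (32-bit only)    */
        "paddw %%xmm0, " XMMS " \n\t"    /* placeholder use of scratch */
        MOV_ONLY_ON32(SPILL, "%%xmm2");  /* restore xmm2 (32-bit only) */
}
```

On a 64-bit build the returned template contains only the paddw line using %%xmm12; on a 32-bit build the same source expands to a save, the paddw on %%xmm2, and a restore.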
>>>>>>>
>>>>>>> Done. The row part is already optimal on 64 since pshufhw handles it.
>>>>>>
>>>>>> I meant the
>>>>>>> "movdqa "ROW2", %%xmm4 \n\t" \
>>>>>>> "movdqa "ROW6", %%xmm6 \n\t" \
>>>>>> [...]
>>>>>>> "movdqa "ROW0", %%xmm4 \n\t" \
>>>>>>> "movdqa "ROW4", %%xmm6 \n\t" \
>>>>>>
>>>>>> they are unneeded on 64.
>>>>>
>>>>> Oh, that. Done:
>>>>
>>>>
>>>> [...]
>>>>> ///IDCT pass on columns, assuming rows 4-6 are zero.
>>>> ^
>>>> typo
>>>
>>> Fixed.
>>>
>>>> [...]
>>>>> iLLM_HEAD
>>>>> ASMALIGN(4)
>>>>> JNZ("%%ecx", "2f")
>>>>> JNZ("%%eax", "3f")
>>>>> JNZ("%%edx", "4f")
>>>>> JNZ("%%ebx", "5f")
>>>>> iLLM_PASS_SPARSE("%0")
>>>>> "jmp 6f \n\t"
>>>>> "2: \n\t"
>>>>> iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
>>>>> "3: \n\t"
>>>>> iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16),
>>>>> PUT_ODD(ROW5))
>>>>> JZ("%%edx", "1f")
>>>>> "4: \n\t"
>>>>> iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16),
>>>>> PUT_EVEN(ROW6))
>>>>> JZ("%%ebx", "1f")
>>>>> "5: \n\t"
>>>>> iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16),
>>>>> PUT_ODD(ROW7))
>>>>> iLLM_HEAD
>>>>
>>>> iLLM_HEAD is executed twice here
>>>
>>> That's intentional, it turned out to be the best way to handle it on
>>> 32-bit. (call it a speculative prefetch)
>>> But we can get rid of it for x86-64, so I did.
>>>
>>>>> iLLM_PASS("%0")
>>>>> "6: \n\t"
>>>>> : "+r"(block)
>>>>> :
>>>>> : "%eax", "%ecx", "%edx", "%ebx", "memory");
>>>>
>>>> ebx + gcc + PIC -> problems
>>>>
>>>> Also the changes to existing code are missing this time ...
>>>
>>> changed to esi
>>> The others hadn't changed and I didn't want to repost them every time...
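
Background on the ebx remark quoted above: on 32-bit x86, GCC keeps the GOT pointer in ebx when building position-independent code, so older GCC versions reject an asm statement that lists ebx as clobbered. Picking another scratch register, as was done here with esi, avoids the conflict. The sketch below shows only the clobber-list pattern; add_via_scratch() is a hypothetical function that has nothing to do with the IDCT, and it falls back to plain C on compilers or targets without GNU-style x86 inline asm.

```c
#include <assert.h>

/* Adds b to a, using esi as a scratch register inside the asm block. */
static int add_via_scratch(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int result = a;
    __asm__ volatile(
        "movl %1, %%esi \n\t"   /* copy b into the scratch register */
        "addl %%esi, %0 \n\t"   /* result += scratch                */
        : "+r"(result)
        : "r"(b)
        : "%esi", "cc");        /* esi, not ebx: safe with -fPIC    */
    return result;
#else
    /* Portable fallback so the example builds on other targets. */
    return a + b;
#endif
}
```

Because esi is declared in the clobber list, the compiler will not place either operand in it, and the PIC register is left untouched.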
>>
>> looks ok
>
> Thanks. Here's all the patches again, could someone apply them?
send username & password to diego and apply them yourself :)
(of course only if you agree to our policy/coding/svn rules)
PS: you should do something about the MIME type of your attached files.
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Democracy is the form of government in which you can choose your dictator