[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Michael Niedermayer
michaelni
Sun Apr 13 23:39:43 CEST 2008
On Sun, Apr 13, 2008 at 05:25:26PM -0400, Alexander Strange wrote:
>
> On Apr 13, 2008, at 6:26 AM, Michael Niedermayer wrote:
>> On Sun, Apr 13, 2008 at 05:35:01AM -0400, Alexander Strange wrote:
>>>
>>> On Apr 12, 2008, at 8:15 AM, Michael Niedermayer wrote:
>> [...]
>>>>> "psubsw %%xmm6, %%xmm5 \n\t" \
>>>>> "movdqa "ROW0", %%xmm4 \n\t" \
>>>>> "movdqa "ROW4", %%xmm6 \n\t" \
>>>>> "movdqa %%xmm2, "spill" \n\t" \
>>>>> "movdqa %%xmm4, %%xmm2 \n\t" \
>>>>> "psubsw %%xmm6, %%xmm4 \n\t" \
>>>>> "paddsw %%xmm2, %%xmm6 \n\t" \
>>>>> "movdqa %%xmm6, %%xmm2 \n\t" \
>>>>> "psubsw %%xmm7, %%xmm6 \n\t" \
>>>>> "paddsw %%xmm2, %%xmm7 \n\t" \
>>>>> "movdqa %%xmm4, %%xmm2 \n\t" \
>>>>> "psubsw %%xmm5, %%xmm4 \n\t" \
>>>>> "paddsw %%xmm2, %%xmm5 \n\t" \
>>>>> "movdqa %%xmm5, %%xmm2 \n\t" \
>>>>> "psubsw %%xmm0, %%xmm5 \n\t" \
>>>>> "paddsw %%xmm2, %%xmm0 \n\t" \
>>>>> "movdqa %%xmm4, %%xmm2 \n\t" \
>>>>> "psubsw %%xmm3, %%xmm4 \n\t" \
>>>>> "paddsw %%xmm2, %%xmm3 \n\t" \
>>>>> "movdqa "spill", %%xmm2 \n\t" \
>>>>
>>>> #ifdef ARCH_X86_64
>>>> # define XMMS "%%xmm12"
>>>> #else
>>>> # define XMMS "%%xmm2"
>>>> #endif
>>>> s/%%xmm2/XMMS/
>>>>
>>>> #ifndef ARCH_X86_64
>>>> "movdqa %%xmm2, "spill" \n\t" \
>>>> #endif
>>>> ...
>>>> #ifndef ARCH_X86_64
>>>> "movdqa "spill", %%xmm2 \n\t" \
>>>> #endif
>>>>
>>>> or a
>>>> MOV_ONLY_ON32" %%xmm2, ...
>>>>
>>>>
>>>> And I think something similar can be done with ROW*
>>>
>>> Done. The row part is already optimal on 64 since pshufhw handles it.
>>
>> I meant the
>>> "movdqa "ROW2", %%xmm4 \n\t" \
>>> "movdqa "ROW6", %%xmm6 \n\t" \
>> [...]
>>> "movdqa "ROW0", %%xmm4 \n\t" \
>>> "movdqa "ROW4", %%xmm6 \n\t" \
>>
>> they are unneeded on 64.
>
> Oh, that. Done:
[...]
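For illustration, the XMMS / MOV_ONLY_ON32 suggestion quoted above could be sketched roughly as follows. This is not code from the patch: DEMO_X86_64 stands in for the build system's ARCH_X86_64, SPILL is a hypothetical memory operand, and the two-argument form of MOV_ONLY_ON32 is an assumption. The sketch only builds the asm template string, so it compiles on any target.

```c
#include <string.h>

/* Hypothetical stand-in for the configure-time ARCH_X86_64 macro. */
#if defined(__x86_64__) || defined(_M_X64)
#  define DEMO_X86_64 1
#else
#  define DEMO_X86_64 0
#endif

#if DEMO_X86_64
/* 16 XMM registers: use xmm12 as scratch, so no spill is needed. */
#  define XMMS "%%xmm12"
#  define MOV_ONLY_ON32(src, dst) ""
#else
/* 8 XMM registers: xmm2 is the scratch and must be saved/restored. */
#  define XMMS "%%xmm2"
#  define MOV_ONLY_ON32(src, dst) "movdqa " src ", " dst " \n\t"
#endif

/* Hypothetical spill slot; the patch uses its own "spill" operand. */
#define SPILL "(%0)"

/* One butterfly step, built by string-literal concatenation. */
static const char butterfly_template[] =
    MOV_ONLY_ON32("%%xmm2", SPILL)        /* save xmm2 (32-bit only)    */
    "movdqa %%xmm4, " XMMS " \n\t"
    "psubsw %%xmm6, %%xmm4 \n\t"
    "paddsw " XMMS ", %%xmm6 \n\t"
    MOV_ONLY_ON32(SPILL, "%%xmm2");       /* restore xmm2 (32-bit only) */

/* Returns the asm template text that would be handed to the compiler. */
const char *butterfly_asm(void)
{
    return butterfly_template;
}

/* Returns 1 if the template touches the spill slot, 0 otherwise. */
int butterfly_uses_spill(void)
{
    return strstr(butterfly_template, SPILL) != NULL;
}
```

On a 64-bit build the two MOV_ONLY_ON32 lines expand to empty strings, so the spill store and reload disappear from the template without any #ifdef inside the asm body.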
> ///IDCT pass on columns, assuming rows 4-6 are zero.
^
typo
[...]
> iLLM_HEAD
> ASMALIGN(4)
> JNZ("%%ecx", "2f")
> JNZ("%%eax", "3f")
> JNZ("%%edx", "4f")
> JNZ("%%ebx", "5f")
> iLLM_PASS_SPARSE("%0")
> "jmp 6f \n\t"
> "2: \n\t"
> iMTX_MULT("4*16(%0)", MANGLE(iTab1), "#", PUT_EVEN(ROW4))
> "3: \n\t"
> iMTX_MULT("5*16(%0)", MANGLE(iTab4), ROUND(walkenIdctRounders+4*16), PUT_ODD(ROW5))
> JZ("%%edx", "1f")
> "4: \n\t"
> iMTX_MULT("6*16(%0)", MANGLE(iTab3), ROUND(walkenIdctRounders+5*16), PUT_EVEN(ROW6))
> JZ("%%ebx", "1f")
> "5: \n\t"
> iMTX_MULT("7*16(%0)", MANGLE(iTab2), ROUND(walkenIdctRounders+5*16), PUT_ODD(ROW7))
> iLLM_HEAD
iLLM_HEAD is executed twice here
> iLLM_PASS("%0")
> "6: \n\t"
> : "+r"(block)
> :
> : "%eax", "%ecx", "%edx", "%ebx", "memory");
ebx in the clobber list + gcc + PIC -> problems (ebx is the PIC register on x86-32)
Also the changes to existing code are missing this time ...
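On the ebx remark: one common way around it is to stop naming scratch registers in the clobber list and let the compiler allocate them through operands instead. A minimal sketch, assuming a hypothetical helper row_nonzero() that is not part of the patch, and falling back to plain C when not building with a GCC-compatible compiler for x86:

```c
#include <stdint.h>

/* Returns 1 if any of the 8 coefficients in the row is nonzero, else 0.
 * Hypothetical helper for illustration only. */
int row_nonzero(const int16_t *row)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t acc;
    __asm__ (
        "movl   (%1), %0 \n\t"   /* coefficients 0-1 */
        "orl   4(%1), %0 \n\t"   /* coefficients 2-3 */
        "orl   8(%1), %0 \n\t"   /* coefficients 4-5 */
        "orl  12(%1), %0 \n\t"   /* coefficients 6-7 */
        : "=&r"(acc)             /* compiler picks the scratch register */
        : "r"(row),
          "m"(*(const int16_t (*)[8])row) /* tell gcc the row is read */
        : "cc");                 /* no eax/ebx/ecx/edx named here */
    return acc != 0;
#else
    /* Portable fallback with the same result. */
    int i, acc = 0;
    for (i = 0; i < 8; i++)
        acc |= row[i];
    return acc != 0;
#endif
}
```

Because no fixed register is requested, the compiler is free to avoid ebx when it is reserved for PIC; the cost is that the asm body has to refer to operands as %0, %1 rather than by register name.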
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Those who are best at talking, realize last or never when they are wrong.