[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Alexander Strange
astrange
Sun Apr 6 22:41:53 CEST 2008
On Apr 6, 2008, at 4:03 PM, Pascal Massimino wrote:
> Re,
>
> On Sun, Apr 6, 2008 at 9:39 PM, Pascal Massimino <pascal.massimino at gmail.com> wrote:
>
>>
>>
>>>
>>> [...]
>>>> "movdqa %%xmm2, ("dct") \n\t" \
>>>> "movdqa %%xmm3, %%xmm2 \n\t" \
>>>> "psubsw %%xmm6, %%xmm3 \n\t" \
>>>> "paddsw %%xmm2, %%xmm6 \n\t" \
>>>> "movdqa %%xmm6, %%xmm2 \n\t" \
>>>> "psubsw %%xmm7, %%xmm6 \n\t" \
>>>> "paddsw %%xmm2, %%xmm7 \n\t" \
>>>> "movdqa %%xmm3, %%xmm2 \n\t" \
>>>> "psubsw %%xmm5, %%xmm3 \n\t" \
>>>> "paddsw %%xmm2, %%xmm5 \n\t" \
>>>> "movdqa %%xmm5, %%xmm2 \n\t" \
>>>> "psubsw %%xmm0, %%xmm5 \n\t" \
>>>> "paddsw %%xmm2, %%xmm0 \n\t" \
>>>> "movdqa %%xmm3, %%xmm2 \n\t" \
>>>> "psubsw %%xmm4, %%xmm3 \n\t" \
>>>> "paddsw %%xmm2, %%xmm4 \n\t" \
>>>> "movdqa ("dct"), %%xmm2 \n\t" \
>>
>>
> Oh! Now I recall an optimization: you don't need to
> save and restore xmm2 in "dct", provided you replace
> the first butterfly:
>
>> "movdqa %%xmm3, %%xmm2 \n\t" \
>> "psubsw %%xmm6, %%xmm3 \n\t" \
>> "paddsw %%xmm2, %%xmm6 \n\t" \
>
> by its (non-saturating) sub,add,add equivalent:
>
> psubw %%xmm6,%%xmm3
> paddw %%xmm6,%%xmm6
> paddw %%xmm3,%%xmm6
xmm2 is used as scratch for the other butterflies too, so the
replacement would have to be applied to all of them before the
save/restore could go away. Also, it has more register dependencies
and might change the overflow behavior, since psubw/paddw wrap where
psubsw/paddsw saturate. I don't think it looks good, but I'll try it.
Right now, reordering branches and replacing pshufd look like the best
things to try first.
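
To make the overflow concern concrete, here is a rough scalar sketch of
one 16-bit lane of each butterfly. It is only a model under stated
assumptions: the names sat16, wrap16, bfly, butterfly_sat,
butterfly_wrap and butterfly_selfcheck are hypothetical and do not
come from the patch, and the real code works on eight lanes at once in
xmm registers.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of psubsw/paddsw: clamp the result to [-32768, 32767]. */
static inline int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One lane of psubw/paddw: keep only the low 16 bits (wraparound). */
static inline int16_t wrap16(int32_t v)
{
    uint16_t u = (uint16_t)v;
    return (int16_t)(u >= 0x8000 ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Hypothetical result pair: diff = a - b, sum = a + b. */
typedef struct { int16_t diff, sum; } bfly;

/* Copy-based butterfly: movdqa a,tmp; psubsw b,a; paddsw tmp,b */
static inline bfly butterfly_sat(int16_t a, int16_t b)
{
    int16_t tmp = a;                      /* scratch copy (xmm2) */
    bfly r;
    r.diff = sat16((int32_t)a - b);
    r.sum  = sat16((int32_t)b + tmp);
    return r;
}

/* Scratch-free butterfly: psubw b,a; paddw b,b; paddw a,b */
static inline bfly butterfly_wrap(int16_t a, int16_t b)
{
    int16_t d = wrap16((int32_t)a - b);   /* a = a - b        */
    int16_t t = wrap16((int32_t)b + b);   /* b = 2b           */
    bfly r;
    r.diff = d;
    r.sum  = wrap16((int32_t)t + d);      /* b = 2b + (a - b) */
    return r;
}

static inline void butterfly_selfcheck(void)
{
    /* In range: both forms agree. */
    assert(butterfly_sat(100, 30).sum == butterfly_wrap(100, 30).sum);
    assert(butterfly_sat(100, 30).diff == butterfly_wrap(100, 30).diff);
    /* 2b overflows on the way, but the wrapped sum is still a + b. */
    assert(butterfly_wrap(100, 20000).sum == 20100);
    /* a + b out of range: saturating clamps, wrapping does not. */
    assert(butterfly_sat(30000, 10000).sum == 32767);
    assert(butterfly_wrap(30000, 10000).sum == -25536);
}
```

In this model the two forms differ only when a + b or a - b itself
leaves the 16-bit range; the intermediate 2b overflowing is harmless
because the arithmetic is modular. Whether such inputs can reach this
stage of the idct is not something the sketch answers.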