[FFmpeg-devel] [PATCH] SSE2 Xvid idct
Pascal Massimino
pascal.massimino
Sun Apr 6 21:39:57 CEST 2008
Hi,
On Sun, Apr 6, 2008 at 6:14 PM, Michael Niedermayer <michaelni at gmx.at>
wrote:
>
> > skal agreed it could be under LGPL in the last thread.
>
yep
>
> [...]
> > #define SKIP_ROW_CHECK(src) \
> > "movq "src", %%mm0 \n\t" \
> > "por 8+"src", %%mm0 \n\t" \
> > "packssdw %%mm0, %%mm0 \n\t" \
> > "movd %%mm0, %%eax \n\t" \
> > "testl %%eax, %%eax \n\t" \
> > "jz 1f \n\t"
>
> You could try to check pairs of rows; this might be faster for some rows.
> Also, the code should be interleaved, not form such nasty dependency chains;
> you do have enough MMX registers.
Just a quick note: you can try doing the same with
some 'pmovmskb mmreg, eax' instructions.
However, this is a complex instruction, so a speed gain
is not guaranteed.
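
For illustration, here is a scalar C model of the two ways to test a row of
eight 16-bit coefficients for all zeros: the quoted por-based check, and a
pmovmskb-based variant. Since pmovmskb only gathers the sign bit of each byte,
this sketch assumes it is paired with a pcmpeqb against zero first; that
pairing is an assumption, not something spelled out in the thread. All helper
names are hypothetical, and the code models the semantics only, not the timing.

```c
#include <stdint.h>
#include <string.h>

/* Model of the quoted check: OR the two 8-byte halves of the row, then
 * test the result for zero (packssdw/movd/testl only narrow that test). */
static int row_is_zero_por(const int16_t row[8])
{
    uint64_t lo, hi;
    memcpy(&lo, row, 8);
    memcpy(&hi, row + 4, 8);
    return (lo | hi) == 0;
}

/* Model of pcmpeqb against zero on a 64-bit MMX register:
 * each byte becomes 0xFF if it was zero, 0x00 otherwise. */
static uint64_t cmpeq_zero_bytes(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++)
        if (((v >> (8 * i)) & 0xFF) == 0)
            r |= (uint64_t)0xFF << (8 * i);
    return r;
}

/* Model of pmovmskb on a 64-bit MMX register:
 * bit i of the result is the sign bit of byte i. */
static unsigned movemask8(uint64_t v)
{
    unsigned mask = 0;
    for (int i = 0; i < 8; i++)
        mask |= (unsigned)((v >> (8 * i + 7)) & 1) << i;
    return mask;
}

/* Hypothetical pmovmskb variant: the row is zero iff all 8 bytes of the
 * OR-ed halves compare equal to zero, i.e. the mask is 0xFF. */
static int row_is_zero_movemask(const int16_t row[8])
{
    uint64_t lo, hi;
    memcpy(&lo, row, 8);
    memcpy(&hi, row + 4, 8);
    return movemask8(cmpeq_zero_bytes(lo | hi)) == 0xFF;
}
```

Both functions agree on every input. The extra compare step is what keeps a
small positive coefficient such as 1, whose bytes all have a clear sign bit,
from being mistaken for zero.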
>
> [...]
> > "movdqa %%xmm2, ("dct") \n\t" \
> > "movdqa %%xmm3, %%xmm2 \n\t" \
> > "psubsw %%xmm6, %%xmm3 \n\t" \
> > "paddsw %%xmm2, %%xmm6 \n\t" \
> > "movdqa %%xmm6, %%xmm2 \n\t" \
> > "psubsw %%xmm7, %%xmm6 \n\t" \
> > "paddsw %%xmm2, %%xmm7 \n\t" \
> > "movdqa %%xmm3, %%xmm2 \n\t" \
> > "psubsw %%xmm5, %%xmm3 \n\t" \
> > "paddsw %%xmm2, %%xmm5 \n\t" \
> > "movdqa %%xmm5, %%xmm2 \n\t" \
> > "psubsw %%xmm0, %%xmm5 \n\t" \
> > "paddsw %%xmm2, %%xmm0 \n\t" \
> > "movdqa %%xmm3, %%xmm2 \n\t" \
> > "psubsw %%xmm4, %%xmm3 \n\t" \
> > "paddsw %%xmm2, %%xmm4 \n\t" \
> > "movdqa ("dct"), %%xmm2 \n\t" \
>
> I suspect this can be written without the load/store by using
> add,add,sub butterflies (of course only if it is faster)
IIRC, I tried that and got the same tick count with the add,add,sub
butterfly. Plus, I may be wrong, but I recall that the saturation used
with the 'regular' mov,add,sub butterfly helps in nasty corner cases of
overflow.
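
As a hedged illustration of the overflow remark, the sketch below models one
16-bit lane of each butterfly in scalar C with saturating arithmetic (as in
paddsw/psubsw). It is one possible corner case, not necessarily the one
recalled above: in the add,add,sub form the intermediate doubling of b can
saturate even when both final outputs fit in 16 bits. The function and struct
names are hypothetical.

```c
#include <stdint.h>

struct bfly { int16_t sum, diff; };   /* sum = a + b, diff = b - a */

/* Signed 16-bit saturation, as performed by paddsw/psubsw. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* 'Regular' butterfly: t = b; b = b - a; a = a + t
 * (movdqa, psubsw, paddsw in the quoted code). */
static struct bfly butterfly_mov_add_sub(int16_t a, int16_t b)
{
    int16_t t = b;
    struct bfly r;
    r.diff = sat16((int32_t)b - a);
    r.sum  = sat16((int32_t)a + t);
    return r;
}

/* Copy-free butterfly: a = a + b; b = b + b; b = b - a.
 * Algebraically b ends up as 2b - (a + b) = b - a, but every
 * step saturates, including the doubling of b. */
static struct bfly butterfly_add_add_sub(int16_t a, int16_t b)
{
    struct bfly r;
    int16_t dbl;
    r.sum  = sat16((int32_t)a + b);
    dbl    = sat16((int32_t)b + b);
    r.diff = sat16((int32_t)dbl - r.sum);
    return r;
}
```

With a = 0 and b = 20000, the regular form returns sum = 20000 and
diff = 20000, while the add,add,sub form saturates 2b to 32767 and returns
diff = 12767. For small inputs the two forms give identical results.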
I'll try and save some cycles to review the rest asap
skal