[Ffmpeg-devel] clever 8-bit MMX loop filter ABS test
Skal
Wed May 4 08:53:44 CEST 2005
Hi Michael,
On Tue, 2005-05-03 at 13:46, Michael Niedermayer wrote:
> Hi
>
> On Tuesday 03 May 2005 11:03, Skal wrote:
> [...]
> > %macro ABS_LESS_SSE 2 ; %1:out reg %2: alpha-1/beta-1 mm0:Px mm1:Qx
> > ; Trashes mm0,mm1,mm2
> > movq mm2, mm0 ; Save Po
> > psubusb mm0, mm1 ; Po-Qo
> > psubusb mm1, mm2 ; Qo-Po
> > psubusb mm0, %2
> > psubusb mm1, %2
> > por mm1, mm0
> > pxor %1, %1
> > pcmpeqb %1, mm1
>
> movq mm2, mm0 ; Save Po
> psubusb mm0, %1 ; Po-Qo
> psubusb %1, mm2 ; Qo-Po
> por %1, mm0
> psubusb %1, %2
> pcmpeqb %1, mm7
> is 2 instructions less and should be faster
Not necessarily: fewer instructions doesn't help
if they can't be paired. But once the macro-ized
code is expanded and interleaved with its
neighbours, your code will indeed be better,
since it uses fewer regs, and mm0 is preserved,
allowing load-instr removal at a global level.
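
For anyone who wants to convince themselves that the two sequences
compute the same mask, here is a rough scalar sketch in C that models a
single byte lane. It is not MMX code: subus() and cmpeq() are
hypothetical helpers standing in for psubusb and pcmpeqb, and the
function names are made up for illustration. For the shorter sequence
it assumes, as the quoted code seems to, that %1 holds Qx on entry and
that mm7 is zero.

```c
#include <stdint.h>

/* Model of one byte lane of psubusb: unsigned subtract, clamped at 0. */
static uint8_t subus(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Model of one byte lane of pcmpeqb: 0xFF when equal, 0x00 otherwise. */
static uint8_t cmpeq(uint8_t a, uint8_t b)
{
    return a == b ? 0xFF : 0x00;
}

/* Longer sequence: subtract the threshold from both differences,
 * OR the results, then compare against zero. */
static uint8_t abs_less_v1(uint8_t p, uint8_t q, uint8_t t)
{
    uint8_t d0 = subus(p, q);          /* P-Q, or 0 if Q > P */
    uint8_t d1 = subus(q, p);          /* Q-P, or 0 if P > Q */
    d0 = subus(d0, t);
    d1 = subus(d1, t);
    return cmpeq(0, (uint8_t)(d1 | d0));
}

/* Shorter sequence: OR the two differences first (one of them is
 * always 0, so the OR is |P-Q|), then subtract the threshold once. */
static uint8_t abs_less_v2(uint8_t p, uint8_t q, uint8_t t)
{
    uint8_t d0 = subus(p, q);
    uint8_t d1 = subus(q, p);
    uint8_t ad = (uint8_t)(d1 | d0);   /* |P-Q| */
    return cmpeq(subus(ad, t), 0);
}

/* Exhaustive check over all byte values: counts the cases where
 * either model disagrees with the plain test |P-Q| <= t. */
static long count_mismatches(void)
{
    long bad = 0;
    for (int p = 0; p < 256; p++)
        for (int q = 0; q < 256; q++)
            for (int t = 0; t < 256; t++) {
                int diff = p > q ? p - q : q - p;
                uint8_t ref = diff <= t ? 0xFF : 0x00;
                uint8_t a = abs_less_v1((uint8_t)p, (uint8_t)q, (uint8_t)t);
                uint8_t b = abs_less_v2((uint8_t)p, (uint8_t)q, (uint8_t)t);
                if (a != ref || b != ref)
                    bad++;
            }
    return bad;
}
```

Here t plays the role of alpha-1 (or beta-1), so a 0xFF result means
|P-Q| < alpha. This only checks the arithmetic; it says nothing about
pairing or register pressure, which is the actual point above.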
-Skal