[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2
Eli Friedman
eli.friedman
Sun Jul 11 20:20:03 CEST 2010
On Sun, Jul 11, 2010 at 8:53 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
>
> as per $subj. All tested to be identical to C reference. If wanted, I
> can try to share parts of the filter code with the simple loopfilter,
> but I'm a little scared that it'll turn into massive spaghetti so I
> didn't do it yet.
+ mova m4, m1
+ SWAP 4, 1
This pattern seems to be repeated a lot... I fail to see the point.
After the mova, m4 and m1 hold the same contents, and SWAP only
renames the two registers at assembly time, so swapping them doesn't
do anything significant.
For the following:
+ mova m4, [rsp+mmsize]
+ pxor m3, m3
+ psubusb m0, m4
+ psubusb m1, m4
+ psubusb m7, m4
+ psubusb m6, m4
+ pcmpeqb m0, m3 ; abs(p3-p2) <= I
+ pcmpeqb m1, m3 ; abs(p2-p1) <= I
+ pcmpeqb m7, m3 ; abs(q3-q2) <= I
+ pcmpeqb m6, m3 ; abs(q2-q1) <= I
+ pand m0, m1
+ pand m7, m6
+ pand m0, m7
The following should be faster with mmxext/sse2:
mova m4, [rsp+mmsize]
pxor m3, m3
pmaxub m0, m1
pmaxub m6, m7
pmaxub m0, m6
psubusb m0, m4
pcmpeqb m0, m3
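The rewrite relies on the fact that all four differences are <= I
exactly when their maximum is <= I. A minimal sketch in C that checks
this on a sampled grid, modelling one byte lane per instruction (the
function names and the sampling step are only for illustration and are
not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar models of a single byte lane of each instruction. */
uint8_t lane_psubusb(uint8_t a, uint8_t b) { return a > b ? (uint8_t)(a - b) : 0; }
uint8_t lane_pcmpeqb(uint8_t a, uint8_t b) { return a == b ? 0xFF : 0x00; }
uint8_t lane_pmaxub(uint8_t a, uint8_t b)  { return a > b ? a : b; }

/* Patch version: four saturating subtracts, four compares, three ANDs. */
uint8_t mask_four_compares(uint8_t d0, uint8_t d1, uint8_t d6, uint8_t d7, uint8_t lim)
{
    uint8_t m0 = lane_pcmpeqb(lane_psubusb(d0, lim), 0);
    uint8_t m1 = lane_pcmpeqb(lane_psubusb(d1, lim), 0);
    uint8_t m7 = lane_pcmpeqb(lane_psubusb(d7, lim), 0);
    uint8_t m6 = lane_pcmpeqb(lane_psubusb(d6, lim), 0);
    return (uint8_t)((m0 & m1) & (m7 & m6));
}

/* Suggested version: three maxes, one saturating subtract, one compare. */
uint8_t mask_max_first(uint8_t d0, uint8_t d1, uint8_t d6, uint8_t d7, uint8_t lim)
{
    uint8_t m = lane_pmaxub(lane_pmaxub(d0, d1), lane_pmaxub(d6, d7));
    return lane_pcmpeqb(lane_psubusb(m, lim), 0);
}

/* Count disagreements: every limit value, differences sampled in steps of 17. */
unsigned long count_mask_mismatches(void)
{
    unsigned long bad = 0;
    for (int lim = 0; lim <= 255; lim++)
        for (int d0 = 0; d0 <= 255; d0 += 17)
            for (int d1 = 0; d1 <= 255; d1 += 17)
                for (int d6 = 0; d6 <= 255; d6 += 17)
                    for (int d7 = 0; d7 <= 255; d7 += 17)
                        if (mask_four_compares((uint8_t)d0, (uint8_t)d1, (uint8_t)d6,
                                               (uint8_t)d7, (uint8_t)lim) !=
                            mask_max_first((uint8_t)d0, (uint8_t)d1, (uint8_t)d6,
                                           (uint8_t)d7, (uint8_t)lim))
                            bad++;
    return bad;
}

/* Aborts if the two variants ever disagree on the sampled grid. */
void check_mask_equivalence(void)
{
    assert(count_mask_mismatches() == 0);
}
```

Calling check_mask_equivalence() from a test program is enough to
exercise it. This only checks that the two variants produce the same
mask; whether the shorter sequence is actually faster has to be
measured on real hardware.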
+ mova m6, [rsp+mmsize*3]
+ pxor m7, m7
+ pand m0, m6
+ pand m1, m6
+ pavgb m0, m7 ; a
+ psubusb m1, [pb_1]
+ pavgb m1, m7 ; -a
+ psubusb m5, m0
+ paddusb m5, m1 ; q1-a
+ psubusb m2, m1
+ paddusb m2, m0 ; p1+a
pavgb is mmxext/sse2 only, so this part can't be used as-is in a
plain MMX version.
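If a plain MMX path is wanted, one possible workaround (an assumption
on my side, not something proposed in the patch) is to emulate the
rounded average with the identity (a + b + 1) >> 1 == (a | b) -
((a ^ b) >> 1), which needs only OR, XOR, a shift and a subtract. A
minimal sketch in C that checks the identity for every byte pair; the
function names are only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Reference: what pavgb computes in each byte lane (rounds up). */
uint8_t pavgb_ref(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Emulation without pavgb: OR, XOR, shift right by one, subtract. */
uint8_t pavgb_emul(uint8_t a, uint8_t b)
{
    return (uint8_t)((a | b) - ((a ^ b) >> 1));
}

/* Count disagreements over all 65536 byte pairs. */
unsigned count_pavgb_mismatches(void)
{
    unsigned bad = 0;
    for (int a = 0; a <= 255; a++)
        for (int b = 0; b <= 255; b++)
            if (pavgb_ref((uint8_t)a, (uint8_t)b) != pavgb_emul((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}

/* Aborts if the emulation ever differs from the reference. */
void check_pavgb_emulation(void)
{
    assert(count_pavgb_mismatches() == 0);
}
```

In actual MMX code there is no per-byte shift, so the >> 1 would have
to be done with a wider shift after masking off the low bit of each
byte, which costs extra instructions compared to a single pavgb.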
-Eli