[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm
Ronald S. Bultje
rsbultje
Fri Sep 24 23:10:35 CEST 2010
Hi,
On Fri, Sep 24, 2010 at 4:50 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> The only way to do it is to use "m" and put the entire address in the input
> constraint.
>
>        "movd %0, %%mm1 \n"
>        "por  %1, %%mm1 \n"
>        ::"m"(nnz[b_idx]),
>          "m"(nnz[b_idx+d_idx])
>    );
Ah, that works.
after: 887 dezicycles in lf-strength, 4194133 runs, 171 skips
before: 964 dezicycles in lf-strength, 4194097 runs, 207 skips
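For anyone reading along, here is a minimal standalone sketch of the "m"-constraint form quoted above. It is not code from the patch: or_pair() and its uint64_t table are hypothetical, chosen so that the 64-bit por reads exactly the declared operand. The point is only that the whole address expression sits in the input constraint, so the compiler can fold base, index and scale into the instruction's memory operand. It assumes GCC-style inline asm on x86 with MMX and falls back to plain C elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): ORs tab[idx] with
 * tab[idx + d]. Both loads are "m" operands, so the compiler is free to
 * encode the full address (base + index*scale) in the instruction. */
static uint64_t or_pair(const uint64_t *tab, int idx, int d)
{
    uint64_t out;
#if defined(__GNUC__) && defined(__MMX__) && \
    (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile(
        "movq %1, %%mm1 \n"   /* load tab[idx] */
        "por  %2, %%mm1 \n"   /* OR in tab[idx + d] straight from memory */
        "movq %%mm1, %0 \n"   /* store the result */
        "emms           \n"   /* leave MMX state so x87 code keeps working */
        : "=m"(out)
        : "m"(tab[idx]), "m"(tab[idx + d])
        : "mm1"
    );
#else
    out = tab[idx] | tab[idx + d];   /* portable fallback, same result */
#endif
    return out;
}
```

With "r" operands the address would first have to be materialized in a general-purpose register; with "m" that choice is left to the compiler.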
See the attached patch, to be applied after my original first patch
(reattached here for convenience), which went from 116 to 96 cycles.
This is pretty much the same performance gain that yasm gave me; I'm
only 2 cycles off now.
__asm__ volatile(
    "movd (%0), %%mm0 \n"
    "psubb (%1), %%mm0 \n" // ref[b] != ref[bn]
    "movq (%2), %%mm1 \n"
    "movq 8(%2), %%mm2 \n"
    "psubw (%3), %%mm1 \n"
    "psubw 8(%3), %%mm2 \n"
    "packsswb %%mm2, %%mm1 \n"
    "paddb %%mm6, %%mm1 \n"
    "psubusb %%mm5, %%mm1 \n" // abs(mv[b] - mv[bn]) >= limit
    "packsswb %%mm1, %%mm1 \n"
    "por %%mm1, %%mm0 \n"
    ::"r"(ref[0]+b_idx),
      "r"(ref[0]+b_idx+d_idx),
      "r"(mv[0]+b_idx),
      "r"(mv[0]+b_idx+d_idx)
);
then leads to:
0x000000010041eb00 <h264_loop_filter_strength_mmx2+144>: lea (%r10,%rbp,1),%rdx
0x000000010041eb04 <h264_loop_filter_strength_mmx2+148>: lea (%r12,%r10,4),%rax
0x000000010041eb08 <h264_loop_filter_strength_mmx2+152>: movd (%rdx),%mm0
0x000000010041eb0b <h264_loop_filter_strength_mmx2+155>: psubb (%rdx,%r13,1),%mm0
0x000000010041eb10 <h264_loop_filter_strength_mmx2+160>: movq (%rax),%mm1
0x000000010041eb13 <h264_loop_filter_strength_mmx2+163>: movq 0x8(%rax),%mm2
0x000000010041eb17 <h264_loop_filter_strength_mmx2+167>: psubw (%rax,%r13,4),%mm1
0x000000010041eb1c <h264_loop_filter_strength_mmx2+172>: psubw 0x8(%rax,%r13,4),%mm2
0x000000010041eb22 <h264_loop_filter_strength_mmx2+178>: packsswb %mm2,%mm1
0x000000010041eb25 <h264_loop_filter_strength_mmx2+181>: paddb %mm6,%mm1
0x000000010041eb28 <h264_loop_filter_strength_mmx2+184>: psubusb %mm5,%mm1
0x000000010041eb2b <h264_loop_filter_strength_mmx2+187>: packsswb %mm1,%mm1
0x000000010041eb2e <h264_loop_filter_strength_mmx2+190>: por %mm1,%mm0
I'm still somewhat unhappy about the lea instructions all around (they
might have a negative performance impact on x86-32; I will test that
later, I have to go now). But it mostly works the way I want it to.
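As an aside, the paddb/psubusb pair in the asm block above is the usual branch-free way to test abs(x) >= limit on packed bytes. Below is a rough scalar model of a single byte lane. It assumes that each lane of mm6 holds limit-1 and each lane of mm5 holds 2*(limit-1); those registers are set up outside the snippet shown here, so treat that as an assumption rather than a statement about the patch. The function name and the limit parameter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of one byte lane. 'd' is the packed (already
 * saturated) mv difference; 'limit' is assumed to satisfy 1 <= limit <= 64. */
static int mv_diff_reaches_limit(int8_t d, uint8_t limit)
{
    uint8_t k      = (uint8_t)(limit - 1);   /* assumed content of an mm6 lane */
    uint8_t twice  = (uint8_t)(2 * k);       /* assumed content of an mm5 lane */
    uint8_t biased = (uint8_t)(d + k);       /* paddb: addition wraps mod 256 */
    /* psubusb: unsigned subtraction that saturates at zero */
    uint8_t r      = biased > twice ? (uint8_t)(biased - twice) : 0;
    return r != 0;                           /* nonzero iff abs(d) >= limit */
}
```

Values of d in [-(limit-1), limit-1] land in [0, 2*(limit-1)] after the bias and saturate to zero; anything outside that range either stays above 2*(limit-1) or wraps to a large unsigned byte, so the lane ends up nonzero.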
Michael, can you review both patches?
Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-inline-asm-lessvars.patch
Type: application/octet-stream
Size: 4202 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100924/400fb1a7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-lfstrength-inline-asm.patch
Type: application/octet-stream
Size: 2946 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100924/400fb1a7/attachment-0001.obj>