[FFmpeg-devel] [PATCH] vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction.
Henrik Gramner
henrik at gramner.com
Thu Oct 1 23:29:40 CEST 2015
On Wed, Sep 30, 2015 at 9:36 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm
+pd_65535: times 8 dd 0xffff
Duplicate of pd_0f from h264_qpel_10bit.asm.
+%if cpuflag(ssse3)
+ ; FIXME this can be done without three-op-instr by doing pshfhw m1, m0
+ ; but then interleaving decreases, measure which is faster
+ pshufb m1, m0, [pb_2to15_14_15]; bcdefghh
+%else
+ psrldq m1, m0, 2 ; bcdefgh.
+%endif
+ pshufhw m0, m0, q3310 ; abcdefhh
+%if notcpuflag(ssse3)
+ pshufhw m1, m1, q2210 ; bcdefghh
+%endif
Move the second pshufhw (the one on m1) into the %else part. There's also a typo (pshfhw) in the FIXME comment.
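Roughly what I have in mind, as an untested sketch. It relies on the pshufhw on m1 not depending on the one on m0, so the result should be the same; whether the changed interleaving affects speed would need measuring.

```nasm
%if cpuflag(ssse3)
    pshufb   m1, m0, [pb_2to15_14_15] ; bcdefghh
%else
    psrldq   m1, m0, 2                ; bcdefgh. (reads m0 before it is shuffled)
    pshufhw  m1, m1, q2210            ; bcdefghh
%endif
    pshufhw  m0, m0, q3310            ; abcdefhh
```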
+%if cpuflag(ssse3)
+ pshufb m0, m4
+%else
+ psrldq m0, 2 ; CDEFGHh.
+%endif
+ pshuflw m1, m1, q3321 ; GHhhhhhh
+%if notcpuflag(ssse3)
+ pshufhw m0, m0, q2210 ; CDEFGHhh
+%endif
Ditto: move the pshufhw on m0 into the %else part.
+%if cpuflag(ssse3)
+ pshufb m1, m3
+ pshufb m2, m3
+%else
+ psrldq m1, 2
+ psrldq m2, 2
+ pshufhw m1, m1, q2210
+ pshufhw m2, m2, q2210
+%endif
+ mova [dstq+strideq*2], m1
+ mova [dstq+stride3q ], m2
+ lea dstq, [dstq+strideq*4]
+%if cpuflag(ssse3)
+ pshufb m1, m3
+ pshufb m2, m3
+%else
+ psrldq m1, 2
+ psrldq m2, 2
+ pshufhw m1, m1, q2210
+ pshufhw m2, m2, q2210
+%endif
+ mova [dstq+strideq*0], m1
+ mova [dstq+strideq*1], m2
+%if cpuflag(ssse3)
+ pshufb m1, m3
+ pshufb m2, m3
+%else
+ psrldq m1, 2
+ psrldq m2, 2
+ pshufhw m1, m1, q2210
+ pshufhw m2, m2, q2210
+%endif
+ mova [dstq+strideq*2], m1
+ mova [dstq+stride3q ], m2
There is some duplication here that could possibly be removed. A few
very similar segments appear in other places as well, so it might be
possible to turn them into a macro.
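Something along these lines, as an untested sketch. The macro name and argument order are made up for illustration and are not part of the patch; the body is copied from the quoted segment.

```nasm
; Hypothetical helper, the name is only a suggestion.
; %1, %2 = row registers to shift by one 16-bit pixel
; %3     = pshufb mask register (only used by the ssse3 path)
%macro SHIFT_ROW_PAIR 3
%if cpuflag(ssse3)
    pshufb   %1, %3
    pshufb   %2, %3
%else
    psrldq   %1, 2
    psrldq   %2, 2
    pshufhw  %1, %1, q2210
    pshufhw  %2, %2, q2210
%endif
%endmacro

    ; the quoted segment would then become
    SHIFT_ROW_PAIR m1, m2, m3
    mova [dstq+strideq*2], m1
    mova [dstq+stride3q ], m2
    lea            dstq, [dstq+strideq*4]
    SHIFT_ROW_PAIR m1, m2, m3
    mova [dstq+strideq*0], m1
    mova [dstq+strideq*1], m2
    SHIFT_ROW_PAIR m1, m2, m3
    mova [dstq+strideq*2], m1
    mova [dstq+stride3q ], m2
```

Since cpuflag() is evaluated when the macro is expanded, the same macro should work for every instruction set it gets instantiated with.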
+%if cpuflag(ssse3)
+ pshufb m2, [pb_4_5_8to13_8x0]
+%else
+ pshuflw m2, m2, q2222
+%endif
+ psrldq m0, 6
+%if notcpuflag(ssse3)
+ psrldq m2, 6
+%endif
Move the psrldq on m2 into the %else part.
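That is, something like the following (untested sketch; the two psrldq operate on different registers, so reordering them should not change the result):

```nasm
%if cpuflag(ssse3)
    pshufb   m2, [pb_4_5_8to13_8x0]
%else
    pshuflw  m2, m2, q2222
    psrldq   m2, 6
%endif
    psrldq   m0, 6
```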
It's quite a large patch, so I mostly just skimmed through it, but the
rest looks fine to me.