[FFmpeg-devel] [PATCH] vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction.

Henrik Gramner henrik at gramner.com
Thu Oct 1 23:29:40 CEST 2015


On Wed, Sep 30, 2015 at 9:36 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm b/libavcodec/x86/vp9intrapred_16bpp.asm

+pd_65535: times 8 dd 0xffff

This duplicates pd_0f from h264_qpel_10bit.asm.

+%if cpuflag(ssse3)
+    ; FIXME this can be done without three-op-instr by doing pshfhw m1, m0
+    ; but then interleaving decreases, measure which is faster
+    pshufb                  m1, m0, [pb_2to15_14_15]; bcdefghh
+%else
+    psrldq                  m1, m0, 2               ; bcdefgh.
+%endif
+    pshufhw                 m0, m0, q3310           ; abcdefhh
+%if notcpuflag(ssse3)
+    pshufhw                 m1, m1, q2210           ; bcdefghh
+%endif

Move the pshufhw on m1 into the else part. There's also a typo in the comment (pshfhw instead of pshufhw).
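A quick way to convince yourself the move is safe: m1 is derived from m0 before the shared pshufhw overwrites m0, so grouping both SSE2 instructions in the else part cannot change the result. The sketch below is a scalar C model of the three instructions, not code from the patch. The contents of pb_2to15_14_15 are an assumption read off its name, and the xmm type, the Q() macro (mimicking x86inc's qXYZW digit order) and the m1_ssse3/m1_sse2/m0_after helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 128-bit XMM register: b[0] is the least significant byte,
 * and 16-bit lane i occupies bytes 2*i (low) and 2*i+1 (high). */
typedef struct { uint8_t b[16]; } xmm;

/* Same digit order as x86inc's qXYZW immediates: the last digit selects lane 0. */
#define Q(x, y, z, w) (((x) << 6) | ((y) << 4) | ((z) << 2) | (w))

static uint16_t get_w(const xmm *r, int i) { return (uint16_t)(r->b[2 * i] | (r->b[2 * i + 1] << 8)); }
static void set_w(xmm *r, int i, uint16_t v) { r->b[2 * i] = (uint8_t)(v & 0xff); r->b[2 * i + 1] = (uint8_t)(v >> 8); }

static xmm load(const uint16_t in[8]) { xmm r; for (int i = 0; i < 8; i++) set_w(&r, i, in[i]); return r; }
static void store(xmm r, uint16_t out[8]) { for (int i = 0; i < 8; i++) out[i] = get_w(&r, i); }

/* psrldq: shift the whole register right by n bytes, shifting in zeros. */
static xmm psrldq(xmm a, int n)
{
    assert(n >= 0 && n <= 16);
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (i + n < 16) ? a.b[i + n] : 0;
    return r;
}

/* pshufb: byte i becomes a.b[mask[i] & 15], or 0 if bit 7 of mask[i] is set. */
static xmm pshufb(xmm a, const uint8_t mask[16])
{
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (mask[i] & 0x80) ? 0 : a.b[mask[i] & 15];
    return r;
}

/* pshufhw: permute lanes 4..7 by imm; lanes 0..3 pass through unchanged. */
static xmm pshufhw(xmm a, int imm)
{
    xmm r = a;
    for (int i = 0; i < 4; i++)
        set_w(&r, 4 + i, get_w(&a, 4 + ((imm >> (2 * i)) & 3)));
    return r;
}

/* Assumed contents of pb_2to15_14_15, inferred from its name. */
static const uint8_t pb_2to15_14_15[16] = { 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 14, 15 };

/* ssse3 branch: pshufb m1, m0, [pb_2to15_14_15] */
static void m1_ssse3(const uint16_t in[8], uint16_t out[8]) { store(pshufb(load(in), pb_2to15_14_15), out); }

/* else branch with the pshufhw moved in: psrldq m1, m0, 2 ; pshufhw m1, m1, q2210 */
static void m1_sse2(const uint16_t in[8], uint16_t out[8]) { store(pshufhw(psrldq(load(in), 2), Q(2, 2, 1, 0)), out); }

/* shared instruction after the endif: pshufhw m0, m0, q3310 */
static void m0_after(const uint16_t in[8], uint16_t out[8]) { store(pshufhw(load(in), Q(3, 3, 1, 0)), out); }
```

Feeding lanes "abcdefgh" into the model gives "bcdefghh" from both m1 paths and "abcdefhh" for m0, matching the comments in the hunk.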

+%if cpuflag(ssse3)
+    pshufb                  m0, m4
+%else
+    psrldq                  m0, 2                   ; CDEFGHh.
+%endif
+    pshuflw                 m1, m1, q3321           ; GHhhhhhh
+%if notcpuflag(ssse3)
+    pshufhw                 m0, m0, q2210           ; CDEFGHhh
+%endif

Ditto (move the pshufhw on m0 into the else part).

+%if cpuflag(ssse3)
+    pshufb                  m1, m3
+    pshufb                  m2, m3
+%else
+    psrldq                  m1, 2
+    psrldq                  m2, 2
+    pshufhw                 m1, m1, q2210
+    pshufhw                 m2, m2, q2210
+%endif
+    mova      [dstq+strideq*2], m1
+    mova      [dstq+stride3q ], m2
+    lea                   dstq, [dstq+strideq*4]
+%if cpuflag(ssse3)
+    pshufb                  m1, m3
+    pshufb                  m2, m3
+%else
+    psrldq                  m1, 2
+    psrldq                  m2, 2
+    pshufhw                 m1, m1, q2210
+    pshufhw                 m2, m2, q2210
+%endif
+    mova      [dstq+strideq*0], m1
+    mova      [dstq+strideq*1], m2
+%if cpuflag(ssse3)
+    pshufb                  m1, m3
+    pshufb                  m2, m3
+%else
+    psrldq                  m1, 2
+    psrldq                  m2, 2
+    pshufhw                 m1, m1, q2210
+    pshufhw                 m2, m2, q2210
+%endif
+    mova      [dstq+strideq*2], m1
+    mova      [dstq+stride3q ], m2

Possibly some deduplication here: the same shuffle-and-store sequence is written out three times. There are a few very similar segments elsewhere in the patch as well, so it might be possible to turn them into a macro.
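The three segments are one step repeated: shift both rows down by one pixel with the last pixel replicated, then store two rows. A scalar C model makes that structure explicit as a helper plus a loop, which is what a macro (or %rep) would capture in the asm. This is a sketch under stated assumptions: 8 16-bit pixels per row, m3 holding the same byte pattern as pb_2to15_14_15 (it is loaded outside the quoted excerpt), and shift_dup/store_rows_2_to_7 as hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One repeated step: shift the 8 16-bit pixels down one lane and replicate
 * the last one ("abcdefgh" -> "bcdefghh"). This is what psrldq 2 followed by
 * pshufhw q2210 computes, and what pshufb m3 computes if m3 holds the
 * pb_2to15_14_15 byte pattern (an assumption). */
static void shift_dup(uint16_t v[8])
{
    for (int i = 0; i < 7; i++)
        v[i] = v[i + 1];
    /* v[7] is left as is, so the last pixel ends up duplicated */
}

/* Hypothetical model of the excerpt: m1_in/m2_in are the register contents on
 * entry, and rows 2..7 of an 8x8 block are written. Each loop iteration
 * corresponds to one of the three copies in the patch (rows 2/3, 4/5, 6/7;
 * the lea by 4*stride just rebases dst between the first and second copy). */
static void store_rows_2_to_7(uint16_t dst[8][8], const uint16_t m1_in[8], const uint16_t m2_in[8])
{
    uint16_t m1[8], m2[8];
    for (int i = 0; i < 8; i++) {
        m1[i] = m1_in[i];
        m2[i] = m2_in[i];
    }
    for (int y = 2; y < 8; y += 2) {
        shift_dup(m1);
        shift_dup(m2);
        for (int i = 0; i < 8; i++) {
            dst[y][i]     = m1[i];
            dst[y + 1][i] = m2[i];
        }
    }
}
```

In the asm the loop would more likely stay unrolled, with only the cpuflag-dependent shuffle pair moved into a macro taking the two registers as arguments.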

+%if cpuflag(ssse3)
+    pshufb                  m2, [pb_4_5_8to13_8x0]
+%else
+    pshuflw                 m2, m2, q2222
+%endif
+    psrldq                  m0, 6
+%if notcpuflag(ssse3)
+    psrldq                  m2, 6
+%endif

Move the psrldq on m2 into the else part.
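The psrldq on m2 neither reads nor writes m0, so it can sit in the else part next to the pshuflw without changing anything. The scalar C sketch below compares the two paths for m2. It assumes pb_4_5_8to13_8x0 holds byte indices 4, 5, 8..13 followed by eight 0 indices, as its name suggests; note that a pshufb index of 0 copies byte 0 rather than zeroing, so under that assumption only the low four 16-bit lanes are guaranteed to match. The xmm type and the m2_ssse3/m2_sse2 helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 128-bit XMM register: b[0] is the least significant byte,
 * and 16-bit lane i occupies bytes 2*i (low) and 2*i+1 (high). */
typedef struct { uint8_t b[16]; } xmm;

/* Same digit order as x86inc's qXYZW immediates: the last digit selects lane 0. */
#define Q(x, y, z, w) (((x) << 6) | ((y) << 4) | ((z) << 2) | (w))

static uint16_t get_w(const xmm *r, int i) { return (uint16_t)(r->b[2 * i] | (r->b[2 * i + 1] << 8)); }
static void set_w(xmm *r, int i, uint16_t v) { r->b[2 * i] = (uint8_t)(v & 0xff); r->b[2 * i + 1] = (uint8_t)(v >> 8); }

static xmm load(const uint16_t in[8]) { xmm r; for (int i = 0; i < 8; i++) set_w(&r, i, in[i]); return r; }
static void store(xmm r, uint16_t out[8]) { for (int i = 0; i < 8; i++) out[i] = get_w(&r, i); }

/* psrldq: shift the whole register right by n bytes, shifting in zeros. */
static xmm psrldq(xmm a, int n)
{
    assert(n >= 0 && n <= 16);
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (i + n < 16) ? a.b[i + n] : 0;
    return r;
}

/* pshufb: byte i becomes a.b[mask[i] & 15], or 0 if bit 7 of mask[i] is set. */
static xmm pshufb(xmm a, const uint8_t mask[16])
{
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (mask[i] & 0x80) ? 0 : a.b[mask[i] & 15];
    return r;
}

/* pshuflw: permute lanes 0..3 by imm; lanes 4..7 pass through unchanged. */
static xmm pshuflw(xmm a, int imm)
{
    xmm r = a;
    for (int i = 0; i < 4; i++)
        set_w(&r, i, get_w(&a, (imm >> (2 * i)) & 3));
    return r;
}

/* Assumed contents of pb_4_5_8to13_8x0, inferred from its name. */
static const uint8_t pb_4_5_8to13_8x0[16] = { 4, 5, 8, 9, 10, 11, 12, 13, 0, 0, 0, 0, 0, 0, 0, 0 };

/* ssse3 branch: pshufb m2, [pb_4_5_8to13_8x0] */
static void m2_ssse3(const uint16_t in[8], uint16_t out[8]) { store(pshufb(load(in), pb_4_5_8to13_8x0), out); }

/* else branch with the psrldq moved in: pshuflw m2, m2, q2222 ; psrldq m2, 6 */
static void m2_sse2(const uint16_t in[8], uint16_t out[8]) { store(psrldq(pshuflw(load(in), Q(2, 2, 2, 2)), 6), out); }
```

With input lanes 1..8, the SSE2 path yields 3, 5, 6, 7, 8, 0, 0, 0, and the SSSE3 path agrees in its first four lanes (3, 5, 6, 7).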

It's quite a large patch, so I mostly just skimmed through it, but the
rest looks fine to me.

